Department of Informatics – DDIS

 

Masterprojects

This page lists the currently available Master projects in the DDIS group. If you are a group of students and would like to work on one of the described projects, don't hesitate to contact the respective person or Prof. Bernstein directly to talk about your own ideas.

Linked Raster Data

Project for 3-5 Students

In the geo-spatial community, Linked Data is gaining importance. Recent examples are GeoNames and, most popular, LinkedGeoData, the Linked Data version of OpenStreetMap. Such initiatives are not only supported by academia but also backed by prominent partners from industry (e.g. ESRI, Oracle) and the public sector (UK Ordnance Survey, Swiss Federal Office for the Environment, US Geological Survey). Most approaches, however, aim at processing vector data, where entities (points, lines, polygons) are defined ex ante. In collaboration with the Department of Geography (UZH), we recently started linking raster data, such as fields containing information about slope, height, or population density, to the Linked Open Data cloud. In particular, it is the features deduced from raster data that can potentially be enriched with semantics.

Your task will be to integrate operators from a Geographic Information System (GIS) such as ArcGIS or GRASS into a SPARQL engine such as Jena or Sesame. Building on a faceted browser, you will integrate raster data into the results of faceted search and/or faceted browsing. In the context of a Bachelor's thesis, you will also show how operators in SPARQL translate into operators in the GIS. If performed as a Master project, the work will focus more on the integration and optimization aspects of SPARQL to GIS and vice versa.
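As a toy illustration of what such an integration involves, the sketch below implements a slope operator of the kind a GIS exposes on raster data, and registers it under a made-up extension-function URI, the way a SPARQL engine could dispatch an unknown FILTER/BIND function to the GIS. All names and the URI are illustrative, not part of any existing system:

```python
import math

def slope(raster, cell_size):
    """Toy GIS raster operator: per-cell slope in degrees, computed with
    central differences. Border cells are left as None to keep the sketch short."""
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dzdx = (raster[i][j + 1] - raster[i][j - 1]) / (2.0 * cell_size)
            dzdy = (raster[i + 1][j] - raster[i - 1][j]) / (2.0 * cell_size)
            out[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Hypothetical registry mapping SPARQL extension-function URIs to GIS
# operators; an engine could consult such a table when it encounters an
# unknown function in a query. The URI is invented for illustration.
GIS_OPERATORS = {
    "http://example.org/gis#slope": slope,
}

if __name__ == "__main__":
    heights = [[0, 1, 2],
               [0, 1, 2],
               [0, 1, 2]]   # a plane rising one unit per cell in x-direction
    op = GIS_OPERATORS["http://example.org/gis#slope"]
    print(op(heights, cell_size=1.0)[1][1])  # slope of this plane: 45 degrees
```

A real integration would additionally have to map raster cells (or features derived from them) to RDF resources so that the operator results can appear in SPARQL bindings.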

Contact: Thomas Scharrenbach

Database optimization meets software engineering (Object-RDF-Mapper)

Most of today's (web) applications rely on some kind of ORM library (object-relational mapper or object-RDF mapper) to talk to databases. The ORM layer makes minimal effort when constructing queries for the underlying database, thus pushing the "hard work" to the database optimizer. This project's aim is to enrich the ORM library with a database-aware optimizer that creates an optimal "mix" of queries in order to maximize the application's performance (interaction throughput). The project is implemented in Python. We are looking for students with good knowledge of functional programming languages (not necessarily Python).
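As a minimal sketch of the problem (all classes below are toy stand-ins, not the project's actual code base), the snippet contrasts the naive one-round-trip-per-object behaviour typical of ORMs with the single batched query a database-aware optimizer could emit instead:

```python
class FakeDB:
    """Toy stand-in for a database driver that counts executed queries."""
    def __init__(self, rows):
        self.rows = rows          # {primary_key: row}
        self.queries_run = 0

    def execute(self, keys):
        """Simulates 'SELECT ... WHERE id IN (keys)'."""
        self.queries_run += 1
        return [self.rows[k] for k in keys]

def naive_orm_fetch(db, keys):
    # Typical lazy ORM behaviour: one round-trip per object (the N+1 pattern).
    return [db.execute([k])[0] for k in keys]

def optimized_fetch(db, keys):
    # A database-aware optimizer can merge the lookups into a single
    # IN-style query, cutting round-trips from len(keys) down to one.
    return db.execute(keys)

if __name__ == "__main__":
    db = FakeDB({1: "alice", 2: "bob", 3: "carol"})
    naive_orm_fetch(db, [1, 2, 3])
    print(db.queries_run)   # 3

    db = FakeDB({1: "alice", 2: "bob", 3: "carol"})
    optimized_fetch(db, [1, 2, 3])
    print(db.queries_run)   # 1
```

The real optimizer would of course face a much richer decision space (joins, caching, query splitting vs. merging), but the trade-off it navigates is the one shown: fewer, larger queries versus many small ones.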

Contact: Cosmin Basca

Workload Scheduling in Storm

Project for 2 Students

Introduction

To make use of the ever increasing data volumes, we rely on massive compute clusters consisting of thousands of machines. Being able to work with compute clusters and to process large amounts of data has become a skill that is much sought after in business and research.

Companies such as Google, Yahoo, and Facebook would not be able to make use of the user data they collect without highly parallelized and distributed computer systems. One disadvantage of batch-based systems such as Hadoop MapReduce is that the time it takes to process the data limits the speed with which one can react to real-world events. This is why continuous distributed data processing has gained traction in recent years.

Storm is a distributed, fault-tolerant, real-time computation framework that allows its users to write applications which return results in real time. One problem in distributed computing is to decide how to assign work to the various computing nodes in a compute cluster. This process is called workload scheduling.
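As a toy illustration of workload scheduling (not the scheduler to be built, whose design will follow DDIS research results), the sketch below greedily co-locates tasks that exchange a lot of data, which keeps traffic node-local at the cost of possibly unbalanced nodes:

```python
def schedule(tasks, traffic, n_nodes, capacity):
    """Greedy scheduler sketch: place each task on the node that already
    hosts the partners it exchanges the most data with.

    tasks    -- task ids, processed in the given order
    traffic  -- {(a, b): messages_per_sec} for communicating task pairs
    n_nodes  -- number of worker nodes
    capacity -- max tasks per node (capacity * n_nodes >= len(tasks) assumed)
    """
    assignment = {}
    load = [0] * n_nodes
    for t in tasks:
        best, best_saved = None, -1
        for node in range(n_nodes):
            if load[node] >= capacity:
                continue
            # traffic that stays node-local if t is placed on this node
            saved = sum(w for (a, b), w in traffic.items()
                        if (a == t and assignment.get(b) == node)
                        or (b == t and assignment.get(a) == node))
            if saved > best_saved:
                best, best_saved = node, saved
        assignment[t] = best
        load[best] += 1
    return assignment

if __name__ == "__main__":
    traffic = {("A", "B"): 100, ("C", "D"): 80, ("B", "C"): 1}
    print(schedule(["A", "B", "C", "D"], traffic, n_nodes=2, capacity=2))
    # {'A': 0, 'B': 0, 'C': 1, 'D': 1} -- only the light B-C edge crosses nodes
```

A production scheduler would also have to handle rebalancing at runtime, which is where the cost of moving state between machines (one of the evaluation criteria below) comes in.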

Project Description

The goal of this project is to develop and evaluate a workload scheduler for the Storm platform based on research results of the DDIS research group. More specifically the project entails the following tasks:

  1. Implementation of a workload scheduler for Storm based on DDIS research results
  2. Evaluation of the running Storm system in terms of
    • network load
    • scalability
    • cost of moving state between machines

Requirements

Language: The project can be completed in either English or German.

Necessary skills:

  • Good programming skills in Java
  • Good understanding of Linux and some Bash-fu ;-)
  • Experience in or interest in learning more about distributed systems (Storm/Hadoop/Zookeeper/Torque/Maui)
  • Knowledge of Python and/or Clojure is a plus

Contact: Lorenz Fischer

Stream-Reasoning with Facts and Events

The amount of data available on the Web (e.g., on the Semantic Web and the Linked Open Data (LOD) cloud) is growing at an astounding speed. An increasing number of these data sources are dynamic (i.e., their content changes over time) or even represent continually updating phenomena (such as the stock exchange, sensor networks, social networks, or the continuous arrival of intelligence data).

Esper and its Event Processing Language (EPL) provide a highly scalable, memory-efficient, in-memory, SQL-like, low-latency, real-time streaming engine for processing Big Data of medium to high velocity and high variety. Esper is open-source software available under the GNU General Public License (GPL).
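To give an idea of the semantics EPL expresses, the sketch below reimplements in plain Python what a statement like `select avg(price) from Ticker.win:time(30 sec)` computes: a running average over a sliding time window. The class is illustrative only; Esper itself is not used:

```python
from collections import deque

class TimeWindowAverage:
    """Toy sliding time-window aggregate: the average over all events
    that arrived within the last `window_sec` seconds."""

    def __init__(self, window_sec):
        self.window = window_sec
        self.events = deque()   # (timestamp, value), oldest first
        self.total = 0.0

    def on_event(self, ts, value):
        """Feed one event; return the current window average."""
        self.events.append((ts, value))
        self.total += value
        # expire events that have fallen out of the time window
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)

if __name__ == "__main__":
    w = TimeWindowAverage(30)
    print(w.on_event(0, 10.0))    # 10.0
    print(w.on_event(10, 20.0))   # 15.0
    print(w.on_event(40, 30.0))   # 30.0 -- the first two events have expired
```

Engines like Esper provide exactly this kind of incremental, expiring computation for a whole query language of windows, joins, and patterns, which is what makes them attractive compilation targets.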

Esper's EPL is very powerful, but in several respects also rather low-level. At DDIS we developed a higher-level language for temporal event and fact processing, called TEF-SPARQL, based on the Semantic Web query language SPARQL.

The task of this project (or thesis) is to implement a compiler from TEF-SPARQL to Esper EPL in Java, based on an existing SPARQL grammar for the Java parser generator ANTLR3.

Tasks:

  1. Extend the SPARQL-grammar to TEF-SPARQL
  2. Build an internal Algebra Graph during parsing using ANTLR3 Grammar Actions
  3. Visualisation of the Algebra Graph using Graphviz
  4. Generate ESPER-EPL code out of the Algebra Graph
    • pure (single machine) ESPER
    • ESPER embedded into a stream environment for distributed processing
  5. Performance Evaluations for different compilation methods

A simplified proof-of-concept implementation of tasks 1-3 exists and can be used as an illustrative base.
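The pipeline can be pictured as the following toy sketch, which hand-builds an algebra graph (the output of task 2), renders it in Graphviz dot syntax (task 3), and generates EPL-like code from it (task 4). The node types and the generated syntax are simplified illustrations, not the real TEF-SPARQL algebra or full EPL:

```python
class Node:
    """A node of the internal algebra graph. In the project, task 2 builds
    such a graph via ANTLR3 grammar actions; here it is built by hand."""
    def __init__(self, op, *children, **attrs):
        self.op = op
        self.children = list(children)
        self.attrs = attrs

def to_dot(root):
    """Task 3: render the algebra graph in Graphviz dot syntax."""
    lines, next_id = ["digraph algebra {"], [0]
    def walk(node):
        nid = next_id[0]
        next_id[0] += 1
        lines.append('  n%d [label="%s"];' % (nid, node.op))
        for child in node.children:
            lines.append("  n%d -> n%d;" % (nid, walk(child)))
        return nid
    walk(root)
    lines.append("}")
    return "\n".join(lines)

def to_epl(node):
    """Task 4: generate EPL-like code from the graph (syntax simplified)."""
    if node.op == "stream":
        return node.attrs["name"]
    if node.op == "window":
        return "%s.win:time(%d sec)" % (to_epl(node.children[0]), node.attrs["sec"])
    if node.op == "select":
        return "select %s from %s" % (node.attrs["expr"], to_epl(node.children[0]))
    raise ValueError("unknown operator: " + node.op)

if __name__ == "__main__":
    query = Node("select",
                 Node("window", Node("stream", name="Ticker"), sec=30),
                 expr="avg(price)")
    print(to_epl(query))   # select avg(price) from Ticker.win:time(30 sec)
    print(to_dot(query))
```

The real compiler will of course generate from a parsed TEF-SPARQL query rather than a hand-built tree, and will have to cover the full algebra, but the three stages (graph, visualisation, code generation) are the same.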

References:

  • Gianpaolo Cugola and Alessandro Margara: Processing Flows of Information: From Data Stream to Complex Event Processing, ACM Computing Surveys, Volume 44 Issue 3, June 2012.
  • Kietz et al.: TEF-SPARQL: The DDIS query language for time-annotated event and fact triple streams (internal report)
  • ANTLR Grammar v3
  • The SPARQL 1.1 Query Language
  • ESPER Reference Manual
  • Christian Bockermann and Hendrik Blom: The streams Framework, Tech. Report, TU Dortmund, 2012 (PDF)

Contact: Jörg-Uwe Kietz