Department of Informatics – DDIS


Master Projects

This page lists the currently available master projects in the DDIS group. If you are a group of students and would like to work on one of the described projects, don't hesitate to contact the respective person, or contact Prof. Bernstein directly to talk about your own ideas.

Distributed Systems meets Economics: A survey of domain-specific bidding languages

Project for: 1 Student
Can be completed as: Master Project/Master Thesis
Posted on: Nov 18, 2014

Introduction

A bidding language is a formal language that allows bidders participating in an auction to express their value for a certain outcome of the auction. Depending on the task, such a bidding language can take a very domain-specific form.
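
For illustration, a standard example from the combinatorial-auction literature is the XOR bidding language, in which a bidder submits several (bundle, price) pairs and declares that at most one of them may be accepted:

    ({A, B}, 10) XOR ({C}, 4)

This bid reads: "I pay 10 if I win both A and B, or 4 if I win C alone." A domain-specific bidding language adapts such constructs to the concepts and constraints of the domain at hand.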

The aim of this project is to understand how potential bidders in our domain would like to express their value for a specific result of an auction. To this end, you will conduct an exploratory, semi-structured interview.

Project Description

Your task is to plan the interview, invite the interviewee, and conduct the interview. To do this, you will need to familiarize yourself with our specific domain, with auctions and bidding languages, and with interviewing principles.

At the end of this project you will not only have deeper technical knowledge of auctions and bidding languages, but you will also have had the opportunity to improve the soft skills needed to conduct interviews with potential customers.

You need to be interested in the fields of distributed systems, economics and computation (especially auctions), and requirements engineering. Prior knowledge in these areas is an advantage.

Contact: Tobias Grubenmann

Linked Raster Data

Project for: 3-5 Students

In the geospatial community, Linked Data is gaining increasing importance. Recent examples are GeoNames and, most popular, LinkedGeoData, the Linked Data version of OpenStreetMap. Such initiatives are not only supported by academia but are also backed by prominent partners from industry (e.g., ESRI, Oracle) and the public sector (UK Ordnance Survey, Swiss Federal Office for the Environment, US Geological Survey). Most approaches, however, aim at the processing of vector data, where entities (points, lines, polygons) are defined ex ante. In collaboration with the Department of Geography (UZH), we have recently started linking raster data, such as fields containing information about slope, height, or population density, to the Linked Open Data cloud. In particular, it is features derived from raster data that can potentially be enriched with semantics.

Your task will be to integrate operators from a Geographic Information System (GIS) such as ArcGIS or GRASS into a SPARQL engine such as Jena or Sesame. Based on a faceted browser, you will integrate raster data into the results of faceted search and/or faceted browsing. In the context of a Bachelor's thesis you will also show how operators in SPARQL translate into operators in the GIS. If performed as a Master Project, the work will focus more on the integration and optimization aspects of SPARQL-to-GIS translation and vice versa.
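
One conceivable integration point, as a minimal sketch: SPARQL engines such as Jena allow custom functions, so a GIS operator can be exposed inside ordinary queries. The namespace, function name, and GIS call below are hypothetical, and the sketch assumes a 2014-era Jena/ARQ (com.hp.hpl.jena packages):

    import com.hp.hpl.jena.sparql.expr.NodeValue;
    import com.hp.hpl.jena.sparql.function.FunctionBase1;
    import com.hp.hpl.jena.sparql.function.FunctionRegistry;

    /** Hypothetical SPARQL function wrapping a GIS raster operator. */
    public class SlopeFunction extends FunctionBase1 {

        @Override
        public NodeValue exec(NodeValue cell) {
            // A real implementation would call into ArcGIS or GRASS here to
            // look up the slope for the raster cell named by the argument
            // (assumed to be passed as a plain literal).
            double slope = lookupSlopeInGis(cell.getString());
            return NodeValue.makeDouble(slope);
        }

        private double lookupSlopeInGis(String cellId) {
            return 0.0; // placeholder for the actual GIS call
        }

        public static void main(String[] args) {
            // After registration, queries may use e.g. FILTER(gis:slope(?cell) > 30).
            FunctionRegistry.get().put("http://example.org/gis#slope", SlopeFunction.class);
        }
    }

Whether such an operator is better pushed down into the GIS or evaluated inside the SPARQL engine is exactly the kind of translation and optimization question this project addresses.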

Contact: Thomas Scharrenbach

Database optimization meets software engineering (Object-RDF-Mapper)

Most of today's (web) applications rely on some kind of ORM library (object-relational mapper or object-RDF mapper) to talk to databases. ORM layers put minimal effort into constructing queries for the underlying database, thus pushing the "hard work" to the database optimizer. This project's aim is to enrich an ORM library with a database-aware optimizer that creates an optimal "mix" of queries in order to maximize the application's performance (interaction throughput). The project is implemented in Python. We are looking for students who have good knowledge of functional programming languages (not necessarily Python).
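
To illustrate the kind of inefficiency such an optimizer can remove, here is a minimal, self-contained sketch (in Java, with made-up table and column names; the project itself is in Python): a naive ORM issues one query per object, while a database-aware layer can merge them into a single query and hand the database optimizer one plan instead of N trivial ones.

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    /** Illustrates the "N+1 query" pattern a database-aware ORM layer could avoid. */
    public class OrmBatchingSketch {
        public static void main(String[] args) {
            List<Integer> authorIds = Arrays.asList(1, 2, 3);

            // Naive ORM behaviour: one round-trip per object (N+1 queries overall).
            for (int id : authorIds) {
                System.out.println("SELECT * FROM books WHERE author_id = " + id);
            }

            // Database-aware behaviour: one batched query over all objects.
            String ids = authorIds.stream().map(String::valueOf)
                    .collect(Collectors.joining(", "));
            System.out.println("SELECT * FROM books WHERE author_id IN (" + ids + ")");
        }
    }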

Contact: Cosmin Basca

Performance Monitoring of Storm Topologies

Project for: 1-2 Students
Can be completed as: Master Project
Posted on: July 4, 2014

Introduction

To make use of ever-increasing data volumes, we rely on massive compute clusters consisting of thousands of machines. Being able to work with compute clusters and to process large amounts of data has become a skill that is much sought after in business and research.

Companies such as Google, Yahoo, and Facebook would not be able to make use of the user data they collect without highly parallelized and distributed computer systems. One disadvantage of batch-based systems such as Hadoop MapReduce is that the time it takes to process the data limits the speed with which one can react to real-world events. This is why continuous distributed data processing has gained traction in recent years.

Storm is a distributed, fault-tolerant computation framework that allows its users to write applications that return results in real time. One problem in distributed computing is to decide how to assign work to the various computing nodes in a compute cluster. This process is called workload scheduling.

Project Description

The goal of this project is to develop and evaluate a monitoring console for the Storm platform. The console should allow the developer to monitor cluster usage metrics such as inter-worker communication bandwidth, worker network/CPU load, and queue sizes in real time, as well as provide the possibility to review historical performance data. More specifically, the project entails the following tasks:

  1. Implementation (or evaluation and integration) of
    • a stats collection facility that also stores historic data (see the sketch after this list)
    • a graphical (web) user interface for the programmer to access the console
    • suitable visualization widgets such as Sankey, graph, and/or line charts
    • the integration of this console into the standard Storm UI
  2. The creation of user and developer documentation
  3. Evaluation of the console in terms of:
    • resource utilization
    • usability
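
As a possible starting point for the stats collection facility, a minimal sketch assuming Storm 0.9.x's metrics hook: a class implementing IMetricsConsumer receives metric data points from all workers and could forward them to a historic store (the storage backend is omitted here):

    import java.util.Collection;
    import java.util.Map;

    import backtype.storm.metric.api.IMetricsConsumer;
    import backtype.storm.task.IErrorReporter;
    import backtype.storm.task.TopologyContext;

    /** Sketch of a metrics consumer that could feed the monitoring console. */
    public class ConsoleMetricsConsumer implements IMetricsConsumer {

        @Override
        public void prepare(Map stormConf, Object registrationArgument,
                            TopologyContext context, IErrorReporter errorReporter) {
            // e.g. open the connection to the historic metrics store here
        }

        @Override
        public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
            for (DataPoint dp : dataPoints) {
                // Forward each metric (queue sizes, transfer counts, ...) together
                // with its source task and timestamp to the storage backend.
                System.out.printf("%s/task-%d %s = %s%n",
                        taskInfo.srcComponentId, taskInfo.srcTaskId, dp.name, dp.value);
            }
        }

        @Override
        public void cleanup() { }
    }

The consumer would be registered on the topology configuration, e.g. conf.registerMetricsConsumer(ConsoleMetricsConsumer.class, 1);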

Requirements

Language: The project can be completed in either English or German.

Necessary skills:

  • Programming skills in Java
  • Knowledge in web programming (CSS, JS, HTML) and backend scripting
  • Experience in or interest in learning more about distributed systems (Storm/Hadoop/Zookeeper/Torque/Maui)

Contact: Lorenz Fischer

Workload Scheduling in Storm

Project for: 1-2 Students
Can be completed as: Bachelor/Master Thesis, Master Project
Posted on: July 4, 2014

Introduction

The background is the same as for the "Performance Monitoring of Storm Topologies" project above: we rely on continuous, distributed data processing with Storm, and one central problem is how to assign work to the various computing nodes of a compute cluster (workload scheduling).

Project Description

The goal of this project is to develop and evaluate a workload scheduler for the Storm platform based on research results of the DDIS research group. More specifically, the project entails the following tasks:

  1. Implementation of
    • a Storm scheduler that bases its scheduling strategy on communication statistics, using graph partitioning algorithms (see the skeleton after this list)
    • suitable graph partitioning algorithms
    • suitable topologies for the evaluation
    • evaluation scripts to evaluate the system
  2. Evaluation of the running Storm system in terms of
    • network load
    • scalability
    • cost of moving state between machines
  3. Possible Extensions:
    • Changing the way in which message IDs are created in the acking framework. This task would also require an evaluation of the system performance with (and without) this new mechanism.
    • Implementation (and evaluation) of multiple different partitioning algorithms
    • Augmenting the communication graph with node weights that reflect CPU load
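
A minimal skeleton of how such a scheduler plugs into Storm, assuming the pluggable scheduler interface of Storm 0.9.x (the partitioning logic itself, which is the actual research contribution, is only indicated in comments):

    import java.util.List;
    import java.util.Map;

    import backtype.storm.scheduler.Cluster;
    import backtype.storm.scheduler.IScheduler;
    import backtype.storm.scheduler.Topologies;
    import backtype.storm.scheduler.TopologyDetails;
    import backtype.storm.scheduler.WorkerSlot;

    /** Skeleton of a custom Storm scheduler; the partitioning strategy is left open. */
    public class PartitioningScheduler implements IScheduler {

        @Override
        public void prepare(Map conf) {
            // e.g. read the location of the communication statistics here
        }

        @Override
        public void schedule(Topologies topologies, Cluster cluster) {
            for (TopologyDetails topology : topologies.getTopologies()) {
                if (!cluster.needsScheduling(topology)) {
                    continue;
                }
                List<WorkerSlot> slots = cluster.getAvailableSlots();
                // Partition the communication graph here (e.g. with a graph
                // partitioner such as METIS) and map each partition to one of
                // the slots via cluster.assign(slot, topology.getId(), executors).
            }
        }
    }

On the Nimbus host, such a scheduler is activated by pointing storm.scheduler in storm.yaml at its class name.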

Requirements

Language: The project can be completed in either English or German.

Necessary skills:

  • Good programming skills in Java
  • Good understanding of Linux and some Bash-fu ;-)
  • Experience in or interest in learning more about distributed systems (Storm/Hadoop/Zookeeper/Torque/Maui)
  • Knowledge of Python and/or Clojure is a plus

Contact: Lorenz Fischer

Stream-Reasoning with Facts and Events

The amount of data available on the Web (e.g., on the Semantic Web and in the Linked Open Data (LOD) cloud) is growing at an astounding speed. An increasing number of these data sources are dynamic (i.e., their content changes over time) or even represent continually updating phenomena (such as the stock exchange, sensor networks, social networks, or the continuous arrival of intelligence data).

Esper and its Event Processing Language (EPL) provide a highly scalable, memory-efficient, in-memory, SQL-like, low-latency engine for real-time stream processing of medium- to high-velocity and high-variety Big Data. Esper is open-source software available under the GNU General Public License (GPL).

Esper's EPL is very powerful, but in several respects also rather low-level. At DDIS we developed a higher-level language for temporal event and fact processing, based on the Semantic Web query language SPARQL, called TEF-SPARQL.

The task of this project (or thesis) is to implement a compiler from TEF-SPARQL to Esper EPL in Java, based on an existing SPARQL grammar for the Java parser generator ANTLR3.

Tasks:

  1. Extend the SPARQL grammar to TEF-SPARQL
  2. Build an internal algebra graph during parsing using ANTLR3 grammar actions
  3. Visualize the algebra graph using Graphviz
  4. Generate Esper EPL code from the algebra graph
    • pure (single-machine) Esper
    • Esper embedded into a streaming environment for distributed processing
  5. Performance evaluations of the different compilation methods

A simplified proof-of-concept implementation of tasks 1-3 exists and can be used as an illustrative basis.
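
To make the compilation target (task 4) concrete, here is a minimal sketch of the Esper side, using a made-up Measurement event type; the actual EPL emitted for TEF-SPARQL constructs is precisely what the project has to work out:

    import com.espertech.esper.client.Configuration;
    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;
    import com.espertech.esper.client.EPStatement;

    /** Sketch of the Esper side: deploying an EPL statement the compiler might emit. */
    public class EplDeploymentSketch {

        /** Hypothetical event type the generated EPL refers to. */
        public static class Measurement {
            private final String sensor;
            private final double value;
            public Measurement(String sensor, double value) {
                this.sensor = sensor;
                this.value = value;
            }
            public String getSensor() { return sensor; }
            public double getValue() { return value; }
        }

        public static void main(String[] args) {
            Configuration config = new Configuration();
            config.addEventType("Measurement", Measurement.class);
            EPServiceProvider esper = EPServiceProviderManager.getDefaultProvider(config);

            // A statement the compiler might emit for a windowed aggregate.
            String epl = "select sensor, avg(value) as avgValue "
                       + "from Measurement.win:time(30 sec) group by sensor";
            EPStatement stmt = esper.getEPAdministrator().createEPL(epl);
            stmt.addListener((newEvents, oldEvents) -> {
                if (newEvents != null) {
                    System.out.println(newEvents[0].get("sensor") + " -> "
                                     + newEvents[0].get("avgValue"));
                }
            });

            esper.getEPRuntime().sendEvent(new Measurement("s1", 12.5));
        }
    }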

References:

  • Gianpaolo Cugola and Alessandro Margara: Processing Flows of Information: From Data Stream to Complex Event Processing, ACM Computing Surveys, Vol. 44, Issue 3, June 2012.
  • Kietz et al.: TEF-SPARQL: The DDIS query language for time-annotated event and fact triple streams (internal report)
  • ANTLR Grammar v3
  • The SPARQL 1.1 Query Language
  • Esper Reference Manual
  • Christian Bockermann and Hendrik Blom: The streams Framework, Technical Report, TU Dortmund, 2012

Contact: Jörg-Uwe Kietz