DBTG Student Projects

In the Database Technology Group, we offer students the opportunity to work on any of the available types of theses (Vertiefung, Facharbeit, BSc thesis, MSc Basismodul, MSc project, MSc thesis). This page contains selected proposals based on the research interests of the members of our group. The status of each project is indicated on the right, and the workload of most projects can be adapted to the kind of thesis you would like to conduct. If none of these proposals interests you but you would still like to work in the area of database technology, feel free to contact one of the people in the group or Prof. M. Böhlen.

The table below lists the steps required for the completion and evaluation of each type of thesis. Please keep in mind that MSc projects are always carried out in small groups, not individually.

Thesis type        Assignment    Report    Oral Exam    Presentation
Vertiefung
Facharbeit
MSc Basismodul
BSc Thesis
MSc Project
MSc Thesis

Development of a dynamic web application [Software Project]            Open

The Swiss Feed Database contains historical data about 100 feed types and 900 nutrients. A web application allows users to define a set of selection criteria (e.g., nutrient and feed types, temporal and geographical information about feed samples), retrieve nutrient content data, and view the results in various intuitive forms. This project aims at developing Swiss Feed Database v2.0 following the Model-View-Controller (MVC) design pattern, which separates the data management (model), the interface (view), and the business logic (controller) of the application. The goals of this re-implementation are:

  • to produce a codebase that complies with the principles of the MVC design pattern,
  • to design a star schema for the database that satisfies referential integrity and controls redundancy (a possible shape is sketched after this list),
  • to analyze and optimize the time spent on the server and the client side, document the results, and implement a web application that loads data asynchronously,
  • to design a user interface that follows a responsive web design approach and extends the functionality (e.g., new selection criteria).
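
A possible shape for such a star schema is sketched below using Python's sqlite3 module. All table and column names are hypothetical and merely illustrate the idea of one fact table surrounded by dimension tables; the actual Swiss Feed Database schema will differ.

    import sqlite3

    # Minimal star-schema sketch with hypothetical table and column names.
    # One fact table (nutrient measurements) references four dimension tables;
    # the foreign keys enforce referential integrity, the dimensions control redundancy.
    ddl = """
    CREATE TABLE dim_feed     (feed_id     INTEGER PRIMARY KEY, feed_name     TEXT);
    CREATE TABLE dim_nutrient (nutrient_id INTEGER PRIMARY KEY, nutrient_name TEXT, unit TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, canton TEXT, altitude REAL);
    CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_measurement (
        feed_id     INTEGER REFERENCES dim_feed(feed_id),
        nutrient_id INTEGER REFERENCES dim_nutrient(nutrient_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        time_id     INTEGER REFERENCES dim_time(time_id),
        value       REAL
    );
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    print([row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")])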

detailed project description (PDF, 115 KB)

Supervisor: Georgios Garmpis

Top-K Ranking in Temporal Probabilistic Databases             Open

TemProRA is a system that focuses on the analysis of the top-k results of a temporal probabilistic query, i.e., on the k result tuples with either the lowest or the highest probabilities. It includes the Temporal Probabilistic Lineage Tree (TPLT), the Temporal Probabilistic Bubble Chart (TPBC), and the Temporal Probabilistic Column Chart (TPCC): three tools, created for each output tuple, that provide the user with the most important information for systematically modifying the corresponding time-varying probability. This project aims at improving the efficiency of these three tools by speeding up the probability computation and the identification of the top-k results.
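
As a toy illustration of the top-k selection step only (this is not the TemProRA implementation, and the tuple format is made up), the k tuples with the highest or lowest probability can be picked with a heap:

    import heapq

    # Hypothetical result tuples: (fact, valid-time interval, probability).
    results = [
        ("Ann buys salad", (10, 15), 0.25),
        ("Bob buys bread", (12, 14), 0.80),
        ("Ann buys milk",  ( 9, 11), 0.40),
    ]

    def top_k(tuples, k, highest=True):
        """Return the k tuples with the highest (or lowest) probability."""
        key = lambda t: t[2]
        return heapq.nlargest(k, tuples, key=key) if highest else heapq.nsmallest(k, tuples, key=key)

    print(top_k(results, 2))         # the two most probable result tuples
    print(top_k(results, 2, False))  # the two least probable result tuples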

detailed project description

Supervisor: Katerina Papaioannou

Including Conditions over Non-Temporal Attributes in Disjoint Interval Partitioning Open

This project is the continuation of the work entitled "Disjoint Interval Partitioning" (DIP), published in the VLDB Journal. You will enrich the semantics of the temporal operations computed over the DIP partitions by building a correct and efficient method that additionally evaluates conditions over non-temporal attributes.
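
As a toy illustration of what such a combined condition means (a naive nested-loop evaluation over made-up tuples, not the DIP algorithm itself), a temporal join then has to check both interval overlap and, for example, an equality condition on a non-temporal attribute:

    # Hypothetical tuples: (employee, department, start, end) with half-open valid time [start, end).
    r = [("Ann", "IT", 1, 5), ("Bob", "HR", 3, 9)]
    s = [("Ann", "IT", 4, 8), ("Bob", "IT", 2, 6)]

    def overlap(a_start, a_end, b_start, b_end):
        return a_start < b_end and b_start < a_end

    # Temporal join with an additional non-temporal condition (equal department).
    joined = [
        (x, y)
        for x in r
        for y in s
        if overlap(x[2], x[3], y[2], y[3]) and x[1] == y[1]   # temporal AND non-temporal condition
    ]
    print(joined)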

Send an email to Dr. Francesco Cafagna for further info.

Supervisor: Dr. Francesco Cafagna

Scalable Parsimonious Temporal Aggregation             Open

The vast majority of the temporal data produced nowadays has a time-point timestamp, e.g., "Ann shops salad on Thursday, August 4th", and in many cases a probability annotation is also added, e.g., "Ann will shop salad on Monday, October 10th with 0.2 probability". The volume of time-point timestamped data is reduced by aggregating the data into interval-timestamped tuples, e.g., "Ann will shop salad from October 10th to October 15th with 0.25 probability". In this project we focus on implementing a scalable approach that aggregates time-point timestamped tuples into interval-timestamped ones based on the fact they describe (e.g., "Ann shops salad") and the corresponding probability.
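
A minimal sketch of the underlying idea (the project's actual, scalable algorithm will differ, and all tuple values are made up): group the tuples by fact, sort them by time point, merge runs of nearby time points into an interval, and annotate the interval with the average probability of the merged points:

    from itertools import groupby

    # Hypothetical time-point timestamped tuples: (fact, time point, probability).
    points = [
        ("Ann shops salad", 10, 0.2),
        ("Ann shops salad", 11, 0.3),
        ("Ann shops salad", 15, 0.25),
        ("Bob shops bread", 10, 0.9),
    ]

    def aggregate(tuples, gap=1):
        """Merge consecutive time points (distance <= gap) of the same fact into intervals."""
        out = []
        for fact, group in groupby(sorted(tuples), key=lambda t: t[0]):
            run = []
            for _, t, p in group:
                if run and t - run[-1][0] > gap:   # gap too large: close the current interval
                    out.append((fact, run[0][0], run[-1][0], sum(p for _, p in run) / len(run)))
                    run = []
                run.append((t, p))
            out.append((fact, run[0][0], run[-1][0], sum(p for _, p in run) / len(run)))
        return out

    print(aggregate(points))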

detailed project description

Supervisor: Katerina Papaioannou

Integration of Generally Valid Predicates into PostgreSQL Open

Ongoing time points, such as now, are common in relations that include a valid-time attribute. The end point of the valid time is equal to now if a tuple is valid until the present time. Since the present time continuously changes, the valid time of such a tuple does so as well. As a consequence, evaluating queries on relations with ongoing time points leads to results that change as time passes. A key challenge is to evaluate query conditions to results that remain valid, because a condition might evaluate to true at some present times and to false at others.

In this project, we focus on implementing generally valid logical connectives and predicates for time points and time intervals into the kernel of the widely-used open-source database system PostgreSQL. Then, the results of the predicates and connectives remain valid as time passes by and the connectives and predicates can be used to formulate arbitrarily complex conditions.
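
As a toy Python illustration of the underlying idea (not the PostgreSQL kernel implementation; the representation and the finite time horizon are simplifications made for this sketch), a generally valid comparison involving the ongoing time point now records at which present times it holds, so its result stays valid as time passes:

    NOW = "now"   # marker for the ongoing time point

    def instantiate(p, present):
        """Instantiate an ongoing time point at a given present time."""
        return present if p == NOW else p

    def generally_valid_leq(a, b, horizon=range(2015, 2021)):
        """Evaluate a <= b at every present time in the horizon.

        Returns the set of present times at which the predicate holds,
        instead of a single true/false that would expire as time passes."""
        return {t for t in horizon if instantiate(a, t) <= instantiate(b, t)}

    # 2017 <= now holds at all present times from 2017 on.
    print(generally_valid_leq(2017, NOW))   # {2017, 2018, 2019, 2020}
    # now <= 2016 holds only up to present time 2016.
    print(generally_valid_leq(NOW, 2016))   # {2015, 2016}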

Supervisor: Yvonne Mülle

Cost Analysis for Generally Valid Conditions Open

Ongoing time points, such as now, change as time passes by and thus instantiate to different time points depending on the present time. For instance, the ongoing time point now instantiates to a time point equal to the present time: it instantiates to time point 2016 at present time 2016, to time point 2017 at present time 2017, and so forth. To evaluate queries to results that remain valid as time advances, we proposed general validity, which evaluates a query at all possible present times and thus independently of a specific present time. A condition used in the operators of a query is composed of predicates that are logically connected with conjunction, disjunction, or negation. Compared to a condition that is evaluated only at a single present time, generally valid conditions are more expensive to evaluate.

This project aims at analytically determining the evaluation cost of generally valid predicates and logical connectives. For complex conditions composed of such predicates and connectives, these cost functions make it possible to estimate when the cost of repeatedly re-evaluating a query exceeds the cost of evaluating it as a generally valid query, i.e., when the evaluation overhead of general validity turns into a performance gain.
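
Stated as a simple break-even condition (with hypothetical symbols, not the project's actual cost model): if C_cond denotes the cost of evaluating a condition at one specific present time, C_gv the cost of its generally valid evaluation, and n the number of present times at which the result has to be refreshed, then general validity pays off as soon as

    C_{\mathrm{gv}} < n \cdot C_{\mathrm{cond}}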

Supervisor: Yvonne Mülle

Index Structures for Ongoing Time Points Open

Ongoing time points, such as now, instantiate to different time points as time passes by and are therefore dependent on the present time. They are common in relations that include a valid-time attribute, to denote that a tuple is valid until the present time. Indexes are used in database systems to accelerate access to the tuples of a relation without having to scan the entire relation. Building an index on ongoing time intervals (which can contain ongoing time points as start or end point) is a challenging task because the data to be indexed changes as time passes.

In this project, the student reviews existing approaches for indexing attributes with ongoing time points and assesses the approaches’ performance and practical feasibility.

Supervisor: Yvonne Mülle

Realizing The Concept of NOW In Now-Relative Databases Taken

Temporal databases allow keeping track of time-varying data. In valid-time databases, it is stored when each tuple is valid in the real world; this time range can lie in the past, the present, or the future. The concept of NOW allows modeling that a tuple is currently valid, i.e., that its end point is unknown. As NOW is a concept rather than a concrete value, the question arises how NOW can be integrated into a database management system and how queries can and should be answered in the presence of NOW.

Supervisor: Yvonne Mülle

Integrity Constraints In Now-Relative Databases Taken

Temporal databases support one or both of the following time dimensions: valid time and transaction time. Valid time describes when a tuple is valid in the real world; transaction time describes when a tuple is valid within the database. The concept of NOW represents that a tuple is currently valid, i.e., that its end point is unknown. As a consequence, the instantiation of NOW changes as time evolves.
Integrity constraints such as primary keys and foreign keys are fundamental concepts in a DBMS. Adding a temporal component to a database already changes how these constraints are enforced. However, these extended check mechanisms are no longer sufficient in the presence of tuples containing NOW, because the data changes not only through modification statements (insert, update, and delete) but also simply because time passes. Thus, new mechanisms are necessary to still be able to enforce integrity constraints in now-relative databases.
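
A small made-up example of why the mere passage of time can violate a constraint: assume a temporal primary key that forbids two tuples with the same key from having overlapping valid time, and let the end point NOW instantiate to the present time. The toy check below (not the project's actual mechanism) reports no violation at present time 2017, but a violation at present time 2019:

    NOW = "now"

    def end_at(end, present):
        return present if end == NOW else end

    def pk_violation(tuples, present):
        """Check whether two tuples with the same key overlap at the given present time."""
        for i, (k1, s1, e1) in enumerate(tuples):
            for k2, s2, e2 in tuples[i + 1:]:
                if k1 == k2 and s1 < end_at(e2, present) and s2 < end_at(e1, present):
                    return True
        return False

    # 'Ann' is valid from 2010 until NOW, and again from 2018 to 2020.
    rel = [("Ann", 2010, NOW), ("Ann", 2018, 2020)]
    print(pk_violation(rel, 2017))   # False: [2010, 2017) and [2018, 2020) do not overlap
    print(pk_violation(rel, 2019))   # True: [2010, 2019) now overlaps [2018, 2020)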

Supervisor: Yvonne Mülle

QR decomposition integration into column-store DBMS Open

The column-store approach in DBMSs has become popular, and a number of hybrid or pure column-store systems, such as MonetDB or Apache Cassandra, are available. Column-oriented systems have many advantages compared to row-stores and offer great flexibility for analytical workloads. Among others, column-stores allow linear operations to be implemented more efficiently. For instance, the computation of the Singular Value Decomposition is based on the matrix Q from the QR decomposition, which can be computed with the Gram-Schmidt algorithm. However, the QR decomposition, like any linear operation, is defined over matrices, not relations. A relation therefore has to be adjusted before it is suitable for QR decomposition. The current approach to this adjustment consists of partitioning and ordering the relation. The goal of this project is to integrate the QR decomposition operation into the MonetDB system using the suggested approach.
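
For reference, a minimal NumPy sketch of the (modified) Gram-Schmidt algorithm mentioned above; this is the textbook algorithm on a plain matrix and does not address the relational adjustment or the MonetDB integration that the project is about:

    import numpy as np

    def gram_schmidt_qr(A):
        """Modified Gram-Schmidt QR decomposition: A = Q @ R with orthonormal columns in Q."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        V = A.copy()
        for j in range(n):
            R[j, j] = np.linalg.norm(V[:, j])
            Q[:, j] = V[:, j] / R[j, j]
            for k in range(j + 1, n):
                R[j, k] = Q[:, j] @ V[:, k]
                V[:, k] -= R[j, k] * Q[:, j]
        return Q, R

    A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
    Q, R = gram_schmidt_qr(A)
    print(np.allclose(Q @ R, A))   # True: the decomposition reconstructs A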

Supervisor: Oksana Dolmatova

QR decomposition integration into DBMS Taken

Modern databases provide only limited statistical functionality, which is mostly based on array/matrix operations. Currently, only certain database systems, such as SciQL and SciDB, are able to work with arrays and matrices and provide access to analytical functions inside the DBMS. With other DBMSs, one has to copy all data to R and perform the analysis there. In this project we consider one of the issues databases face when working with matrices: fast access to rows and columns at the same time (a toy illustration of this trade-off is sketched after the lists below). The question is how this problem can be solved, using the integration of the QR decomposition as an example, and how this algorithm can be integrated into PostgreSQL efficiently.
The project includes a comparison of the following approaches:

  • adapt the algorithm to the fast tuple (row) access that is implemented in Postgres.
  • extend the current table structure or use indexes (and find a trade-off between memory and time consumption) to obtain fast access to columns.

All extensions should be done in PostgreSQL, mostly in C:

  • parser: extend the parser with the new syntax.
  • optimiser: extend the optimiser with the new operation.
  • QR decomposition functionality: modification of the Gram-Schmidt algorithm.
  • index structure definition.
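
A toy Python illustration of the row/column access trade-off referred to above (entirely made up, not PostgreSQL internals): in a row-oriented layout a whole tuple is cheap to fetch but a column has to be gathered from every row, while a column-oriented layout inverts this:

    # Hypothetical 3-attribute relation with 5 tuples.
    rows = [(i, 2 * i, 3 * i) for i in range(5)]     # row store: list of tuples

    # Row store: fetching one tuple is direct, fetching one column touches every tuple.
    tuple_2  = rows[2]
    column_1 = [r[1] for r in rows]

    # Column store: the same relation stored attribute-wise.
    cols = {"a": [r[0] for r in rows], "b": [r[1] for r in rows], "c": [r[2] for r in rows]}

    # Column store: fetching one column is direct, fetching one tuple touches every column.
    column_b = cols["b"]
    tuple_2c = (cols["a"][2], cols["b"][2], cols["c"][2])

    print(tuple_2, column_1, column_b, tuple_2c)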

Supervisor: Oksana Dolmatova

Implementing a disk-resident spatial index structure (Quadtree) Taken

This project gives you an in-depth understanding of file management and index structures in the context of spatially extended DBMSs. Spatial DBMSs are database systems that are optimized to store and query data representing objects defined in a geometric space (e.g., a point in a Cartesian coordinate system, the geometry of Zurich on a world map). Spatial DBMSs use specific kinds of indexes (called spatial indexes) that can handle two- (or more-) dimensional data. The most commonly used spatial indexes are the Quadtree, the R-tree, and spatial hash indexes. The goal of this project is to implement a disk-resident spatial index, namely a Quadtree.
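
For orientation, a minimal in-memory quadtree sketch for points (all names and the bucket capacity are made up); the actual project work is making such a structure disk-resident, i.e., mapping nodes to pages on disk, which this sketch does not address:

    class QuadNode:
        """In-memory point quadtree node over the square [x0, x1) x [y0, y1)."""
        CAPACITY = 4

        def __init__(self, x0, y0, x1, y1):
            self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
            self.points = []
            self.children = None          # four children after a split

        def insert(self, x, y):
            if self.children is None:
                self.points.append((x, y))
                if len(self.points) > self.CAPACITY:
                    self._split()
                return
            self._child_for(x, y).insert(x, y)

        def _split(self):
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [QuadNode(self.x0, self.y0, mx, my), QuadNode(mx, self.y0, self.x1, my),
                             QuadNode(self.x0, my, mx, self.y1), QuadNode(mx, my, self.x1, self.y1)]
            pts, self.points = self.points, []
            for px, py in pts:
                self._child_for(px, py).insert(px, py)

        def _child_for(self, x, y):
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    root = QuadNode(0, 0, 100, 100)
    for p in [(10, 10), (80, 20), (40, 70), (60, 60), (5, 95), (55, 45)]:
        root.insert(*p)
    print(len(root.children or []))   # 4 after the first split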

detailed project description (PDF, 47 KB)

Supervisor: Georgios Garmpis

Interlinking the Swiss Feed Database with Linked Open Data Cloud Taken

The goal of this project is to become familiar with Semantic Web and Linked Data technologies and relevant tools, and to use them to link the Swiss Feed Database with the Linked Open Data Cloud. For this purpose, an OWL ontology should be constructed that describes the properties of the Swiss Feed Database. Then, the data of the Swiss Feed Database should be transformed into RDF (according to the OWL ontology) and interlinked with a dataset of the Linked Open Data Cloud (e.g., GeoNames). Finally, the utility of these new RDF datasets will be demonstrated using SPARQL queries and their results.
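
As a small sketch of what such RDF data and interlinking could look like (using the rdflib Python library as one possible tool; all namespaces, class names, and property names are hypothetical and not the project's actual ontology):

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import OWL, RDF

    FEED = Namespace("http://example.org/swissfeed/")   # hypothetical ontology namespace
    g = Graph()
    g.bind("feed", FEED)

    # A feed sample described with the (hypothetical) ontology ...
    sample = FEED["sample/42"]
    g.add((sample, RDF.type, FEED.FeedSample))
    g.add((sample, FEED.nutrient, Literal("crude protein")))
    g.add((sample, FEED.contentValue, Literal(212.5)))

    # ... and its sampling location interlinked with the Linked Open Data Cloud.
    location = FEED["location/zurich"]
    g.add((sample, FEED.sampledAt, location))
    g.add((location, OWL.sameAs, URIRef("http://sws.geonames.org/2657896/")))  # GeoNames resource for Zurich

    print(g.serialize(format="turtle"))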

detailed project description (PDF, 80 KB)

Supervisor: Georgios Garmpis

Implementing Conflict-Free Replicated Data Types (CRDTs) [BSc Vertiefungsarbeit] Taken

The massive increase in data in recent years has put a lot of pressure on database vendors to make their systems highly distributed and scalable to very large amounts of data. Usually, distributed databases replicate data over many nodes in the system to improve access time and to protect against accidental data loss. However, keeping the replicas consistent (i.e., having the same value on each replica all the time) across the entire system is a difficult problem. Conflict-Free Replicated Data Types (CRDTs) offer a novel approach to solving this problem: they guarantee that data on different replicas converges to a common and predictable state. In this project, the student is asked to implement the Observed-Remove Set (OR-Set) CRDT.
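
A minimal single-process sketch of the Observed-Remove Set idea (made up for illustration; the project's implementation will have to deal with actual replication and messaging): every add attaches a unique tag, a remove deletes only the tagged entries it has observed, and merging takes the union of both sets, so a concurrent add and remove of the same element converge with the add winning:

    import uuid

    class ORSet:
        """Toy Observed-Remove Set CRDT: state is (added entries, removed entries)."""

        def __init__(self):
            self.added = set()     # (element, unique tag) pairs created by add
            self.removed = set()   # (element, tag) pairs that were observed and removed

        def add(self, element):
            self.added.add((element, uuid.uuid4()))

        def remove(self, element):
            # Remove only the tags for this element that this replica has observed.
            self.removed |= {(e, t) for (e, t) in self.added if e == element}

        def merge(self, other):
            self.added |= other.added
            self.removed |= other.removed

        def value(self):
            return {e for (e, t) in self.added if (e, t) not in self.removed}

    # Two replicas: one removes "x" while the other concurrently re-adds it.
    a, b = ORSet(), ORSet()
    a.add("x"); b.merge(a)
    a.remove("x"); b.add("x")
    a.merge(b); b.merge(a)
    print(a.value() == b.value() == {"x"})   # True: replicas converge, the concurrent add wins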

Supervisor: Kevin Wellenzohn