# DBTG Student Projects

In the Database Technology Group, we offer students the possibility to work on any of the available types of theses (BSc thesis, MSc Basismodul, MSc project, MSc thesis). This page contains selected proposals based on the research interesets of each member of our group. The status of each project is indicated on the right and the workload of most can be adapted based on the kind of thesis you would like to conduct. In case you are not interested in those proposals but you would still like to work in the area of database technologies, feel free to contact one of the people in the group or Prof. M. Böhlen.

In the table below, we provide information on the steps needed for the completion and evaluation of each thesis. In addition, it is important to have in mind that the MSc projects are always performed in small groups and not individually.

Assignment |
Report |
Oral Exam |
Presentation (20 minutes) |

MSc Basismodul |
✔ | ✔ | ✘ |

BSc Thesis |
✔ | ✘ | ✔ |

MSc Project |
✔ | ✘ | ✘ |

MSc Thesis |
✔ | ✘ | ✔ |

**November 3, 2021** Taken

**Optimization of Lempel-Ziv Algorithm for Entropy Rate Estimation of Numerical Time series:**

The Lempel-Ziv algorithm is a well known universal lossless compression algorithm used for example in the GIF image format. The naive implementation of this algorithm has a time complexity of O(n^{2}). If the series is symbolic implementations based on hash or dictionary data structure reduce the computational complexity to O(n). But this is not straightforward to apply when the series is numeric. So, the objective of the project is to implement solutions which are better than the naive solution for numeric series. Moreover, the project includes an experimentation task where the students are required to develop time series prediction models to study correlation between prediction performance and estimated entropy rate.

Supervisor: Jamal Mohammed

### Matrix Operations with Gathering in MonetDB Taken

Detailed project description (PDF, 85 KB)

Supervisor: Oksana Dolmatova

### Integration of matrix operations from a third-party library into MEMONTaken

MEMON is an extension of MonetDB with matrix operations defined over relations and performed over internal MonetDB data structure. There are exist many highly-tuned efficient libraries to compute matrix operations such as LAPACK/BLAS/Eigen/Armadillo. The goal of this project is to implement and evaluate an efficient extension of MEMON where execution of matrix operations is delegated to a third-party library. Evaluation step consists of comparison of implemented extension with MEMON and with state-of-the-art statistical packages such as Pandas and R.

Supervisor: Oksana Dolmatova

### Efficient implementation of Join over transposed relations Taken

One of the main issues that prevents seamless integration of matrix operations into a DBMS is inability of all main DBMS to handle relations with large number of attributes, which can be results of certain matrix operations. For instance, transpose operation often used in matrix expressions (e.g., outer and cross products) applied to a relation with 1 Million of tuples returns a relation with 1 Million of attributes. The focus of this project is to tackle the issue on the example of performing join operation over relation that was previously transposed: r^{t} ⋈ s. In order to avoid the materialization of relation with huge number of attributes, physical transposition is avoided, and join algorithm is adjusted.

The goal of the project is to implement and evaluate adjusted join algorithm over pseudotransposed relations (i.e., relations that supposed to be transposed, but stored in a non-transposed manner) in MonetDB.

Supervisor: Oksana Dolmatova

### QR decomposition integration into column-store DBSM Taken

The column-store approach in DBMSs has become popular and a number of hybrid or pure column-store systems, such as MonetDB or Apache Cassandra, is available. Column-oriented systems have a lot of advantages compared to row-stores and offer a great flexibility for analyt- ical workloads. Among others, column-stores allow to implement linear operations better and more efficiently. For instance, the computation of the Singular Value Decomposition is based on matrix Q from QR decomposition computed with the Gram-Schmidt algorithm. However, the QR decomposition, as any linear operation, is defined over matrices but not relations. A relation should be adjusted in order to be suitable for QR decomposition. Current approach of the adjustment consists of a relation partitioning and ordering. The goal of this project is to integrate QR decomposition operation into MonetDB system using the suggested approach.

Supervisor: Oksana Dolmatova

### Integration of Generally Valid Predicates into PostgreSQL Taken

Ongoing time points, such as *now*, are common in relations that include a valid-time attribute. The end point of the valid time is equal to *now* if a tuple is valid until the present time. Since the present time continuously changes, the valid time of a tuple does so as well. As a consequence, evaluating queries on relations with ongoing time points leads to results that change when time passes by. A key challenge is the evaluation of the queries’ conditions to results that remain valid because the conditions might evaluate to true at some times and to false at others.

In this project, we focus on implementing generally valid logical connectives and predicates for time points and time intervals into the kernel of the widely-used open-source database system PostgreSQL. Then, the results of the predicates and connectives remain valid as time passes by and the connectives and predicates can be used to formulate arbitrarily complex conditions.

Supervisor: Yvonne Mülle

### Realizing The Concept of *NOW* In Now-Relative Databases Taken

Temporal databases allow keeping track of time-varying data. In valid-time databases, the information can be stored when each tuple is valid in the real-world. The time range can either be in the past, present, or future. The concept of *NOW* allows modeling that a tuple is currently valid, i.e., its end point is unknown. As *NOW* is a concept rather than a concrete value, the question arises how *NOW* can be integrated into a database management system and how queries can and should be answered in the presence of *NOW*.

Supervisor: Yvonne Mülle

### Development of a dynamic web application [Software Project] Taken

The Swiss Feed Database contains historical data about 100 feed types and 900 nutrients. A web application is provided allowing users to define a set of selection criteria (e.g., nutrient and feed types, temporal and geographical information about feed samples), retrieve nutrient content data, and overview results in various intuitive forms. This project aims at

developing Swiss Feed Database v2.0 following the Model-view-controller design pattern that separates data management (model), interface (view), and business logic (controller) of the application. The goals of this re-implementation are:

- to end up with a codebase that complies with the MVC design pattern principles.
- to design a star schema for the database that satisfies referential integrities and controls redundancy
- to optimize how much time is spent on server and client side, respectively, document the results, and implement a web application that asynchronously load data.
- to design a user interface that follows a responsive web design approach and extend the functionality (e.g., new selection criteria).

detailed project description (PDF, 115 KB)

Supervisor: Georgios Garmpis

### Integrity Constraints In Now-Relative Databases Taken

Temporal databases consist of one or two of the following time dimensions: valid time and transaction time. Valid time describes when a tuple is valid in the real-world; transaction time describes when a tuple is valid within the database. The concept of *NOW* represents that a tuple is currently valid, i.e., that its end point is unknown. As a consequence, the instantiation of *NOW* changes when time is evolving.

Integrity constraints like primary keys and foreign keys are fundamental concepts in a DBMS. Adding a temporal component to a database already changes the way how these constraints are ensured. However, these extended check mechanisms are not sufficient anymore in the presence of tuples containing *NOW*. The reason is that data changes not only through modification statements (insert, update, and delete), but also because time passes by. Thus, new mechanisms are necessary in order to be able to still ensure integrity constraints in now-relative databases.

Supervisor: Yvonne Mülle

### Implementing Conflict-Free Replicated Data Types (CRDTs) [BSc Vertiefungsarbeit] Taken

The massive increase in data in recent years put a lot of pressure on database vendors to make their systems highly distributed and scalable to very large amounts of data. Usually, distributed databases replicate data over many nodes in the system to improve access time and protect against accidental data loss. However, keeping the replicas consistent (i.e. having the same value on each replica all the time) across the entire system is a difficult problem. Conflict-Free Replicated Data Types (CRDTs) offer a novel approach to solve this problem. They guarantee that data on different replicas converge to a common and predictable state. In this project the student is asked to implement the Observed Remove Set CRDT.

Supervisor: Kevin Wellenzohn

### Implementing a disk-resident spatial index structure (Quadtree) Taken

This project gives you an in depth understanding of file management and index structures, in the context of spatially extended DBMS. Spatial DBMS are database systems that are optimized to store and query data that represents objects defined in a geometric space (e.g., a point in a Cartesian coordinate system, geometry of Zurich on a world map). For spatial DBMS specific kind of indexes (called spatial indexes) are used that can handle two (or more) dimensional data. The most commonly used spatial indexes are: Quadtree, R-Tree, Spatial hash index. The goal of this project is to implement a disk-resident spatial index, namely a Quadtree.

detailed project description (PDF, 47 KB)

Supervisor: Georgios Garmpis

### Interlinking the Swiss Feed Database with Linked Open Data Cloud Taken

Goal of this project is to get familiar with Semantic Web and Linked Data technologies and relevant tools, and use them in order to link the Swiss Feed Database with the Linked Open Data Cloud. For this purpose an OWL ontology should be constructed that describes properties of the Swiss Feed Database. Then, data of the Swiss Feed Database should be transformed into RDF (according to the OWL ontology) and be interlinked with a dataset of the Linked Open Data Cloud (e.g., GeoNames). Finally, the utility of these new RDF datasets will be demonstrated using SPARQL queries and their results.

detailed project description (PDF, 80 KB)

Supervisor: Georgios Garmpis