Department of Informatics – DDIS



This page lists the MSc thesis topics currently available in our group. Don't hesitate to contact the respective person if you are interested in one of the topics. If you would like to write a thesis about your own idea, you can propose it to the person whose work is most closely related to what you plan to do, or you can contact Prof. Bernstein directly.

Validate the use of statistics in research papers

Have you ever wondered if the medications you (or people you know) take to battle the flu actually work? Who tells you that they work? The packaging of your medicine? Do you generally believe what companies write on their products' packaging?
In fact, many of the chemical compounds combined in drugs (“active ingredients”) are discovered in research labs and published in medical papers. Just like other scientists, people publishing their findings need to evaluate them - in a medical setting, this is often done in an experiment with a treatment group and a control group. The effects observed in the treatment group are then compared with those in the control group using some statistical method. If you have ever dealt with statistics, you probably know that there is ample ambiguity and it’s often unclear which method to use for a given goal - an obstacle encountered not just by computer scientists, but also by medical doctors writing papers about active ingredients.

Let us know if you’re interested in supporting researchers (not just in medicine) with their data analysis. As a first step, we analyse existing research in the medical field for its statistical validity. We have derived a crowd process that allows our system to “understand” medical papers and are now aiming to algorithmically assess their statistical validity. That’s where you come in: We are looking for someone with a thorough understanding of web development (preferably in Scala + Play, but this can quickly be learnt) to extend our existing framework, which interacts with crowd workers to determine the statistical validity of research papers.
Your thesis topic will be chosen and framed according to your interests and the flexibility of this project. Besides some predefined software development work, you have the freedom to create additional metrics using NLP and various Data Science methods - all depending on what you’re interested in.

We are looking for a highly skilled coder, who is curious and self-motivated.

Posted: 22.02.2016

Contact: Patrick de Boer

Make Astrophysics data accessible through the Semantic Web

Are you just as fascinated by the night sky as we are? Astrophysics tries to explain the many phenomena occurring above us (actually, all around us) and uses various techniques to do so, among them terrestrial telescopes and satellites that obtain spectra from different regions of the sky. Unfortunately (or fortunately for us), astrophysicists in most cases don’t receive formal training in coding, and as a result each telescope’s data release can only be accessed through different APIs, huge data files and/or incompatible data formats.

The Semantic Web aims to solve just that problem by interlinking and publishing different datasets. And you could be a part of it by enabling astrophysicists to easily work with different data sources, thereby lowering the entry barrier to great discoveries. We are looking for someone with excellent coding skills, experience in talking to web-based APIs (mostly through code) and the curiosity to learn more about space exploration. You should be self-motivated and willing to work with stakeholders from Astrophysics as well as Computer Science in building a platform that allows easy data extraction. We already have (Scala) code connecting to various APIs that you may, or may not, choose to use for your task.

We are looking for a highly skilled coder, who is curious and self-motivated.

Posted: 24.02.2016

Contact: Patrick de Boer and Shen Gao

Massively collaborative data analysis

Crowdsourcing has raised interest in both the scientific and industrial communities as a collaboration model that enables the general public to join forces to solve otherwise challenging problems, which are posted as open calls. Whereas the effectiveness of crowdsourcing in solving tedious, aggregative tasks is widely acknowledged, how to crowdsource highly complex and ill-defined tasks such as data analysis is not yet well understood.

This thesis will investigate massively collaborative data analysis scenarios that involve diverse people with varying (and possibly limited) relevant knowledge. The goal of this thesis is to develop and evaluate a novel approach to supporting collaborative data analysis, and contribute to a better understanding of data analysis as a collaborative and distributed process accessible to a wide range of people with a diverse set of skills.

Your work will involve the analytical aspect of designing processes that allow non-experts to collaborate with data scientists to answer exciting questions by taking advantage of the available data. Part of your thesis will be to build a prototype and evaluate your ideas in a real-world setting by recruiting freelancers and data scientists. If you are interested in extending your knowledge in the domain of data analysis and are excited about the opportunities of democratizing data-driven research, you should consider applying for this thesis.

Posted: 24.10.2015

Contact: Michael Feldman

Information Extraction from On-line Fora

Alzheimer’s is a fatal disease of the brain, the sixth-leading cause of death in the United States and the only cause of death among the top 10 that cannot be prevented, cured or even slowed [1]. While Alzheimer’s has attracted significant investments from major pharmaceutical firms, it has also yielded little innovation, with most promising new drug candidates failing in phase III clinical trials [2].

Given the state of pharmaceutical treatment and the urgency of the disease, doctors and caregivers have started exploring and adopting alternative non-pharmaceutical treatments to deal with its symptoms. Patients and their caregivers extensively exchange knowledge about such treatments in online forums (e.g.,

Contact: Abraham Bernstein