Department of Informatics – DDIS



This page lists the MSc thesis topics currently available in our group. Don't hesitate to contact the respective person if you are interested in one of the topics. If you would like to write a thesis about your own idea, you can propose it to the person whose work is most closely related to what you plan to do, or you can contact Prof. Bernstein directly.

Text mining on large datasets (Data Analytics)

How much do you (dis)like statistics? Luckily, people gather on platforms such as Stack Overflow to help each other with data science questions and to analyse data correctly. But wouldn't it be more convenient to have an automated system that assists with statistical analyses?

The goal of this thesis would be to combine techniques from NLP, crowd computing, text mining and, obviously, statistics to devise such a system by exploiting Stack Overflow discussions.

Note that this project would be far easier if such a system already existed. So there's no time to lose :)

Posted: 12.09.2014

Contact: Patrick de Boer

Crowdsourced feature selection

Questions like the following can often be answered by designing efficient decision models based on classification. Who are the potential customers we want to approach with a given solution? How can fraud and/or abuse be detected in a given system? To what degree can patients be successfully diagnosed and treated given observable symptoms? Such models are developed and studied in machine learning research and have led to numerous tangible results so far. However, not every domain of human knowledge benefits from a clear and clean information representation. Consequently, designing such models is a challenging task.

In this project, you will explore decision scenarios in which classification is based on non-obvious, latent knowledge contributed by multiple human agents within a crowdsourcing framework. We seek creative ways to elicit and aggregate this hidden human knowledge so that it can be combined with existing machine learning (classification) methods.

Posted: 27.06.2014

Contact: Michael Feldman

Information Extraction from On-line Fora

Alzheimer's disease is a fatal disease of the brain, the sixth-leading cause of death in the United States, and the only cause of death among the top 10 that cannot be prevented, cured or even slowed [1]. Although Alzheimer's has attracted significant investment from major pharmaceutical firms, it has been a barren area for innovation, with most promising new drug candidates failing in phase III clinical trials [2].

Given the state of pharmaceutical treatment and the urgency of the disease, doctors and caregivers have started exploring and adopting alternative, non-pharmaceutical treatments to manage its symptoms. Patients and their caregivers extensively exchange knowledge about such treatments in online forums (e.g.,

Contact: Abraham Bernstein