This page lists the MSc thesis topics currently available in our group. Don't hesitate to contact the respective person if you are interested in one of the topics. If you would like to write a thesis about your own idea, you can propose it to the person whose work is most closely related to what you plan to do, or you can contact Prof. Bernstein directly.
Massively collaborative data analysis
Crowdsourcing has raised interest in both the scientific and industrial communities as a collaboration model that lets the general public join forces to solve otherwise challenging problems posted as open calls. Whereas the effectiveness of crowdsourcing in solving tedious, aggregative tasks is widely acknowledged, how to crowdsource highly complex and ill-defined tasks such as data analysis is not yet well understood.
This thesis will investigate massively collaborative data analysis scenarios that involve diverse people with varying (and possibly limited) relevant knowledge. The goal of this thesis is to develop and evaluate a novel approach to supporting collaborative data analysis and to contribute to a better understanding of data analysis as a collaborative and distributed process accessible to a wide range of people with a diverse set of skills.
Your work will involve the analytical aspects of designing processes that allow non-experts to collaborate with data scientists on exciting questions by taking advantage of the available data. Part of your thesis will be to build a prototype and evaluate your ideas in a real-world setting by recruiting freelancers and data scientists. If you are interested in extending your knowledge in the domain of data analysis and are excited about the opportunities of democratizing data-driven research, you should consider applying for this thesis.
Contact: Michael Feldman
Collaborative Feature Engineering for Data Science projects
With the increasing availability of (big) data, industry's need for people who are capable of analysing data is on the rise. Indeed, the number of people with Data Scientist on their LinkedIn profile has doubled in the last four years. Predictive modelling is one of the regular tasks a Data Scientist will encounter: building a classifier on features of training data to predict a certain attribute on unseen data.
But how are features engineered on the training set? A simple approach could be to declare all attributes (columns) of the data as "features" and let the classifier do the rest. On Kaggle, a popular data science platform that hosts regular competitions for predictive modelling, one doesn't get very far with this simple approach. Winning teams often explain that a critical step to winning a competition is Feature Engineering: refining, merging and splitting existing features and adding new ones to the data set. Basically, the team winning the prize in a Kaggle competition (between $10'000 and $100'000) is the team that did the best job in Feature Engineering (and ensembling, which is out of our scope). Feature Engineering is also important in industry: the higher a classifier's accuracy, the more useful it is.
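To make the idea concrete, here is a minimal sketch of what refining, merging and splitting features can look like in plain Python. The column names (signup, total_spend, visits) and the threshold of 10 visits are purely hypothetical and only serve as an illustration; they are not taken from any particular competition or data set.

```python
from datetime import datetime


def engineer_features(row):
    """Return a copy of `row` with derived features (hypothetical columns)."""
    out = dict(row)
    # Split: break a raw timestamp into parts a classifier can use directly.
    ts = datetime.strptime(row["signup"], "%Y-%m-%d %H:%M")
    out["signup_hour"] = ts.hour
    out["signup_weekday"] = ts.weekday()  # Monday = 0
    # Merge: combine two raw columns into one ratio (guard against 0 visits).
    out["spend_per_visit"] = row["total_spend"] / max(row["visits"], 1)
    # Refine: turn a count into a coarse flag (threshold is an assumption).
    out["is_frequent"] = row["visits"] >= 10
    # Drop the raw string column, which most classifiers cannot use as-is.
    del out["signup"]
    return out


raw = [
    {"signup": "2015-03-02 09:30", "total_spend": 120.0, "visits": 12},
    {"signup": "2015-03-07 22:15", "total_spend": 15.0, "visits": 0},
]
engineered = [engineer_features(r) for r in raw]
```

The "simple approach" would feed the rows in raw to a classifier unchanged; the engineered rows instead expose the hour of day, the weekday and a spend ratio as separate columns. Whether such derived columns actually help depends on the data and the classifier.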
Crowdsourcing may be a way to build better features together as opposed to alone. Your Master thesis will be to build a web platform where data scientists can collaborate on feature engineering. We will then compare your platform to the current industry standard for Feature Engineering (IPython Notebook). If your platform yields better results than the industry standard, you will be responsible for a significant cost reduction in data science projects, combined with a lower entrance barrier for companies to analyse their data. This may pave the way for wider adoption of data science in Switzerland and the economic advantages that come with it. :)
In the course of your thesis, you will learn about Feature Engineering, build a platform using the technology of your choice and play a lot with data. Interested? Drop me a message.
Contact: Patrick de Boer
Information Extraction from On-line Fora
Alzheimer’s is a fatal disease of the brain, the sixth-leading cause of death in the United States and the only cause of death among the top 10 that cannot be prevented, cured or even slowed. While Alzheimer’s has attracted significant investments from major pharmaceutical firms, it has also been a meager disease area for innovation, with most promising new drug candidates failing phase III of clinical trials.
Given the state of pharmaceutical treatment and the urgency of the disease, doctors and caregivers have started exploring and adopting alternative, non-pharmaceutical treatments to deal with the symptoms of the disease. Patients and their caregivers extensively exchange knowledge about such treatments in online forums (e.g. www.alzconnected.org, http://forum.alzheimers.org.uk/).
Contact: Abraham Bernstein