Department of Informatics s.e.a.l

Research Project

The research project is the primary artifact of the course; the outcome of all projects will be a research paper (5 to 10 pages). Depending on the class size, the projects may be completed in groups of up to two. The intent of the project is to identify one or more research questions, investigate them and report the results in a scientific manner.

Project Deliverables

Project Proposal Draft. A one-page proposal for your project, submitted to me. The course assistant and I will give you feedback in a one-on-one meeting. The proposal has to include the problem/motivation for the project, the research question(s), and how you aim to address the research question(s). This draft matters because it lets us discuss your project ideas, helps you express them, and helps us understand what you are trying to do. The content and ideas are much more important than the formatting.

Final Project Proposal. A one- to two-page proposal for your project. This will be a revised version of your original draft that incorporates the comments from the meeting.

Proposal Presentation. A short (5 to 10 mins) presentation of the proposed project to the class. The goal is to share and discuss your ideas with multiple people.

Intermediate Drafts of the Report. So that we can give feedback on your results and write-up early on, there will be intermediate steps for writing the final project report.

Written Report. The report includes your research project, the motivation, the research question(s), and the related work. The required length of the written report varies from project to project; all reports must be submitted as a PDF in the ACM paper format (two-column style).

Project Presentation. A presentation of the completed research project to the class.

Project Objective - An Empirical Analysis of Developers

  • Identify a real problem developers face and/or investigate a specific aspect of software development within the context of the provided data sets.
  • Read related work and determine what has already been done and how your project is different (could also be that you are replicating previous work).
  • Identify a relevant and interesting research question in scope of the provided data sets.
  • Determine how to address the research question given the available data set, i.e. how to analyze the data.
  • Run the analysis.
  • Write up the results in a scientific way, including the motivation, related work, analysis, results and more.

Possible Project Topics

Below is a list of possible project topics, each consisting of a data set and some questions. This is a good starting point for a project and you can choose one of the provided data sets. You can also come up with your own idea and we are happy to guide you.

In addition, there is the opportunity to develop tool support for developers, e.g. something inside or independent of the IDE that is based on developers' work patterns or biometric data and that can support developers in their work. We have sensors and monitoring tools that you can borrow for this; the monitoring tools can also be extended.

Biometric Sensor Projects

Two types of biometric sensors (see below) are available to use in a hands-on project. You would develop a small tool using data of one of the sensors, e.g. a visualization of an interesting aspect of the sensor data or an approach to analyze an aspect of development using the biometric data. You will then evaluate your tool in a small user study.

Muse EEG Sensor

Muse EEG sensor (Copyright by Interaxon)

The Muse EEG sensor captures the electrical activity of the brain, which can indicate certain mental states such as high cognitive load or relaxation. More information can be found here.

Polar H7 Chest Band

Polar H7 Heart Rate Sensor (Copyright by Polar)

The Polar H7 is a chest band that measures heart-related data, such as the heart rate or heart rate variability. This information could, for instance, be used to detect the stress level of a software developer. More information can be found here.
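As a starting point for working with heart rate variability, the following minimal sketch computes RMSSD (the root mean square of successive differences), one common time-domain variability measure, together with the mean heart rate. It assumes the sensor data is available as a list of beat-to-beat (RR) intervals in milliseconds; that input format and the function names are assumptions made for illustration, not a description of the tools provided in the course.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_heart_rate_bpm(rr_intervals_ms):
    """Mean heart rate in beats per minute, derived from RR intervals (ms)."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    return 60000.0 / mean_rr


# Hypothetical sample: four beat-to-beat intervals in milliseconds.
sample_rr = [800, 810, 790, 800]
print(rmssd(sample_rr))
print(mean_heart_rate_bpm(sample_rr))
```

Whether a given RMSSD value indicates stress depends on the person and the situation, so a project would likely compare values against each participant's own baseline rather than use a fixed threshold.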

Developer Activity, Productivity and Emotion Data

This data set was captured from up to 20 developers over a one- to two-week period. We installed a small monitor on their machines, tracked their computer interactions, and at the same time frequently asked them to assess various aspects, such as their productivity or emotions over the past hour, and to describe the tasks they worked on. The data logged by the installed monitor contains an event for each switch between applications as well as information on the amount of keyboard and mouse usage. There are many questions one could look at with this data; if you want more information on the data set, please just talk to us. A few of the possible research questions are:

  • What happens at times when developers switch applications a lot (or not at all)?
  • Are developers more productive when they are happy, and are there patterns of productivity/happiness?
  • Is it possible to use keyboard/mouse/activity data to determine flow/progress?
  • When are developers most productive and why?
  • Do developers follow certain kinds of activity patterns (e.g. usually look at emails after a meeting, always use the browser while coding, always look at emails while planning, etc.) or how do they structure their workday?
  • Does single- or multi-tasking have an influence on the perceived productivity? (the tricky part here will be to come up with a good definition of single-/multi-tasking based on the collected data)
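A first step for several of the questions above is to turn the raw event log into a per-hour measure that can be compared with the hourly self-reports. The sketch below counts application switches per clock hour. The event format (ISO timestamp plus application name) and all names in the code are assumptions for illustration; the actual data set may be structured differently.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(events):
    """Count application switches per clock hour.

    `events` is a chronologically sorted list of (ISO timestamp, application)
    pairs. A switch is counted whenever the application differs from the one
    in the previous event, and is attributed to the hour of the new event.
    """
    counts = Counter()
    previous_app = None
    for timestamp, app in events:
        if previous_app is not None and app != previous_app:
            hour = datetime.fromisoformat(timestamp).replace(
                minute=0, second=0, microsecond=0
            )
            counts[hour] += 1
        previous_app = app
    return dict(counts)


# Hypothetical sample log, not taken from the real data set.
sample_events = [
    ("2015-03-02T09:05:00", "IDE"),
    ("2015-03-02T09:20:00", "Browser"),
    ("2015-03-02T09:25:00", "IDE"),
    ("2015-03-02T10:10:00", "Email"),
    ("2015-03-02T10:40:00", "Email"),
]

for hour, count in sorted(switches_per_hour(sample_events).items()):
    print(hour.isoformat(), count)
```

The hourly counts could then be joined with the self-reported productivity ratings for the same hour, for example to check whether hours with many switches are rated differently from hours with few.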

Biometric Data

For a range of developers, we collected biometric data over a two-week period and at the same time frequently asked them to assess various aspects, such as their productivity or emotions over the past hour. In several cases we also collected computer interaction data, which makes it possible to ask even more questions of the data set and to see which kind of collected data best predicts certain aspects of a developer. Possible research questions are:

  • What kind of emotions do developers experience while programming and can we use biometric data to automatically determine emotions?
  • Can we use biometric data to determine a developer's productivity?
  • Which kind of data is best to predict productivity or emotions, i.e. do we need to use biometric sensors or could we, for example, just use keyboard input?

Software Repository Histories and Metrics

This large-scale dataset contains code metrics that have been computed for approximately 1.3 million revisions of 300 Java, C# and JavaScript projects. As such, it describes the evolutionary history of these projects at a very fine-grained level.

For each project, the dataset contains two main pieces of information: the commit history (dates, names and other metadata for each commit) and the actual code metrics for every revision of each project. Detailed metrics for each program entity (files, classes, methods, etc.) are stored in a sparse data structure spanning the entire history of the project. The data has also been aggregated across all entities for each revision (e.g. total complexity of the project). The following metrics are available: cyclomatic complexity, control flow nesting level, number of unique control flow paths, class-, method-, method parameter-, statement-, variable-, and AST vertex counts.

This dataset is interesting because it allows a very fine-grained analysis of the evolution of a large sample of projects. Furthermore, it enables a comparison between projects written in different programming languages. Various classification and clustering analyses can be performed on this data. The following questions may serve as a starting point when analyzing the data:

  • Is it possible to identify evolutionary patterns? (I.e. could time series clustering analyses reveal "typical" kinds of project evolutions?)
  • Do projects written in different languages undergo different evolutionary patterns? (E.g. do metrics fluctuate differently?)
  • Do different committers significantly influence metrics? (I.e. do some committers generally increase complexity, while others decrease it?)
  • Which events prompt the reduction of a metric?
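To explore the committer question, one simple option is to attribute the change in an aggregated metric between consecutive revisions to the committer of the later revision. The following sketch does this for a single project. The input format (a chronological list of committer and total-complexity pairs) and all names are assumptions for illustration and do not reflect the actual storage format of the dataset.

```python
from collections import defaultdict


def mean_delta_per_committer(revisions):
    """Average metric change per committer.

    `revisions` is a chronological list of (committer, metric value) pairs,
    where the metric value is the project-wide total after that revision.
    The first revision has no predecessor and therefore contributes no delta.
    """
    deltas = defaultdict(list)
    for (_, previous_value), (committer, value) in zip(revisions, revisions[1:]):
        deltas[committer].append(value - previous_value)
    return {name: sum(values) / len(values) for name, values in deltas.items()}


# Hypothetical history: (committer, total cyclomatic complexity).
sample_history = [
    ("alice", 100),
    ("bob", 110),
    ("alice", 105),
    ("bob", 120),
    ("alice", 118),
]

print(mean_delta_per_committer(sample_history))
```

A mean delta alone does not show that a difference between committers is significant; a project would still need an appropriate statistical test and should consider confounds such as the kind of task each committer typically works on.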

Eye-Tracking Data

For a set of 12 professional developers and 10 student developers who each worked on 3 change tasks, we collected data on their eye gaze (where they looked) as well as which code they selected or edited within the IDE. One kind of analysis of the data is presented in the paper Tracing Software Developers' Eyes and Interactions for Change Tasks, and there are several further research questions that would be interesting to investigate. Here are a few of them:

  • Do developers follow control flow patterns within a method during code exploration?
  • How do developers navigate between methods?
  • Do developers look at the same lines within a method?
  • Can different patterns of navigation behavior of developers be identified?
  • Does the programming editor highlighting influence eye gaze navigation?

Interaction Data

The data set includes interactions from developers on 58 change tasks of the project Mylyn.Context. The data set includes the type of the interaction (whether it was an edit or a selection), the start and end time, the identifier of the element and the origin of the interaction. A similar data set was used in Developers' Code Context Models for Change Tasks. Possible research questions are:

  • Are there any patterns or heuristics for determining a developer's context?
  • How are the elements within a context connected?
  • How could the context be summarized to better resume the task?
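As one possible starting point for the summarization question, the sketch below aggregates interactions per element and ranks elements by how often they were edited and how long they were interacted with. It uses the fields described above (interaction type, start and end time, element identifier); the tuple layout, the use of seconds as time unit, and the ranking rule are assumptions for illustration, not an established heuristic.

```python
from collections import defaultdict


def summarize_context(interactions):
    """Rank elements by number of edits, then by total interaction time.

    `interactions` is a list of (kind, start, end, element) tuples, where
    kind is "edit" or "selection" and start/end are times in seconds.
    """
    summary = defaultdict(lambda: {"edits": 0, "selections": 0, "seconds": 0.0})
    for kind, start, end, element in interactions:
        entry = summary[element]
        if kind == "edit":
            entry["edits"] += 1
        elif kind == "selection":
            entry["selections"] += 1
        else:
            raise ValueError(f"unknown interaction kind: {kind}")
        entry["seconds"] += end - start
    # Most-edited elements first; ties are broken by time spent.
    return sorted(
        summary.items(),
        key=lambda item: (-item[1]["edits"], -item[1]["seconds"]),
    )


# Hypothetical interactions, not taken from the real data set.
sample_interactions = [
    ("selection", 0, 5, "Foo.bar()"),
    ("edit", 5, 20, "Foo.bar()"),
    ("selection", 20, 22, "Baz.qux()"),
]

for element, stats in summarize_context(sample_interactions):
    print(element, stats)
```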