Department of Informatics s.e.a.l

ASDS'13 PhD Talks Schedule

PhD talks

The symposium will feature short talks in which PhD students introduce their research ideas, hypotheses, approaches, models, theories, and prototypes.

The best talk will be awarded a special prize sponsored by CSF.

Every ASDS'13 participant can cast three votes for the chit-chat talks of the PhD students. Note that participants cannot vote for a student with whom they have a conflict of interest, i.e., their own former or current students. The presenters themselves are not allowed to vote. To vote, follow this Doodle Poll link. Thank you for your participation!

Monday 11.3 16:00 - 17:30

Jan Kurs: Programming Language Grammar Inference

Many programming languages, both general-purpose and domain-specific, are created every year. Knowledge of a language's grammar is essential to support the language in IDEs and other tools. Unfortunately, the grammar might not be known, or might not exist at all. Developing a parser without knowledge of the grammar is time-consuming and challenging, and grammar inference in general is a very hard problem. Others have tried semi-automatic approaches that use various sources, such as documentation, source code, abstract syntax trees, and existing parsers, to infer a complete grammar of a language. In our approach we do not focus on a complete grammar, but only on the parts that are important for our purposes, which may simplify the grammar inference problem significantly. We would like to develop methods and tools for partial grammar inference that help developers infer a grammar and a parser suitable for their purposes in a very short time, leaving them more time for useful work.
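
To illustrate the flavor of a partial grammar, here is a minimal sketch (my own example, not the talk's tooling): rather than specifying a complete grammar, we define a rule only for the construct of interest, here a hypothetical method-like definition, and skip everything else in the input.

```python
import re

# Instead of a complete grammar, specify a rule only for the construct
# we care about (here: method-like definitions, a hypothetical example)
# and skip everything else in the input.
METHOD_RULE = re.compile(r"(?P<name>[A-Za-z_]\w*)\s*\((?P<params>[^)]*)\)\s*\{")

def parse_partially(source: str):
    """Extract only method-like definitions; ignore the rest."""
    islands = []
    pos = 0
    while True:
        match = METHOD_RULE.search(source, pos)
        if match is None:
            break  # nothing of interest remains
        islands.append((match.group("name"), match.group("params")))
        pos = match.end()
    return islands

print(parse_partially("class Foo { int bar(int x) { return x; } }"))
# -> [('bar', 'int x')]
```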

Martin Brandtner: Software Quality Assessment

Software quality assessment should monitor and guide the evolution of a system based on quality measurements. This continuous process should ideally involve multiple stakeholders and provide each of them with adequate measurements to use and interpret. We want to support an effective selection of quality measurements based on the type of software and the individual information needs of the involved stakeholders. We propose an approach that brings together quality measurements and individual information needs for a context-sensitive tailoring of information related to a software quality assessment. We address the following research question: How can we better support different stakeholders in the quality assessment of a software system? To that end, we will devise theories, models, and prototypes to capture their individual information needs, tailor information from software repositories to these needs, and enable a contextual analysis of the quality aspects. Such context-sensitive tailoring will provide a precise and individual view on the measurements based on the latest development trends in the project.

Roberto Minelli: Mobile Applications Analysis: The First Steps

Smartphones and tablets require specific software, known as mobile applications, or apps. Less than a decade old, the business of apps has turned into a multi-billion dollar market. From a software engineering perspective, apps represent a new phenomenon that has not yet been thoroughly investigated. We developed SAMOA, a visual web-based software analytics platform for apps. It mines software repositories of apps and uses a set of visualization techniques to present the mined data. In this talk we introduce our tool, detail the analyses it supports, and present some of the insights and lessons learned.

Luca Ponzanelli: Exploiting Crowd Knowledge in the IDE

The software development process is a non-trivial endeavor. Documentation often does not keep pace with the changes in a project, which lowers the support developers need to understand the system. To overcome this shortcoming, developers consult other programmers in the hope of getting hints for a problem they encountered, or of gathering suggestions about design ideas they have. This search often extends to online resources, such as tutorials and message boards, a practice known as crowdsourcing. It is not surprising that nowadays there are online communities in which developers collaborate to solve problems and programming issues or to discuss design ideas. Among the many resources available on the web, Question & Answer (Q&A) services (e.g., stackoverflow.com, Yahoo! Answers) are gaining popularity, and crowdsourcing is becoming a common practice. Even though the usage of Q&A services has dramatically increased, this important new resource is scarcely exploited by Integrated Development Environments (IDEs). Interacting with those communities requires developers to continuously switch between the IDE and the web browser to read the discussions and then modify the code, leading to interruptions in the programming flow that lower the developers' performance. We present Seahawk, an Eclipse plugin that integrates Stack Overflow crowd knowledge in the IDE. Seahawk allows developers to seamlessly retrieve Q&A discussions from Stack Overflow inside the IDE, to link relevant discussions to any source code in a collaborative fashion by attaching explanatory comments, and to automatically generate queries from code entities.
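
A minimal sketch of the underlying idea (not Seahawk's actual implementation): derive a query from the code entities in the editor and retrieve matching discussions through the public Stack Exchange API. The entity list and tag below are made-up examples.

```python
import requests

# Sketch: build a free-text query from code entities in scope and fetch
# relevant discussions via the public Stack Exchange API.
def search_stack_overflow(entities, tag="java"):
    params = {
        "q": " ".join(entities),  # query generated from code entities
        "tagged": tag,
        "site": "stackoverflow",
        "sort": "relevance",
        "order": "desc",
    }
    url = "https://api.stackexchange.com/2.3/search/advanced"
    items = requests.get(url, params=params).json().get("items", [])
    return [(item["title"], item["link"]) for item in items]

for title, link in search_stack_overflow(["FileInputStream", "read", "close"]):
    print(title, link)
```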

Carol Alexandru: Facets of Software Evolution

SOFAS is a service-oriented platform for analyzing software projects, which can be reached over the internet. It consists of several different services, each of which analyzes a different aspect of the source code, such as its structure, size, and complexity, as well as the quality of its design. The services produce raw data stored in RDF graphs, and it is up to the user to process the data, for example to produce visualizations or to draw conclusions. The Facets application fills this gap by offering an easy-to-use web interface where people can submit the URL of their code repository, upon which Facets starts a complex workflow involving several SOFAS services to create a comprehensive analysis of the software project. Once the analysis is complete, the user can explore the results in a web browser via a number of visualizations that offer insight into several facets of software evolution: the large-scale shape of a project, the quality of its design, the metric properties of every entity of the source code, and history-related information such as changes in size and developer activity. While developers traditionally have to invest time and effort into setting up analysis software and preparing analyses, Facets offers a simpler and more straightforward way for people to analyze their software projects with very little effort on their part.
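
Since the raw results live in RDF graphs, they can in principle be queried directly, which is the gap Facets closes with visualizations. The sketch below assumes a hypothetical graph URL and vocabulary (the seon: prefix and property names are placeholders, not the actual SOFAS schema):

```python
from rdflib import Graph

# The graph location and the vocabulary (the seon: prefix and property
# names) are hypothetical placeholders, not the actual SOFAS schema.
g = Graph()
g.parse("http://example.org/analyses/myproject.rdf")

query = """
PREFIX seon: <http://example.org/seon#>
SELECT ?cls ?complexity
WHERE {
    ?cls a seon:Class ;
         seon:cyclomaticComplexity ?complexity .
}
ORDER BY DESC(?complexity)
LIMIT 10
"""

for row in g.query(query):
    print(row.cls, row.complexity)  # the ten most complex classes
```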

Eric Bouwers: Metric-based Evaluation of Implemented Software Architectures

In the past four years we have been working on increasing the objectivity and repeatability of evaluating the maintainability of implemented software architectures. In this presentation we provide a high-level overview of both our approach and our results. First, we explain why we want to perform this type of evaluation, after which we introduce a model for software architecture complexity. Based on this model we developed two architecture-level metrics aimed at quantifying desirable characteristics of the componentization of a software system. We outline all steps of the validation of these metrics, from establishing their construct validity up to validating their usefulness in an industrial setting.
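
For flavor, here is an illustrative componentization metric (my own example, not one of the metrics from the talk): the fraction of module dependencies that stay inside their component, with higher values indicating a cleaner componentization.

```python
# `deps` maps each module to the modules it depends on; `component_of`
# assigns each module to an architectural component (toy data below).
def internal_dependency_ratio(deps, component_of):
    internal = total = 0
    for module, targets in deps.items():
        for target in targets:
            total += 1
            if component_of[module] == component_of[target]:
                internal += 1
    return internal / total if total else 1.0

deps = {"ui.View": ["ui.Model", "db.Store"], "db.Store": ["db.Conn"]}
component_of = {"ui.View": "ui", "ui.Model": "ui",
                "db.Store": "db", "db.Conn": "db"}
print(internal_dependency_ratio(deps, component_of))  # 2/3, higher is better
```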

Remo Lemma: Software Modeling in Essence

The design of any object-oriented system starts with (software) modeling, a process during which one identifies the system's core concepts and how they relate to each other. Among the various ways of performing this activity, there are lightweight means, such as pen & paper, whiteboards, or CRC cards, while on the other end of the spectrum we have complex, full-fledged UML editors. The advantages of the former are their immediacy, speed, and playful nature, i.e., their informal essence supports the creative modeling process, but their output is difficult to store, process, and maintain. The latter, while making amends for these problems, are tedious and often not trivial to use, thus hindering both creativity and productivity. We believe that there is a possible middle ground which maximizes the good of both worlds while keeping the bad at bay. This middle ground is best trodden using the emerging technology of touch-based tablet computers. We propose a novel modeling methodology based on a minimalistic set of elements: the essence of object-oriented software modeling. We employ a simple yet powerful visual metaphor based on a matrix, designed to explicitly support the modeling activity. We also present CEL, an iPad application which implements our modeling methodology. This tool has been designed to support the process of rapidly and interactively creating, manipulating, and storing language-independent software models. These models can then be used to generate the corresponding skeleton code in any language of choice.
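
The final step, generating skeleton code from a language-independent model, might look like the following sketch (the model format is a hypothetical stand-in, not CEL's internal representation; the generator targets Python for brevity):

```python
# The model format below is a hypothetical stand-in for CEL's internal
# representation of classes, attributes, and relations.
model = {
    "Library": {"attributes": ["name"], "relations": ["books"]},
    "Book": {"attributes": ["title", "isbn"], "relations": []},
}

def to_python_skeleton(model):
    lines = []
    for cls, spec in model.items():
        fields = spec["attributes"] + spec["relations"]
        lines.append(f"class {cls}:")
        lines.append(f"    def __init__(self, {', '.join(fields)}):")
        lines.extend(f"        self.{f} = {f}" for f in fields)
        lines.append("")
    return "\n".join(lines)

print(to_python_skeleton(model))
```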

Katja Kevic: Collaborative Bug Triaging Using Textual Similarities and Change Set Analysis

Bug triaging assigns a bug report, also known as a work item, an issue, a task, or simply a bug, to the software developer most appropriate for fixing or implementing it. This task is tedious, time-consuming, and error-prone if not supported by effective means. Current techniques either use information retrieval and machine learning to find the most similar already-fixed bugs and recommend expert developers, or they analyze change information stemming from source code to propose expert bug solvers. Neither kind of technique combines the two and exploits the potential of the interlinking between bug reports and change sets. Our approach discovers potential experts by identifying similar bug reports and analyzing the associated change sets. Studies have shown that effective bug triaging is done collaboratively in a meeting, as it requires the coordination of multiple individuals, an understanding of the project context, and an understanding of the specific work practices. Therefore, we implemented our approach on a multi-touch table to allow multiple stakeholders to interact simultaneously in the bug triaging process and to foster their collaboration. Our experiments so far indicate that the expert recommendations are more specific and useful when the rationale behind the expert selection is also presented to the users.
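
A minimal sketch of the two combined steps, using hypothetical data: retrieve already-fixed reports that are textually similar to the new one, then rank the developers who authored the change sets linked to those reports.

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical data: already-fixed reports and the authors of the change
# sets that fixed them.
fixed_reports = [
    "NullPointerException when saving an empty diagram",
    "Crash in the save dialog under Linux",
    "Toolbar icons are rendered blurry on HiDPI screens",
]
change_set_authors = [["alice", "bob"], ["alice"], ["carol"]]
new_report = "Exception raised while saving a diagram without elements"

# Step 1: textual similarity between the new report and the fixed ones.
matrix = TfidfVectorizer().fit_transform(fixed_reports + [new_report])
similarities = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Step 2: weight each change-set author by the similarity of the reports
# they fixed; the highest-scoring developers are the recommended experts.
scores = Counter()
for sim, authors in zip(similarities, change_set_authors):
    for author in authors:
        scores[author] += sim
print(scores.most_common())
```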

Christoph Bräunlich: Capturing and Visualizing Developers' Context Models

To complete a change task, software developers spend more than a third of their time navigating code, implicitly building a context model: a model of the elements and relations relevant to the task. Since these context models stay implicit, developers forget relevant details and repeatedly revisit the same code elements and relations. I will present my approach to automatically capture and visualize a developer's context model, reducing their cognitive burden, eliminating many revisits, and ultimately making them more productive.
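
One way such a context model could be captured from IDE interaction events is sketched below; this is inspired by degree-of-interest models and is an assumption of mine, not necessarily the talk's implementation.

```python
# Each visit to a code element raises its interest; interest decays as
# other elements are visited, so stale elements drop out of the model.
class ContextModel:
    def __init__(self, boost=1.0, decay=0.05):
        self.interest = {}
        self.boost = boost
        self.decay = decay

    def visit(self, element):
        for key in self.interest:
            self.interest[key] -= self.decay  # older elements fade out
        self.interest[element] = self.interest.get(element, 0.0) + self.boost

    def relevant(self, threshold=0.5):
        return sorted((e for e, i in self.interest.items() if i > threshold),
                      key=lambda e: -self.interest[e])

model = ContextModel()
for event in ["Parser.parse", "Lexer.next", "Parser.parse", "Parser.error"]:
    model.visit(event)
print(model.relevant())  # ['Parser.parse', 'Parser.error', 'Lexer.next']
```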

Shane McIntosh: Studying the Relationship Between Build Technology and Maintenance Effort

Maintaining software build systems, i.e., the systems that specify how source code is translated into deliverables, requires considerable development effort. When the maintenance effort of the build system grows unwieldy, software teams undertake build migration projects, where existing build specifications are often reimplemented using different build technologies. However, little is known about the relationship between build technology choice and maintenance effort. In this project, we mine the version history recorded in a corpus of software forges, ecosystems, and large-scale projects containing more than 850,000 version control repositories in order to better understand this relationship.

Boris Spasojevic: Large-Scale Software Analysis

Complex software systems generally exist within an even larger software ecosystem consisting of older versions of the systems, variants, and other client applications of the system or its parts. Being able to query and mine this large resource of information can help developers make informed decisions. We present several approaches for using data mined from ecosystems, as well as an overview of the data sources.

Tuesday 12.3 16:00 - 17:30

Eya Ben Charrada: Supporting Requirements Update During Software Evolution

The requirements specification includes much knowledge about the software system that is useful for program comprehension and maintenance. It is therefore important to keep it reliable by updating it regularly. In this talk, I propose an approach that supports maintainers in updating requirements specifications by automatically identifying the outdated parts. I also explain why propagating changes backwards from code to requirements can be less expensive than propagating changes in the forward direction.

Veronika Bauer: Facts and Fallacies of Reuse in Practice

Code reuse is claimed to greatly improve software development. However, anecdotal evidence indicates that industry is not yet experiencing the expected benefit. In my research I am investigating this phenomenon and examining ways to improve reuse in practice. To achieve this goal, I proceed as follows: I empirically assess the state of the practice in several companies via interviews and an online survey. Based on the results, I will extract success factors as well as hindrances to reuse, with a focus on company-wide code reuse. These findings will be the input to a comprehensive structured assessment model, which serves to measure the state of reuse and to monitor the effects of changes in reuse behaviour. Lastly, I will provide a process that should help deduce improvement strategies from the outcome of a reuse assessment.

Andrei Chis: Meta-Tooling

Answering developers' questions has become a difficult problem due to the sheer size and complexity of today's software systems. No longer can this be accomplished without intensive tool support. However, given the highly heterogeneous and contextual nature of these questions, relying just on general-purpose tools does not lead to a one-to-one mapping between questions and answers. To overcome this limitation, we propose meta-tools, i.e., tools for building custom tools that can directly answer contextual and domain-specific questions. In this talk we show how the concept of a meta-tool can be applied to rethink the current debugger of Pharo Smalltalk.

Laura Moreno: Automatic Generation of Natural Language Summaries for Classes

Several software development tasks require that developers understand parts of the source code of the system they are dealing with. To determine whether a piece of code is relevant to the current task, developers rely on their knowledge and understanding of that code. In the case of unfamiliar code, they rely on internal and external documentation. Unfortunately, this documentation is often absent or outdated. This talk presents the challenges of automatically describing classes in object-oriented systems. Furthermore, it presents a novel alternative to address these challenges, based on code analysis and natural language summarization techniques.
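
A sketch of the general flavor of such a technique (my own toy illustration in Python, not the talk's system): derive a natural language description of a class from a lightweight structural analysis of its source.

```python
import ast

# A toy generator: identify the class, its superclass, and its behavioral
# methods, then fill a natural language template.
def summarize_class(source: str) -> str:
    cls = next(n for n in ast.walk(ast.parse(source))
               if isinstance(n, ast.ClassDef))
    methods = [n.name for n in cls.body if isinstance(n, ast.FunctionDef)]
    behavior = [m for m in methods if not m.startswith(("get_", "set_"))]
    summary = f"{cls.name} is a class"
    if cls.bases:
        summary += f" derived from {ast.unparse(cls.bases[0])}"
    if behavior:
        summary += f"; it allows clients to {', '.join(behavior)}"
    return summary + "."

code = """
class Stack(Sequence):
    def push(self, item): ...
    def pop(self): ...
    def get_size(self): ...
"""
print(summarize_class(code))
# -> Stack is a class derived from Sequence; it allows clients to push, pop.
```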

André Meyer: Happy Coder - A Personal Analytics Tool

Recent advances in small inexpensive sensors, cheaper storage, and low-power processing have led to an increasing popularity of trackers that quantify a user's activities throughout everyday life. The Fitbit and the Nike+ FuelBand are two examples of commercial approaches that motivate users to be more active by tracking their activity and visualizing the analyzed data. In the area of software engineering there are similar tools that support a developer in a single domain of their work, such as planning tools or bug repositories. Little research has been performed on how to integrate the available data and provide a retrospection of a developer's work day. To help overcome this shortcoming, we introduce Happy Coder, a tool that provides developers with a retrospective analysis of their work day by tracking predefined metrics and visualizing them on a web client. This includes a front-end with consolidated data analysis, visualizations, and representations of the collected data. Two studies revealed that developers assess their productivity based on a personal evaluation of their work day. This assessment is dominated by personal preferences for different metrics such as work items, meetings, web searches, or activities on the computer. In this talk, we present related work, interesting findings of our studies, and Happy Coder.

Sebastian Müller: Crosscutting Information Needs in Software Development Activities

Stakeholders in software development continuously seek and consult various information artifacts to successfully complete their activities. In our empirical study, we found that these information needs exhibit a crosscutting nature with respect to stakeholder role, activity, information artifacts, and even fragments of artifacts; even for the same activity, different stakeholders use different fragments of information artifacts. Due to this variety of information artifacts and fragments that different stakeholders use for their predominant activities, the linkage between information artifacts is important. While some information artifacts can be explicitly linked, we observed in our study that this linkage is often out of date, unreachable, or missing altogether.

Lee Martie: Toward Social-Technical Code Search

With the vast amount of source code that is publicly available today, searching for code has become an integral part of the programming experience. While a few dedicated code search engines are available, we contend that they have not nearly reached their full potential. In particular, we believe that code search engines must not merely index code, but also construct a rich network of social-technical information surrounding that code. With such information, much richer queries can be issued, and novel interfaces can be built through which the results of such queries can be explored more intuitively. We make the case for social-technical code search, introduce six categories of social-technical information and how each would enhance search, and introduce CodeExchange, the early prototype platform we are developing to explore social-technical code search.

Ahmed Lamkanfi: Qualitative Data Analysis to Guide Mining Software Archives

The investigation and mining of software archives has proven successful in supporting software developers in their daily activities. However, such large data sets often contain irregularities and anomalies which significantly affect the analyses we can perform on them. In this talk, we present several conspicuous observations drawn from analyzing different bug databases. Furthermore, we argue why these observations are worth considering when performing MSR studies. In addition, we discuss ways of coping with such irregularities in software archives. For this purpose, we use the bug data sets of two major open-source projects, namely Eclipse and Mozilla.

Oliver Denninger: Recommending Code Artifacts for Change Requests Using Multiple Predictors

Finding the code artifacts affected by a given change request is a time-consuming process in large software systems. Various approaches have been proposed to automate this activity, e.g., based on information retrieval. The performance of a particular prediction approach often depends highly on attributes like the coding style or the writing style of change requests. Thus, using multiple prediction approaches is promising. However, the results of each approach need to be weighted to form an overall result set. First experiments show that machine learning is well suited to weighting different prediction approaches for individual software projects and hence improves recommendation performance.
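
One plausible way to learn such weights (an assumption on my part, not necessarily the talk's method) is to treat each predictor's score as a feature and fit a classifier on past change requests with known affected artifacts:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: (change request, artifact) pairs from past history. Columns are
# the scores of three hypothetical predictors: IR similarity,
# change-history coupling, and author match.
scores = np.array([
    [0.9, 0.2, 0.1],  # this artifact was actually affected
    [0.1, 0.1, 0.0],
    [0.7, 0.8, 0.3],  # affected
    [0.2, 0.6, 0.1],
    [0.8, 0.1, 0.9],  # affected
    [0.3, 0.2, 0.2],
])
affected = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(scores, affected)
print(model.coef_)  # the learned per-project weight of each predictor

candidate = np.array([[0.6, 0.7, 0.2]])
print(model.predict_proba(candidate)[:, 1])  # combined relevance score
```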

Anja Guzzi: Facilitating Enterprise Software Developer Communication with CARES

When software developers need to exchange information or coordinate work with colleagues on other teams, they are often faced with the challenge of finding the right person to communicate with. In this talk, we present CARES (Colleagues and Relevant Engineers' Support), an integrated development environment (IDE) based tool that enables engineers to easily discover and communicate with the people who have contributed to the source code. CARES has been deployed to 30 professional developers, and we interviewed 8 of them after 3 weeks of evaluation. They reported that CARES helped them to more quickly find, choose, and initiate contact with the most relevant and expedient person who could address their needs.

Alberto Bacchelli: Mining Structured Data in Natural Language Artifacts with Island Parsing

Software developers are nowadays supported by a variety of tools (e.g., version control systems, issue tracking systems, and mailing list services) that record a wide range of information in archives. Researchers mine these archives (or software repositories) both to support software understanding, development, and evolution, and to empirically validate novel ideas and techniques. Software repositories comprise two types of data: structured data and unstructured data. Structured data, such as source code, has a well-established structure and grammar and is straightforward to parse and use with computer machinery. Unstructured data, such as documentation, discussions, comments, and customer support requests, comprises a mixture of natural language text, snippets of structured data, and noise. Mining unstructured data is hard, because out-of-the-box approaches adopted from related fields such as Natural Language Processing and Information Retrieval cannot be directly applied in the Software Engineering domain. In my work I focus on mining unstructured data, which gives us the chance to gain insights into the human factors revolving around software projects and can complement information extracted from structured data repositories. In particular, I focus on the communication occurring among people involved in a software project. In this presentation, I will detail our approach, based on island parsing, to recognize, parse, and model fragments of structured information (e.g., source code snippets) embedded in natural language artifacts. We implemented our approach using a number of parsing methodologies to achieve accuracy and efficiency (by using Parsing Expression Grammars), flexibility (by using a scannerless parser), and extensibility (by adopting parser combinators). We evaluated our approach by applying it to discussions taking place on Stack Overflow. The results show that our approach allows the extraction of structured data from natural language documents with high precision and recall. I will discuss how the presented approach has been used in other software engineering research scenarios.
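
To convey the core idea of island parsing, here is a deliberately toy, regex-based sketch of my own (the actual approach uses PEGs, a scannerless parser, and parser combinators): content matching an "island" rule is treated as structured data, everything else as "water" to be skipped.

```python
import re

# Two toy "island" rules: Java-style stack trace frames and assignment
# statements. Anything not matching an island is ignored as water.
ISLANDS = [
    ("stack_frame", re.compile(r"^\s*at [\w.$]+\([^)]*\)", re.M)),
    ("code_line", re.compile(r"^\s*(?:[\w<>\[\]]+\s+)+\w+\s*=.*;", re.M)),
]

def parse_islands(text):
    found = []
    for kind, pattern in ISLANDS:
        for match in pattern.finditer(text):
            found.append((kind, match.group().strip()))
    return found

post = """I get this error whenever I open the stream:
    at java.io.FileInputStream.open(Native Method)
The field is declared as
    InputStream in = new FileInputStream(path);
Any idea what is wrong?"""
print(parse_islands(post))
```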

Sascha Just: Mozkito, a General-Purpose Mining and Analysis Framework

Empirical software engineering research is mostly project-specific, and replicating studies is key for general acceptance and improvement. Lately, there has been a lot of discussion on standardized datasets and mechanisms to allow easier replication studies [1]. There are circumstances that will not allow you to publish your data for confidentiality reasons. Still, studies can be reproducible even without the original data sets at hand, but this requires sharing the code and scripts used to conduct the experiments. Re-implementing textual descriptions of algorithms will always lead to modifications of the algorithm and thus to close, but not exact, replications. Our intention is to provide a way to normalize the process rather than the data, and to develop a framework with which people can implement their algorithms and publish them along with the data. This will not only allow comparing results and performance, but also enable people to access and combine algorithms for new projects. Our open-source mining framework Mozkito already provides libraries and ready-to-use tool chains to mine, relate, and analyse data from several software archives.

[1] Gregorio Robles. Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings. In Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), pages 171–180. IEEE, 2010.

Annie Ying: Code Fragment Summarization and Personalization

When a programmer uses general search engines to find code examples, the information accompanying a returned link does not always contain adequate cues for determining whether the link is worthwhile to pursue. To mitigate this issue, we introduce code fragment summaries. Generating summaries using a machine learning approach with AST-based and query-based features achieved a precision of 0.7 against human-generated summaries. Summaries with this level of precision matched the level of agreement that human annotators achieved with each other. We are also interested in personalizing code fragment summaries based on API expertise.

Weiyi (Ian) Shang: Bridging the Divide between Software Developers and Operators using Logs

There is a growing gap between the software development and operation worlds. Software developers rarely divulge development knowledge about the software to operators, while operators rarely communicate field knowledge to developers. To improve the quality and reduce the operational cost of large-scale software systems, bridging the gap between these two worlds is essential. This thesis proposes the use of logs as a mechanism to bridge the gap between these two worlds. Logs are messages generated from statements inserted by developers in the source code and are often used by operators to monitor the field operation of a system. However, the rich knowledge in logs has not yet been fully exploited because of their unstructured nature, their large scale, and the ad hoc log analysis techniques in use. Through case studies on large commercial and open source systems, we plan to demonstrate the value of logs as a tool to support developers and operators.
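
A common first step toward making such logs analyzable, sketched below as an assumed technique rather than the thesis approach, is to abstract concrete log lines into templates by masking their dynamic parts, so that lines produced by the same logging statement can be grouped and counted.

```python
import re
from collections import Counter

# Masks are applied in order, most specific first, so that IP addresses
# are not torn apart by the generic number mask.
MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def abstract(line: str) -> str:
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

logs = [
    "Connection from 10.0.0.5 timed out after 30 seconds",
    "Connection from 192.168.1.9 timed out after 45 seconds",
    "Worker 7 started",
]
print(Counter(abstract(line) for line in logs).most_common())
# [('Connection from <IP> timed out after <NUM> seconds', 2),
#  ('Worker <NUM> started', 1)]
```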