Research

Current Focus Areas of ZEST

Software Quality

Software quality measures the extent to which a software program meets non-functional attributes such as reliability, efficiency, security, maintainability, and size. Poor source code quality has been shown to cause several critical issues, such as (i) lowering developers' productivity, (ii) reducing program comprehensibility, and (iii) increasing the risk of fault introduction. It is therefore crucial to devise approaches and tools that support developers in managing and improving source code quality.

In our research group, we are mostly interested in how to support developers in two aspects related to software quality, i.e., peer code review and technical debt management.


Peer Code Review. One of the most popular practices to improve the overall quality of software is code review, i.e., the manual assessment of source code by reviewers other than the author. Although studies provide evidence that code review is potentially one of the most effective practices to increase software quality, we found that it often leads to poor results in practice. The reason for this unmet potential is that contemporary code review is a fully manual task whose efficacy relies solely on the zeal of reviewers.


Technical Debt Management. Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". The metaphor captures the trade-off between delivering a product in the shortest time possible and delivering one that is mature and well designed. One important factor contributing to technical debt is the presence of so-called bad code smells, i.e., symptoms of poor design or implementation choices applied by programmers during the development of software systems. Our group's challenges are (i) defining novel techniques for the automatic identification and removal of design problems in both production and test code, and (ii) empirically understanding the relations between code/test smells and external factors (e.g., development community-related factors, source code performance, etc.).
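To give a flavor of what metric-based smell detection looks like, the toy sketch below flags the classic "Long Method" smell using a single size threshold. The threshold, the input format, and the method bodies are invented for illustration only; real detectors combine several calibrated metrics and are far more robust than this.

```python
# Toy metric-based detector for the "Long Method" code smell.
# The threshold below is hypothetical; real approaches calibrate
# thresholds empirically and combine multiple metrics.

LONG_METHOD_LOC = 30  # hypothetical lines-of-code threshold

def detect_long_methods(methods):
    """methods: mapping of method name -> body as a string.
    Returns the names of methods whose body exceeds the threshold."""
    return [
        name for name, body in methods.items()
        if len(body.strip().splitlines()) > LONG_METHOD_LOC
    ]

# Invented example input: one oversized method, one tiny one.
methods = {
    "parseHeader": "\n".join(f"line {i}" for i in range(50)),  # 50 LOC
    "isEmpty": "return size == 0",                             # 1 LOC
}
print(detect_long_methods(methods))  # flags only the long method
```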

Fundamentals of Software Analytics

Software analytics is the use of data science on data generated during the development and execution of software projects. This recent field is growing rapidly and gaining traction. The challenges researchers face concern not only the contexts to which software analytics should be applied, but also how to advance the state of the art to improve its scope and effectiveness. As such, it is part of the ZEST vision to contribute research on the fundamentals of software analytics. In particular, we are focusing on:


Predictive Analytics. Predicting the areas of source code that are most likely to become problematic in the near future is a key challenge in software engineering, as it allows developers to plan maintenance operations preventively. More specifically, our research group is mainly interested in creating and evaluating supervised techniques based on Machine Learning (ML) to predict (i) the fault-proneness of source code classes and methods, with the aim of prioritising testing activities, and (ii) the change-proneness of source code classes, with the aim of recommending to developers the application of specific maintenance operations, e.g., refactoring actions.
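As a minimal sketch of the idea behind supervised fault-proneness prediction, the snippet below trains a nearest-centroid classifier on two invented code metrics (lines of code and cyclomatic complexity). The training data, metrics, and classifier choice are all illustrative assumptions; real studies use richer metric suites and full ML pipelines with proper validation.

```python
# Sketch of metric-based fault-proneness prediction with a
# nearest-centroid classifier. All data below is invented.
from math import dist

# Each entry: ((lines of code, cyclomatic complexity), fault label)
# where label 1 means a fault was later reported in the class.
training = [
    ((120, 4), 0), ((90, 3), 0), ((150, 5), 0),       # fault-free
    ((900, 25), 1), ((700, 30), 1), ((1100, 22), 1),  # fault-prone
]

def centroid(points):
    """Component-wise mean of a list of metric vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

# One centroid per label, computed from the training examples.
centroids = {
    label: centroid([m for m, l in training if l == label])
    for label in {l for _, l in training}
}

def predict(metrics):
    """Classify a metric vector by its closest centroid (Euclidean)."""
    return min(centroids, key=lambda label: dist(metrics, centroids[label]))

print(predict((100, 4)))   # small, simple class  -> 0 (fault-free)
print(predict((950, 28)))  # large, complex class -> 1 (fault-prone)
```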

Mining Unstructured Software Data. Even though most software analytics techniques target artifacts with a clear, parseable structure, such as source code, the majority of software content is unstructured data: documents, such as emails or change comments, written in natural language and used to exchange information among people. Our goal is twofold: (1) widening the pool of sources we can consider and (2) advancing software analytics with novel techniques. We aim to investigate the real value of these sources, to adapt techniques from fields that more commonly deal with this form of data, such as information retrieval, natural language processing, and text mining, and to integrate them with specialized techniques, such as island parsing.
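The intuition behind island parsing is to treat the natural-language text as "water" and recover only the code-like "islands" embedded in it. The toy sketch below does this with a single hand-made regular expression for Java-style method calls; real island parsers use proper island grammars, and the email text here is invented.

```python
# Toy illustration of the island-parsing idea: extract code-like
# fragments ("islands") from free-form text ("water").
# Real island parsers use grammars; this regex is purely illustrative.
import re

# Matches Java-style receiver.method(args) call expressions.
ISLAND = re.compile(r"\b[A-Za-z_]\w*\.[A-Za-z_]\w*\([^)]*\)")

email = (
    "Hi all, the crash happens when we call parser.readToken() twice "
    "before invoking buffer.flush(). Any ideas? Thanks!"
)

islands = ISLAND.findall(email)
print(islands)  # the code fragments recovered from the email
```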


Large-scale API/Usage Mining. Up to 80% of the source code in today's software applications comes from existing public software components. Despite the key role of public components in software development, only very limited support is available to search, analyze, and understand the intricate network of public components available today. Our goal is to fill this gap by conducting empirical investigations and devising a data-driven assessment of public software components and language features, enabling software analytics on the huge mass of information about public components, their usage, and their consumers that is available in public repositories of both structured (e.g., Maven and GitHub) and unstructured software data (e.g., Stack Overflow). From these data sources, ZEST focuses on generating a set of carefully calibrated indicators that provide a comprehensive and balanced assessment of components and of language and system features.
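At its simplest, usage mining boils down to counting how often consumer code references a component across a corpus. The sketch below counts package-level `import` statements in a few invented Java files; a real pipeline would instead mine repositories such as GitHub or Maven artifacts at scale, and the file contents here are made up.

```python
# Minimal sketch of API usage mining: rank packages by how many
# import sites reference them. The corpus below is invented.
import re
from collections import Counter

corpus = {
    "A.java": "import org.junit.Test;\nimport com.google.gson.Gson;",
    "B.java": "import org.junit.Test;\nimport org.junit.Before;",
    "C.java": "import com.google.gson.Gson;\nimport org.junit.Test;",
}

# Captures the package part of a Java import (drops the class name).
IMPORT = re.compile(r"^import\s+([\w.]+)\.\w+;", re.MULTILINE)

usage = Counter()
for source in corpus.values():
    usage.update(IMPORT.findall(source))  # one count per import site

# Packages ranked by number of import sites referencing them.
print(usage.most_common())
```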