FAMIXDiff — a Context Aware Algorithm to Extract Structural Changes Between FAMIX Models

Status

Finished by Andres Petralli.

Introduction

Developing and maintaining successful software systems means mastering complexity and change. In this thesis we focus on mastering change by developing methods and algorithms to observe source code changes between two releases of a software system. The thesis is a follow-up project of ChangeDistiller, an approach for extracting fine-grained changes from source code. ChangeDistiller uses the AST representation of two Java classes and an adapted version of the tree differencing algorithm by Chawathe et al. While ChangeDistiller is able to find a minimal set of fine-grained source code changes, it misses context information. For example, it extracts the insertion of a method call but misses the class to which the called method belongs. Such context information is needed to analyze the evolution of the implementation and design of software systems.

The goals of this diploma thesis

The goal of this thesis is 1) to develop a tree/graph differencing algorithm to extract the structural changes between two FAMIX models; and 2) to visualize extracted changes with our DA4Java tool.
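As a point of reference for the first goal, the simplest conceivable structural diff treats each model as a set of uniquely named entities and reports which names were added or removed. The sketch below is only such a naive baseline, not the FAMIXDiff algorithm; the class name `NaiveModelDiff` and the use of plain strings as entity identifiers are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

/** Naive baseline: diff two models given as sets of unique entity names. */
final class NaiveModelDiff {
    private NaiveModelDiff() {}

    /** Entities present in the new model but not in the old one. */
    static Set<String> added(Set<String> oldModel, Set<String> newModel) {
        Set<String> result = new TreeSet<>(newModel);
        result.removeAll(oldModel);
        return result;
    }

    /** Entities present in the old model but not in the new one. */
    static Set<String> removed(Set<String> oldModel, Set<String> newModel) {
        Set<String> result = new TreeSet<>(oldModel);
        result.removeAll(newModel);
        return result;
    }
}
```

Because entities are matched by exact name only, a renamed or moved method shows up as one removal plus one addition. Recognizing such cases as a single change is what a tree or graph differencing algorithm is expected to contribute.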

Task description

The first task is to get familiar with the software evolution domain, with a focus on the FAMIX model and on algorithms to extract differences between source code representations of Java classes. Our previous work on ChangeDistiller, the UMLDiff approach by Xing et al., and JDiff by Apiwattanapong et al. provide good starting points. In addition, you will investigate the state of the art in visualizing source code changes and get familiar with our DA4Java visualization approach.

The second task is concerned with developing the FAMIXDiff approach. The first step of this task is to define an adequate representation of the FAMIX models of two source code releases that facilitates the extraction of changes. According to our experience with the ChangeDistiller algorithm, this is an important step that lays the basis for the performance of FAMIXDiff. Next, you will evaluate different tree differencing algorithms and string similarity measures to find the most suitable ones. The evaluation goes hand in hand with the implementation of a prototype of FAMIXDiff, preferably as an Eclipse plugin integrated into our Evolizer platform. You will also build a set of test classes as a benchmark to investigate the accuracy, efficiency, and scalability of the FAMIXDiff approach.
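To make the notion of a string similarity measure concrete, the following sketch computes the Dice coefficient over character bigrams, one common candidate for deciding whether two entity names or statements are similar enough to be matched. It is an illustrative example only; the class name `BigramSimilarity` is hypothetical, and which measure suits FAMIXDiff best is exactly what the evaluation is meant to determine.

```java
import java.util.HashMap;
import java.util.Map;

/** Dice coefficient over character bigrams, in the range [0, 1]. */
final class BigramSimilarity {
    private BigramSimilarity() {}

    static double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        // Strings shorter than two characters have no bigrams to compare.
        if (a.length() < 2 || b.length() < 2) {
            return 0.0;
        }
        // Count every bigram of the first string.
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < a.length() - 1; i++) {
            counts.merge(a.substring(i, i + 2), 1, Integer::sum);
        }
        // Consume matching bigrams of the second string, respecting multiplicity.
        int shared = 0;
        for (int i = 0; i < b.length() - 1; i++) {
            String gram = b.substring(i, i + 2);
            Integer left = counts.get(gram);
            if (left != null && left > 0) {
                counts.put(gram, left - 1);
                shared++;
            }
        }
        int total = (a.length() - 1) + (b.length() - 1);
        return 2.0 * shared / total;
    }
}
```

In a differencing setting, two nodes would typically be considered a match when their similarity exceeds a chosen threshold; the threshold itself is a tuning parameter and is not specified here.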

The third task is concerned with integrating an approach to visualize the extracted changes into our DA4Java tool. To evaluate the visualization, select an open source Java project, such as Eclipse or Azureus, and visualize the changes between the FAMIX models of two releases.

The deliverables are as follows:

When               What
End of 1st month   State-of-the-art report on fine-grained source code change extraction.
End of 2nd month   Model to represent the FAMIX models of two source code releases for graph/tree differencing algorithms, and an Eclipse plugin to create such models.
End of 4th month   Prototype of FAMIXDiff together with a set of test classes as a benchmark.
End of 5th month   Visualization of extracted changes integrated into the DA4Java tool.
End of 6th month   Finished Master Thesis.

General thesis guidelines

The typical rules of academic work must be followed. "So what is a (Diploma) Thesis" describes the guidelines that apply. At the end of the thesis, a final report has to be written. The report should be clearly organized, follow the usual academic report structure, and be written in English using our s.e.a.l. LaTeX template.

Since implementing software is also part of this thesis, state-of-the-art design, coding, and documentation standards for the software have to be obeyed.

The diploma thesis has to be concluded with a final presentation.

Advisors

Dr. Martin Pinzger, Michael Würsch, Prof. Harald Gall.

More information on "What is a Diploma Thesis and How to do a Diploma Thesis at IFI" is provided here.