Improving Abstract Syntax Tree based Source Code Change Detection

Status

done by Michael Würsch

More information can be found in the project plan

Introduction

ChangeDistiller, our source code change extraction Eclipse plugin, is based on the approach of Chawathe et al.. The algorithm finds tree edit operations between two intermediate abstract syntax trees based on a matching set of tree nodes. Its outcome is an edit script transforming the first into the second tree. The edit script is not always minimal; in particular, because 1) the matching algorithm is imprecise for small sub-trees; and 2) the algorithm uses a heuristics that changes between two trees are small.

The goals of this diploma thesis

The goal of this diploma thesis is to improve the matching algorithm of ChangeDistiller and make its edit script detection more flexible concerning the detection of large changes. For that, the matching algorithm has to be improved by using different similarity measures or combinations of them. To investigate accuracy (precision, recall), efficiency, and scalability of the new algorithm a benchmark as JUnit Eclipse plugin test cases has to be developed.

Task description

  • Get familiar with ChangeDistiller.
  • Evaluate which tree and string similarity measure provide by SimPack are suitable for the matching algorithm. Try combinations of them and/or develop new ones which have to be integrated into SimPack.
  • Build a set of test classes as a benchmark to investigate accuracy, efficiency, and scalability of the improved ChangeDistiller.
  • Perform medium-sized Java case studies (e.g., ArgoUML or Azureus) to validate the applicability of the improved ChangeDistiller.

Technologies to be used

The envisioned outcome

  • Improved ChangeDistillerconcerning accuracy, scalability, efficiency, and minimality of the edit script.
  • A benchmark for source code change detection.
  • Extended SimPack, if suitable generic tree or string measures were developed.

General thesis guidelines

The typical rules of academic work must be followed. "So what is a (Diploma) Thesis" describes guidelines which must be followed. At the end of the thesis, a final report has to be written. The report should clearly be organized, follow the usual academic report structure, and has to be written in English using our s.e.a.l. LaTeX-template.

Since implementing software is also part of this thesis, state-of-the-art design, coding, and documentation standards for the software have to be obeyed.

The diploma thesis has to be concluded with a final presentation for the members of the Software Evolution and Architecture Lab (s.e.a.l.) and the Dynamic and Distributed Information Systems Group (ddis).

Advisor

Beat Fluri, Prof. Harald Gall, Christoph Kiefer.

More information on "What is a Diploma Thesis and How to do a Diploma Thesis at IFI" is provided here.