Evolving Code Clones


finished by Emanuel Giger

More information can be found in the project plan

Final Thesis (PDF, 1594 KB)


Code clones have long been recognized as bad smells in software systems and are assumed to cause maintenance problems during evolution. Recent research on evolving code clones has shown that a relation between code clones and change couplings (co-changing source code files) cannot be verified. Because change couplings are calculated at the level of files, it cannot be verified automatically whether a change coupling is due to changes in the cloned source code fragments.
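To make the notion of a file-level change coupling concrete, the following minimal sketch counts how often two files are changed in the same commit. It assumes each commit is available as a set of file paths; the class and method names are hypothetical and not part of our framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: file-level change couplings counted from commits.
final class ChangeCouplingSketch {

    /** Counts, for every unordered pair of files, the commits that changed both. */
    static Map<String, Integer> countCouplings(List<Set<String>> commits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> commit : commits) {
            // Sort the files so that each pair gets exactly one canonical key.
            List<String> files = new ArrayList<>(new TreeSet<>(commit));
            for (int i = 0; i < files.size(); i++) {
                for (int j = i + 1; j < files.size(); j++) {
                    String pair = files.get(i) + " <-> " + files.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up commit history, for illustration only.
        List<Set<String>> commits = Arrays.asList(
            new HashSet<>(Arrays.asList("a.cpp", "b.cpp")),
            new HashSet<>(Arrays.asList("a.cpp", "b.cpp", "c.cpp")),
            new HashSet<>(Arrays.asList("c.cpp")));
        System.out.println(countCouplings(commits));
    }
}
```

A count obtained this way says only that two files changed together. It does not say which lines changed, which is why the link to cloned fragments cannot be checked at this granularity.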

The goals of this diploma thesis

The goal of this diploma thesis is first to extract those change couplings that are due to changes in cloned source code fragments, using fine-grained source code change analysis. It then has to be investigated whether existing analyses of the relation between code clones and change couplings can be improved with this fine-grained coupling information. The methodology should be evaluated with the Mozilla project and another C++ or Java project.
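The first goal can be illustrated with a small sketch: a change coupling between two files counts as clone-related only if the commit changed code inside both fragments of a clone pair. The sketch approximates fine-grained changes as line ranges, which is an assumption made for brevity; the thesis is meant to obtain them from AST differencing. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decide whether a co-change touched both clone fragments.
final class CloneCouplingSketch {

    /** A contiguous line range in one file, with inclusive bounds. */
    static final class Region {
        final String file;
        final int start;
        final int end;

        Region(String file, int start, int end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

        /** True if both regions are in the same file and share at least one line. */
        boolean overlaps(Region other) {
            return file.equals(other.file) && start <= other.end && other.start <= end;
        }
    }

    /** True if the changes of one commit touch both fragments of the clone pair. */
    static boolean isCloneRelated(Region cloneA, Region cloneB, List<Region> changes) {
        boolean touchesA = false;
        boolean touchesB = false;
        for (Region change : changes) {
            touchesA |= change.overlaps(cloneA);
            touchesB |= change.overlaps(cloneB);
        }
        return touchesA && touchesB;
    }

    public static void main(String[] args) {
        // Made-up clone pair and commits, for illustration only.
        Region cloneA = new Region("a.cpp", 10, 20);
        Region cloneB = new Region("b.cpp", 40, 50);

        List<Region> insideClones = Arrays.asList(
            new Region("a.cpp", 12, 13), new Region("b.cpp", 45, 45));
        List<Region> outsideClone = Arrays.asList(
            new Region("a.cpp", 12, 13), new Region("b.cpp", 100, 101));

        System.out.println(isCloneRelated(cloneA, cloneB, insideClones));
        System.out.println(isCloneRelated(cloneA, cloneB, outsideClone));
    }
}
```

Both example commits change a.cpp and b.cpp together and therefore form a file-level change coupling, but only the first one changes lines inside both clone fragments.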

Task description

  • Extend our software evolution framework (Eclipse plugin http://www.eclipse.org) with code clone information obtained by CCFinder. We already have a CCFinder output parser, which has to be integrated into our framework.
  • Implement an existing abstract syntax tree (AST) differencing algorithm for the C++ language. The algorithm has to be integrated into our framework using the C/C++ Development Tools (CDT) of Eclipse (http://www.eclipse.org/cdt/) to extract fine-grained changes of C++ projects, e.g., Mozilla.
  • Extract change couplings due to changes in cloned source code fragments using our software evolution framework.
  • Investigate the relationship between code clones and change couplings using statistical analysis techniques (e.g., regression analysis).
  • Evaluate the approach with a C/C++ and a Java open source project (e.g., Mozilla, ArgoUML).
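As an illustration of the statistical step in the list above, the following minimal sketch fits a simple linear regression by ordinary least squares. The choice of variables (number of clone pairs shared by two files as predictor, number of clone-related co-changes as response) and the data are assumptions made for this example only; the task description does not fix them. The class and method names are hypothetical.

```java
// Hypothetical sketch: simple linear regression y = intercept + slope * x.
final class RegressionSketch {

    /** Returns {intercept, slope} of the least-squares line through the points. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("predictor values must not all be equal");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Made-up observations, for illustration only.
        double[] sharedClonePairs = {1, 2, 3, 4};
        double[] cloneRelatedCoChanges = {3, 5, 7, 9};
        double[] line = fit(sharedClonePairs, cloneRelatedCoChanges);
        System.out.println("intercept = " + line[0] + ", slope = " + line[1]);
    }
}
```

In an actual evaluation, a statistics package would also report significance and goodness of fit, which this sketch leaves out.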

Technologies to be used

  • CCFinder; code clone detection tool
  • Graph/tree differencing algorithms
  • Java Development Tools (JDT http://www.eclipse.org/jdt/) and C/C++ Development Tools (CDT) of Eclipse
  • Hibernate http://www.hibernate.org; object/relational mapping tool
  • Release History Database (RHDB)

The envisioned outcome

  • An Eclipse plugin extending our framework with code clone information and a C/C++ AST differencing algorithm.
  • Improved investigation of the relation between code clones and change couplings.
  • Validation of the methodology.

General thesis guidelines

The typical rules of academic work must be followed. "So what is a (Diploma) Thesis" describes the guidelines that apply. At the end of the thesis, a final report has to be written. The report should be clearly organized, follow the usual academic report structure, and be written in English using our s.e.a.l. LaTeX-template.

Since implementing software is also part of this thesis, state-of-the-art design, coding, and documentation standards for the software have to be obeyed.

The diploma thesis has to be concluded with a final presentation for the members of the Software Evolution and Architecture Lab (s.e.a.l.).


Beat Fluri, Prof. Harald Gall

More information on "What is a Diploma Thesis and How to do a Diploma Thesis at IFI" is provided here.