finished by Tobias Sager.
In recent years, various attempts have been made to "measure" whether two pieces of software, taken from different packages or from subsequent releases of the same package, are similar. One might imagine a tool that "magically" detects similar pieces of software stored in a huge software repository while the programmer implements new software. In that way, time is saved and software reusability is improved. Another application is software plagiarism detection, where the principal task is to find (illegally) copied software. Determining software similarity also plays an important role in software evolution: when the task is to detect how software evolved over a certain period of time, software similarity is one source of information for getting a better picture of the evolution process. In particular, combining similarity measures with logical coupling measures seems very promising.
SimPack is a library of similarity measures. It implements a set of similarity measures along with a set of data wrappers. The wrappers are used to apply the measures to a specific data format.
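The split between measures and wrappers can be illustrated with a minimal Java sketch. It does not reproduce SimPack's actual API: the names SimilarityMeasure, DataWrapper, TokenSetWrapper, and JaccardMeasure are hypothetical and chosen only for illustration, and the Jaccard coefficient over token sets stands in for whatever measures the library provides.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface (not SimPack's API): a measure compares two items
// of a common representation and returns a score in [0, 1].
interface SimilarityMeasure<T> {
    double similarity(T a, T b);
}

// Hypothetical interface (not SimPack's API): a wrapper converts a specific
// data format into the representation a measure expects.
interface DataWrapper<S, T> {
    T wrap(S source);
}

// Illustrative wrapper: turns source text into a set of word-like tokens.
final class TokenSetWrapper implements DataWrapper<String, Set<String>> {
    @Override
    public Set<String> wrap(String source) {
        Set<String> tokens = new HashSet<>();
        for (String token : source.split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

// Illustrative measure: Jaccard coefficient |A ∩ B| / |A ∪ B|.
final class JaccardMeasure implements SimilarityMeasure<Set<String>> {
    @Override
    public double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty inputs are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}

class WrapperSketch {
    public static void main(String[] args) {
        TokenSetWrapper wrapper = new TokenSetWrapper();
        JaccardMeasure measure = new JaccardMeasure();
        double score = measure.similarity(
                wrapper.wrap("int a = b;"),
                wrapper.wrap("int a = c;"));
        System.out.println(score); // 2 shared tokens out of 4 distinct: 0.5
    }
}
```

Because the measure only sees the wrapped representation, the same measure could in principle be reused for another data format by supplying a different wrapper.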
The goal of this diploma thesis is to implement the Coogle system (Code-Google), which serves as a search engine for software source code. Coogle makes use of a large software repository and a set of similarity measures implemented in SimPack to perform similarity measurements.
The thesis consists of four major work packages and should result in an Eclipse plugin, based on SimPack, that measures similarity between different pieces of software. A detailed evaluation of the implemented system concludes the thesis and addresses the connection between software similarity and software evolution.
The thesis may be summarized as follows:
- Java source code data wrapper implementation for SimPack
- Design of a source code repository
- Comparison with logical coupling calculations