The advances in version control systems (VCSs) have revolutionized the way we
develop software. Today, almost all software developed by a single person or
a group of programmers is stored in VCSs. A large variety of VCSs, including
Subversion (SVN) and the Concurrent Versions System (CVS), are available for managing
versioned software, and many open source projects on SourceForge and Google
Code are available for analysis and use. VCSs are general-purpose systems that
allow users to store and retrieve textual and binary data files. Methods have been
implemented to optimize the storage, mostly by storing compressed versions
of differences between versions instead of individual versions. However, VCSs
support only basic version retrieval functionality, and it is difficult to retrieve
information specific to the grammar of the source code (e.g., retrieve all functions
without parameters) or information that depends on multiple versions (e.g., all
functions whose number of parameters has increased).
In Qvestor we are designing, implementing, and evaluating a software system to query and analyze versioned software. The main goal is to make the grammar that describes the source code available through a declarative query language (SQL), so that the user can formulate queries over versioned source code. This is achieved by annotating the grammar with queries in the underlying query language of the database system. Qvestor translates the annotations into a query that the user can run against the repository. This way, users can query and statistically evaluate their work using well-known technology such as SQL while including in their queries the semantics carried in the grammar. Moreover, users will be able to issue queries over all versions of a project and, for instance, to select or analyze only a subset of the versions. The foundation of the work consists of rules that rewrite the user’s queries (expressed as annotations of the grammar) so that, e.g., source code versions that cannot be part of the result are excluded upfront from further querying.
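As a rough illustration of the kind of queries envisioned here, the following sketch uses Python's standard sqlite3 module and assumes that parsed source code has already been loaded into a relational table with one row per function per version. The table `func`, its columns, the sample data, and both helper functions are hypothetical and chosen only for this example; they do not describe Qvestor's actual schema, and the sketch does not show the grammar annotations or their translation into queries.

```python
import sqlite3

# Hypothetical relational view of parsed source code: one row per function
# per version. Illustrative only; not Qvestor's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE func (version INTEGER, name TEXT, n_params INTEGER, "
    "PRIMARY KEY (version, name))"
)
conn.executemany(
    "INSERT INTO func VALUES (?, ?, ?)",
    [
        (1, "main", 0), (1, "parse", 1), (1, "emit", 2),
        (2, "main", 0), (2, "parse", 2), (2, "emit", 2),
        (3, "main", 2), (3, "parse", 2), (3, "emit", 1),
    ],
)


def functions_without_parameters(conn, version):
    """Grammar-specific query: names of parameterless functions in one version."""
    rows = conn.execute(
        "SELECT name FROM func WHERE version = ? AND n_params = 0 ORDER BY name",
        (version,),
    )
    return [name for (name,) in rows]


def functions_with_more_parameters(conn, first, last):
    """Multi-version query: functions whose parameter count grew between two
    consecutive versions, both lying in the range [first, last]."""
    rows = conn.execute(
        """
        SELECT cur.name, prev.version, cur.version
        FROM func AS prev
        JOIN func AS cur
          ON cur.name = prev.name AND cur.version = prev.version + 1
        WHERE cur.n_params > prev.n_params
          AND prev.version >= ? AND cur.version <= ?
        ORDER BY cur.version, cur.name
        """,
        (first, last),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(functions_without_parameters(conn, 1))
    print(functions_with_more_parameters(conn, 1, 3))
```

The version bounds in the second query stand in for selecting a subset of versions. Under these assumptions, a rewrite rule of the kind described above would aim to apply such a restriction as early as possible, so that versions outside the range are never examined.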
We will devise a general algebra consisting of a small set of operators that can be used to formulate declarative queries on sequences of versioned data that conform to a grammar specification.