Recommender System for Library Updates in Client Projects

Introduction

Software ecosystems consist of multiple software projects, often interrelated by means of dependency relations. When one project changes, the projects that depend on it may decide to upgrade the dependency. For example, a project could adopt a new version of another project because the latter has been enhanced or has received bug fixes.

In [1], we observed the evolution of the Java subset of the Apache ecosystem, consisting of 147 projects, over a period of 14 years, resulting in 1,964 releases. Specifically, we analyzed (i) how dependencies change over time, (ii) whether a dependency upgrade is driven by factors such as different kinds of API changes or licensing issues, and (iii) how an upgrade impacts a related project. The results of this study help to understand the phenomenon of library/component upgrades and provide the basis for a new family of recommenders aimed at supporting developers in the complex (and risky) activity of managing library/component upgrades within their software projects.

Research ideas/projects: the purpose of this project is to implement a tool, based on summarization techniques [2, 3], that analyzes the source code changes that occurred between versions of software libraries and helps developers decide whether to upgrade the libraries used by client projects.

         Related literature:

      [1] “The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache”. ICSM 2013: 280-289

      [2] “Automatic Generation of Release Notes”

      [3] “On Automatically Generating Commit Messages via Summarization of Source Code Changes”

Tasks Description:

The main tasks of the project are:

  • Gathering data:

    • extract data and metadata of projects of the Apache Software Foundation (ASF);

    • find other software ecosystems, using https://www.openhub.net/ as a source.

  • Build machine learning models for library update recommendation:

    • Identify possibly relevant facts for library update decisions (e.g., the diff between versions of the same library, the size or complexity of the changes, etc.);

    • Experiment with different machine learning algorithms (Gradient Boosted Trees, Nearest Neighbors, etc.);

    • Experiment with different sets of features to recommend the correct decision (e.g., update to the next version or not);

    • Use ML to find the most suitable version for a library update;

  • Build an intuitive UI that uses the defined models to make different recommendations

  • Evaluation:

    • precision, recall, and F1 scores per category, computed on a test set drawn from the project data and metadata;

  • Scope: 2 to 3 students.
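
To make the modelling and evaluation tasks above more concrete, the following is a minimal sketch in Python, not a prescribed design. It assumes that two versions of a library are available as plain source text, that the features are simple diff sizes, and that the decision labels are "update" and "keep". The function names (diff_features, knn_predict, per_class_scores), the feature set, and the toy data are all hypothetical and only illustrate one possible shape of the pipeline: diff-based features, a Nearest Neighbors recommender, and per-category precision/recall/F1.

```python
import difflib
import math
from collections import Counter


def diff_features(old_src, new_src):
    """Turn the line-level diff between two versions into numeric features.

    Returns [added_lines, removed_lines, churn], where churn is the number
    of changed lines relative to the size of the old version.
    """
    old_lines = old_src.splitlines()
    new_lines = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    churn = (added + removed) / max(len(old_lines), 1)
    return [float(added), float(removed), churn]


def knn_predict(train, query, k=3):
    """Majority vote among the k training examples closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Toy, made-up training data: (features, decision taken by the client).
    train = [
        ([1.0, 0.0, 0.01], "update"),
        ([2.0, 1.0, 0.02], "update"),
        ([50.0, 40.0, 0.90], "keep"),
        ([60.0, 45.0, 0.95], "keep"),
    ]
    query = diff_features("a\nb\nc", "a\nx\nc\nd")
    print("features:", query)
    print("recommendation:", knn_predict(train, query, k=3))
    print(per_class_scores(["update", "keep"], ["update", "update"]))
```

In a real study the hand-written nearest-neighbor vote would likely be replaced by a library implementation (for example Gradient Boosted Trees), and the features would be extended with API-change and licensing information of the kind analyzed in [1]. The per-category scores are kept separate on purpose, since a recommender that always answers "update" could look accurate overall while being useless for the "keep" category.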

Detailed description