
Categorization of Software Repositories in Version Control Systems

Nejati, Mahtab | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54184 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Heydarnoori, Abbas
  7. Abstract: Developers often search version control systems for projects that match their topics of interest, with the goal of reusing code, extracting similar requirements, learning, and more. However, because these systems are so widely used, the number of projects they host is huge and ever-increasing, which makes it difficult to identify projects by topic. GitHub, a prominent version control system, lets users assign software topics to projects as free-text tags in order to make projects easier to find by topic. Assigning a correct and complete set of topics to software projects allows programmers to navigate them easily by filtering on those topics.

     Several previous studies have sought to facilitate the navigation of software projects. These studies began with approaches for grouping software projects and searching among them. Later, once the positive effects of topic labeling on navigation, search, and categorization had been identified and such labeling had become possible, research in this area continued with project labeling approaches. The earlier approaches often depend on the technology and the programming language. Labeling-based approaches, on the other hand, mostly yield results with limited recall and low precision. This low precision indicates the weakness of these approaches in predicting missing topic tags. Because the completeness of the topic labels assigned to projects is important, this weakness has attracted the attention of researchers.

     In the present study, we categorize projects on the GitHub platform by software topic through tagging, focusing on predicting missing tags. To this end, we first create a knowledge graph of GitHub-approved software topics through a crowd-sourced process. Then, using machine learning methods and information retrieval methods applicable to semantic networks, we present a knowledge graph-based software topic recommendation system that aims to assign the most complete possible set of topics to each software project. The evaluation results show that our proposed approach achieves an accuracy of 72.8%, a significant improvement over the state-of-the-art methods and the results of previous studies. Our proposed method outperforms the state-of-the-art approach by 62.28% and 93.14% in mean success rate and mean average precision, respectively.
  8. Keywords: Version Control System; Knowledge Graph; Machine Learning; Classification; Labeling Algorithm; Software Topics; Software Projects
