Automatically Structuring and Tagging Bug Reports in Version Control Systems

Nikeghbal, Nafiseh; Heydarnoori, Abbas

Nikeghbal, Nafiseh | 2023

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 56496 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Heydarnoori, Abbas
Abstract:
GitHub issue reports give developers information that is essential to the evolution of a software project. Contributors use them for software engineering tasks such as reporting bugs and requesting features. Initially, there was no standard format for issue reports, and as a result their quality varied widely. To improve that quality, GitHub introduced issue report templates (IRTs) in 2016. Despite the effectiveness of this feature, only about 5% of GitHub repositories with more than 10 stars use it. Because templates are beneficial yet rarely adopted, an automated way to create them is needed. In this research, 1,084,300 GitHub code repositories were crawled, and data related to issue report formats were extracted. After reviewing prior research on handling and organizing issues, as well as studies on instruction tuning of large language models, we chose the T5 model as the starting point for automatically generating issue report templates. To fine-tune this model, the data were structured as instruction-output pairs. After fine-tuning, the trained model was compared with the Flan-T5 model. In automatic evaluation, the proposed model outperformed Flan-T5 by 11.86% on ROUGE-1, 12.92% on ROUGE-L, and 6.39% on METEOR. In the human evaluation, evaluators rated the templates generated by the two models (the proposed model and Flan-T5) on a scale of 1 to 5 in three aspects: structure, coherence, and instruction following. The proposed model performed better in all three aspects, by a margin of at least 2.2 out of 5.
Additionally, to assess the model's effectiveness in practice, human evaluators ranked issue report templates created with and without the model. Templates created using the model received an average ranking of 2.1 out of 4, whereas templates created without it received an average ranking of 2.9 out of 4 (here, a lower score indicates better performance). The model has therefore been effective and useful in practice: developers were able to produce higher-quality issue report templates with it.
Keywords:
Issue Reports; Language Model; Automatic Tagging; Issue Report Templates; Version Control System
