
Predicting Usefulness of Code Review Comments Using Machine Learning Algorithms

Mohammadi, Atefeh | 2019

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 52509 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Fazli, Mohammad Amin
  7. Abstract: Competition in the software business has intensified with the rise of open-source and commercial software. Software stays alive, and competitive, only as long as it is adapted to the needs of its users, so a maintenance phase is needed to make those changes. To reduce the costs of this phase, software bugs must be avoided, and one way to avoid them is peer code review. Peer code review has been recognized as one of the best software engineering practices of the last 35 years. It helps maintain code quality when changes are integrated into the code repository by detecting code defects and violations of coding standards in the early stages of development. Studies show, however, that a significant share of code review comments are not useful, meaning that they do not lead to changes in the code. Given the significance of this issue, this study first extracts the factors that affect the quality of code review comments, based on previous research. The extracted factors fall into two categories: factors related to developer experience and textual features of the comments. Because no existing dataset included these factors, an appropriate dataset was then collected. Next, a model for predicting comment usefulness was implemented using the XGBoost algorithm, and its performance was compared with that of other methods. Finally, the proposed method is compared with other work in this field. The results show that, on two completely separate datasets, the proposed method outperforms the only available method by about three percent in precision, recall, and F1-score, and by about one percent in accuracy.
  8. Keywords: Software Maintenance; Machine Learning; Peer Code Review; Textual Features; Developer Experience
