
Cross-project code clones in GitHub

Gharehyazie, M ; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.1007/s10664-018-9648-z
  3. Publisher: Springer New York LLC, 2018
  4. Abstract: Code reuse has well-known benefits for code quality, coding efficiency, and maintenance. Open Source Software (OSS) programmers gladly share their own code and they happily reuse others’. Social programming platforms like GitHub have normalized code foraging via their common platforms, enabling code search and reuse across different projects. Removing project borders may facilitate more efficient code foraging, and consequently faster programming. But looking for code across projects takes longer and, once found, the code may be more challenging to tailor to one’s needs. Learning how much code reuse goes on across projects, and identifying emerging patterns in past cross-project search behavior, may help future foraging efforts. Our contribution is twofold. First, to understand cross-project code reuse, we present an in-depth empirical study of cloning in GitHub. Using Deckard, a popular clone finding tool, we identified copies of code fragments across projects, and investigated their prevalence and characteristics using statistical and network science approaches, and with multiple case studies. By triangulating findings from different analysis methods, we find that cross-project cloning is prevalent in GitHub, ranging from cloning a few lines of code to whole project repositories. Some of the projects serve as popular sources of clones, and others seem to contain more clones than their fair share. Moreover, we find that ecosystem cloning follows an onion model: most clones come from the same project, then from projects in the same application domain, and finally from projects in different domains. Second, we utilized these results to develop a novel tool named CLONE-HUNTRESS that streamlines finding and tracking code clones in GitHub. The tool is GitHub-integrated, built around a user-friendly interface, and runs efficiently over a modern database system. We describe the tool and make it publicly available at http://clone-det.ictic.sharif.edu/.
     © 2018, Springer Science+Business Media, LLC, part of Springer Nature
  5. Keywords: Clone detection ; Cross-project cloning ; Deckard ; GitHub ; Cloning ; Codes (symbols) ; Computer software reusability ; Open source software ; Coding efficiency ; Different domains ; Multiple-case study ; Social programming ; User friendly interface ; Open systems
  6. Source: Empirical Software Engineering ; 2018 ; 13823256 (ISSN)
  7. URL: https://link.springer.com/article/10.1007/s10664-018-9648-z
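
Illustrative note: the onion model described in the abstract (same project, then same application domain, then different domains) can be pictured as a simple classification of clone pairs. The Python sketch below is only an assumption-laden illustration and is not the authors' method, Deckard's output format, or part of CLONE-HUNTRESS. The `Fragment` type, the `onion_layer` and `layer_counts` helpers, and the sample project and domain names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Fragment(NamedTuple):
    """A code fragment, identified only by where it lives (hypothetical type)."""
    project: str  # repository the fragment belongs to
    domain: str   # application domain assigned to that repository


def onion_layer(a: Fragment, b: Fragment) -> str:
    """Place a clone pair in one layer of the onion, innermost first."""
    if a.project == b.project:
        return "same project"
    if a.domain == b.domain:
        return "same domain"
    return "different domain"


def layer_counts(pairs: Iterable[Tuple[Fragment, Fragment]]) -> Counter:
    """Count how many clone pairs fall into each layer."""
    return Counter(onion_layer(a, b) for a, b in pairs)


# Made-up clone pairs, purely for illustration (not data from the study).
sample_pairs = [
    (Fragment("proj-a", "web"), Fragment("proj-a", "web")),
    (Fragment("proj-a", "web"), Fragment("proj-a", "web")),
    (Fragment("proj-a", "web"), Fragment("proj-b", "web")),
    (Fragment("proj-a", "web"), Fragment("proj-c", "games")),
]

counts = layer_counts(sample_pairs)
for layer in ("same project", "same domain", "different domain"):
    print(layer, counts[layer])
```

Under the onion model one would expect the counts to shrink from the innermost layer outward; the sample data here is constructed to show that shape and carries no empirical weight.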