A large-scale analysis of bioinformatics code on GitHub (Pamela H. Russell, Rachel L. Johnson, Shreyas Ananthan, Benjamin Harnke, Nichole E. Carlson)

This would be a good article to cite if I need statistics on:

  • number of articles associated with code repos year-on-year
  • statistics regarding repos and teams on GitHub
  • community / external contributors
  • gender breakdown in bioinformatics paper authorship
  • length and quality of commits and repos

Whether a repo keeps receiving commits after the paper is published is a very interesting metric:

We looked at the simple binary feature of whether any commits were contributed to each repository after the associated article appeared in PubMed. … However, interestingly, the association with the proportion of commits contributed by outside authors was not statistically significant, suggesting that overall team size may be the principal feature driving the relationship with the number of outside commit authors. Additionally, the metric was associated with frequency of citations in PubMed Central, which could indicate that people are discovering the code through the paper and using it, and the code is therefore being maintained.
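A minimal Python sketch of that binary feature, assuming each repo's commit dates and the article's PubMed date are already available as `datetime.date` objects. The function name, the dictionary layout, and the choice to count only commits *strictly* after the publication date are my own assumptions for illustration, not the authors' code.

```python
from datetime import date


def has_commits_after_publication(commit_dates, pub_date):
    """True if any commit is dated strictly after the PubMed publication date.

    Assumption: a commit on the publication day itself does not count.
    """
    return any(d > pub_date for d in commit_dates)


# Hypothetical example data, not taken from the paper.
repos = {
    "repo_a": {"pub_date": date(2018, 5, 1),
               "commits": [date(2017, 11, 3), date(2018, 4, 20), date(2019, 2, 14)]},
    "repo_b": {"pub_date": date(2018, 5, 1),
               "commits": [date(2017, 9, 9), date(2018, 3, 2)]},
}

# One boolean per repo: the feature the paper then correlates with team size,
# outside contributors, and PubMed Central citations.
features = {name: has_commits_after_publication(r["commits"], r["pub_date"])
            for name, r in repos.items()}
print(features)  # {'repo_a': True, 'repo_b': False}
```

In practice the commit dates would come from the repo's history (e.g. a `git log` export or the GitHub API) and the publication date from the PubMed record; how the paper handled time zones or author vs. committer dates isn't stated here.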