MSR 2014 May 31–June 1. Hyderabad, India
The 11th Working Conference on Mining Software Repositories

Challenge Chair

Olga Baysal

University of Waterloo

Mining Challenge

The International Working Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we invite everyone interested to apply their tools to a common dataset, bringing research and industry closer together. The challenge is for researchers and practitioners who are willing to put their mining tools and approaches to the test.

This year, the challenge focuses on GitHub data. We provide the data, and you use your brain, tools, computational power, and magic to uncover interesting findings in it.

How to Participate in the Challenge

Participating in the challenge requires you to:

1. Download the data (last updated on October 10, 2013, to include commit comments), available in two forms: a MySQL dump (106 MB) or a MongoDB dump (2.7 GB).

The dataset contains 90 GitHub projects and their forks. The projects were not randomly selected, so the dataset is not representative of GitHub.

Before you start your mining, please refer to the database schema, the data description and instructions for importing and using the data.

2. Report your findings in a four-page document.

3. Submit your report on or before February 21, 2014.

4. If your report is accepted, present your awesome findings at MSR 2014!
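Once the dump from step 1 is imported, a natural first check is to separate base projects from their forks, since the dataset includes both. The following is a minimal sketch of that idea only. It uses an in-memory SQLite table as a stand-in for the imported database, and the table and column names (projects, id, name, forked_from) as well as the sample rows are assumptions made for illustration. Consult the database schema and data description before adapting it.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the imported dump.

    The schema here (projects: id, name, forked_from) is hypothetical;
    the real column names must be taken from the published schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, name TEXT, forked_from INTEGER)"
    )
    rows = [
        (1, "alpha", None),  # base project
        (2, "beta", None),   # base project
        (3, "alpha", 1),     # fork of project 1
        (4, "alpha", 1),     # fork of project 1
        (5, "beta", 2),      # fork of project 2
    ]
    conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", rows)
    return conn


def forks_per_base_project(conn):
    """Return {base project name: number of forks} for rows without a parent."""
    query = """
        SELECT base.name, COUNT(fork.id)
        FROM projects AS base
        LEFT JOIN projects AS fork ON fork.forked_from = base.id
        WHERE base.forked_from IS NULL
        GROUP BY base.id, base.name
        ORDER BY base.name
    """
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    print(forks_per_base_project(build_demo_db()))
```

Because the projects were not randomly selected, counts like these describe only this dataset and should not be generalized to GitHub as a whole.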

Challenge Data

This year, the focus of the challenge is the GitHub data. GitHub is a web-based service providing a collaborative software development environment and a social network for developers. We provide you with the dataset extracted from GHTorrent by Georgios Gousios.

When you use the data provided by the MSR 2014 Challenge, we ask you to cite it as follows:

Challenge Report

The challenge report should describe the results of your work by providing an introduction to the problem being addressed, the data used, the approach and tools used, your results and their implications, and conclusions. Keep in mind that the report will be evaluated by a jury. Make sure your report highlights the contributions and the importance of your work.

Challenge reports must be at most 4 pages long and must conform, at the time of submission, to the ICSE (and MSR) 2014 Format and Submission Guidelines.

Submission Details

Submit your challenge report (maximum 4 pages) to EasyChair on or before February 21, 2014. Please submit it to the "MSR 2014 Challenge Track". The author notification and camera-ready dates are March 10 and March 20, 2014, respectively.

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere while under consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission.

Upon notification of acceptance, all authors of accepted papers will be asked to complete an ACM Copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to present the results at the MSR 2014 conference. All accepted contributions will be published in the conference electronic proceedings.


This year's prize for the best Mining Challenge will be an iPad Mini Retina in an exclusive custom case with a GitHub logo on it, sponsored by GitHub. All presenters of Challenge work at the conference will receive one year of 'small' service credit and a GitHub t-shirt.


We would like to thank Georgios Gousios from the Delft University of Technology for providing GHTorrent data.