I often find myself searching for certain commits using git log and friends. While I really love
the power and flexibility that come with the git and other Unix command-line tools, sometimes it can be more
convenient to use a database to filter and aggregate commit data.
I gave it a quick try yesterday and imported the commit history of ArangoDB’s Git repository into ArangoDB
and ran some queries on the data. While the query results for our repository may not be interesting for everyone,
I think it is still worth sharing what I did. Even though I didn’t try it, I think the overall procedure is
applicable with any other Git repository.
Converting the Git history to JSON
The way to extract history and commit data from a local repository is to use git log. Though its output
is greatly customizable, it does not provide an out-of-the-box solution for producing JSON. So I wrote a simple
wrapper script (in PHP) around it. The script can be found here.
Here’s how to run it:
converting the git history to JSON
123
cd path/to/local/git-repository
wget https://gist.githubusercontent.com/jsteemann/65ef221646449713b2c5/raw/fef22c729e01dd0777b378ac1e17e951ea47c7dd/git-log-to-json.php
php git-log-to-json.php > arango-commits-master-201503.json
The script may run a few minutes on bigger repositories such as ours. In the end, it should produce a JSON
file named arango-commits-master-201503.json.
I have also uploaded the JSON file here. Note that the
file only contains commits from the master branch and not all commits done in ArangoDB in total.
Importing the commits into ArangoDB
The simplest way to get the commits into ArangoDB is to use arangoimp:
This should have imported all the commits into a collection commits in the default database.
Querying the commit logs
Following are a few example queries that I ran on the data from the ArangoShell.
As mentioned before, it should be possible to run the queries for other repositories’ data.
getting all contributors
12
query='FOR commit IN commits COLLECT author = commit.author.name RETURN author';db._query(query).toArray();
retrieving the total number of commits
12
query='FOR commit IN commits COLLECT WITH COUNT INTO count RETURN count';db._query(query).toArray();
retrieving the number of commits by contributor
12
query='FOR commit IN commits COLLECT author = commit.author.name WITH COUNT INTO count RETURN { author: author, count: count }';db._query(query).toArray();
query='FOR commit IN commits COLLECT year = DATE_YEAR(commit.date) WITH COUNT INTO count RETURN { year: year, count: count }';db._query(query).toArray();
retrieving number of commits per month / year
12
query='FOR commit IN commits COLLECT year = DATE_YEAR(commit.date), month = DATE_MONTH(commit.date) WITH COUNT INTO count RETURN { month: CONCAT(year, "/", month), count: count }';db._query(query).toArray();
retrieving number of commits per weekday
12
query='FOR commit IN commits COLLECT day = DATE_DAYOFWEEK(commit.date) WITH COUNT INTO count RETURN { day: TRANSLATE(day, { "0": "Sunday", "1": "Monday", "2": "Tuesday", "3": "Wednesday", "4": "Thursday", "5": "Friday", "6": "Saturday" }), count: count }';db._query(query).toArray();
retrieving commits with string “issue #” in commit message