How to measure contributions to a Git Repo
It's performance review season, which means I need to quantify my contributions for the past year. Part of that is looking at my raw code contributions. According to GitHub, I've merged 63 PRs into the ray-project/ray repo in the last 12 months, but some of these were minor and some were major. How many lines of code did I actually touch?
To find out, I used the new composer-1 model to create a script that counts the lines of code changed in a repo. It made a few mistakes, but its speed was impressive and the test-debug cycles were fast:
#!/usr/bin/env bash
#
# commit_stats.sh
#
# Compute per-author line change statistics over the past year and
# write them into commit_stats.tsv as a TSV file with a header row:
#
# author additions deletions net
#
# "additions" and "deletions" are summed over all commits in the period.
# "net" = additions - deletions.
#
# Usage:
# ./commit_stats.sh
#
# Requirements:
# - Run inside a Git repository
# - git, awk, sort available on PATH
set -euo pipefail
{
# 1) Use git log to emit author + numstat info.
# --since="1 year ago" : only include commits from the last year
# --no-merges : ignore merge commits
# --numstat : per-file added/deleted counts
# --format='%aN' : print the author name before each commit's numstat
git log --since="1 year ago" --no-merges \
--numstat --format='%aN' | \
# 2) Use awk to aggregate additions/deletions per author.
# Input stream looks like:
# Alice Smith
# 10 2 file1.py
# 3 0 file2.txt
# Bob Jones
# 5 1 file3.go
awk '
BEGIN {
FS = "\t" # Input field separator: git numstat uses tabs
OFS = "\t" # Output field separator: TSV format
}
# Skip binary entries: lines with 3 fields where first two are "-".
NF == 3 && $1 == "-" && $2 == "-" {
next
}
# Lines with 3 fields where first two are numbers are numstat lines:
# "<added> <deleted> <filename>".
NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
add[author] += $1
del[author] += $2
next
}
# Any other non-empty line is an author name (can have any number of fields).
NF > 0 {
author = $0
next
}
# At the end, print one line per author (no header here).
END {
for (a in add) {
net = add[a] - del[a]
print a, add[a], del[a], net
}
}
' | \
# 3) Sort the data rows by net contributions (column 4) in descending numeric order.
# Using tab character as delimiter
sort -t$'\t' -k4,4nr
} | {
# Prepend the TSV header before the sorted data
echo -e "author\tadditions\tdeletions\tnet"
cat
} > commit_stats.tsv
echo "Wrote commit_stats.tsv (per-author line changes for the past year, sorted by net contributions)."
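Since the script writes one row per author, pulling out a single person's numbers takes one more step. Here is a minimal sketch, assuming the TSV layout described in the script header; the function name `author_stats` is hypothetical and not part of the original script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of commit_stats.sh): print the header row
# plus the row for one author from the TSV that the script produces.
author_stats() {
  local author="$1"
  local file="${2:-commit_stats.tsv}"
  # NR == 1 keeps the header; $1 == me matches the author column exactly.
  awk -F'\t' -v me="$author" 'NR == 1 || $1 == me' "$file"
}
```

Calling `author_stats "Alice Smith"` after running the script would print the header and Alice's row, assuming her name appears in the file exactly as Git records it.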
My results:
Lines added: 12900
Lines subtracted: 10255
Net new lines: 2645
...which is slightly more than my boss, so we're good!
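If you only care about one author, a shorter route is to let Git do the filtering and sum the numstat stream directly. The following is a rough sketch rather than the script I actually ran; `sum_numstat` is a hypothetical name, and it reads numstat lines from stdin so it can be tried without a repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum a `git log --numstat` stream read from stdin
# and print totals as three labeled lines.
sum_numstat() {
  awk -F'\t' '
    # Count only "<added> <deleted> <file>" lines; binary files show "-" and are skipped.
    NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { add += $1; del += $2 }
    END {
      printf "Lines added: %d\n", add
      printf "Lines subtracted: %d\n", del
      printf "Net new lines: %d\n", add - del
    }
  '
}

# Example usage (run inside a Git repository; "Your Name" is a placeholder):
#   git log --author="Your Name" --since="1 year ago" --no-merges \
#     --numstat --format= | sum_numstat
```

Note that `--author` matches a pattern against the author name and email, so a common name may pull in more than one person.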