How to measure contributions to a Git Repo

It's performance review season, which means that I need to quantify my contributions for the past year. Part of that is looking at my raw code contributions. According to GitHub, I've merged 63 PRs into the ray-project/ray repo in the last 12 months, but some of these were minor and some were major. How many lines of code did I actually touch?

To find out, I used the new composer-1 model to create a script that counts the lines of code each author changed in a repo. It made a few mistakes, but its speed was impressive and the test-debug cycles were fast:

#!/usr/bin/env bash
#
# commit_stats.sh
#
# Compute per-author line change statistics over the past year and
# write them into commit_stats.tsv as a TSV file with a header row:
#
#   author	additions	deletions	net
#
# "additions" and "deletions" are summed over all commits in the period.
# "net" = additions - deletions.
#
# Usage:
#   ./commit_stats.sh
#
# Requirements:
#   - Run inside a Git repository
#   - git, awk, sort available on PATH

set -euo pipefail

{
  # 1) Use git log to emit author + numstat info.
  #    --since="1 year ago" : only include commits from the last year
  #    --no-merges          : ignore merge commits
  #    --numstat            : per-file added/deleted counts
  #    --format='%aN'       : print the author name before each commit's numstat
  git log --since="1 year ago" --no-merges \
    --numstat --format='%aN' | \
  # 2) Use awk to aggregate additions/deletions per author.
  #    Input stream looks like:
  #      Alice Smith
  #      10  2  file1.py
  #      3   0  file2.txt
  #      Bob Jones
  #      5   1  file3.go
  awk '
    BEGIN {
      FS = "\t"    # Input field separator: git numstat uses tabs
      OFS = "\t"   # Output field separator: TSV format
    }

    # Skip binary entries: lines with 3 fields where first two are "-".
    NF == 3 && $1 == "-" && $2 == "-" {
      next
    }

    # Lines with 3 fields where first two are numbers are numstat lines:
    # "<added> <deleted> <filename>".
    NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      add[author] += $1
      del[author] += $2
      next
    }

    # Any other non-empty line is an author name (can have any number of fields).
    NF > 0 {
      author = $0
      next
    }

    # At the end, print one line per author (no header here).
    END {
      for (a in add) {
        net = add[a] - del[a]
        print a, add[a], del[a], net
      }
    }
  ' | \
  # 3) Sort the data rows by net contributions (column 4) in descending numeric order.
  # Using tab character as delimiter
  sort -t$'\t' -k4,4 -nr
} | {
  # Prepend the TSV header before the sorted data
  printf 'author\tadditions\tdeletions\tnet\n'
  cat
} > commit_stats.tsv

echo "Wrote commit_stats.tsv (per-author line changes for the past year, sorted by net contributions)."
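Since the TSV has one row per author, pulling out a single person's numbers takes one more step. Here is a minimal sketch of that lookup, assuming the four-column layout written by the script above. `author_row` is a hypothetical helper name used only for illustration, and the name you pass in has to match the author name exactly as Git reports it (after any .mailmap rewriting):

```shell
#!/usr/bin/env bash
# Print one author's totals from the TSV written by commit_stats.sh.
# NOTE: author_row is a hypothetical helper, not part of the script above.
# Assumes columns: author, additions, deletions, net (tab-separated).
author_row() {
  local name="$1"
  local file="${2:-commit_stats.tsv}"   # defaults to the script's output file
  # NR > 1 skips the header row; $1 == name is an exact match on the author column.
  awk -F'\t' -v name="$name" 'NR > 1 && $1 == name {
    printf "Lines added: %d\nLines subtracted: %d\nNet new lines: %d\n", $2, $3, $4
  }' "$file"
}

# Example call (replace the placeholder with your own author name):
# author_row "Alice Smith"
```

If the function prints nothing, the name probably doesn't match; running `cut -f1 commit_stats.tsv` lists the author names that are actually in the file.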

My results:

Lines added: 12900

Lines subtracted: 10255

Net new lines: 2645

...which is slightly more than my boss, so we're good! 😅

Copyright Ricardo Decal. ricardodecal.com