GitHub House-Keeping

Hubert Behaghel

behaghel@gmail.com

Take-away

GitHub ❤ us, we shall ❤ it back.

Frugality matters.

Reversing entropy for fun and profit.

READMEs on GitHub are fabulous.

Sign your work!

The issue

We were maxed out on private repositories on GitHub (Platinum plan: $200/month, 125 private repositories).

The wrong thing to do

Just pay more and move on.

Guess what we did?

  • We paid more and moved on.
  • We opted for the Diamond plan: $450/month, 300 private repositories.
  • My point today: let's pretend we haven't moved on.

Why?

  • This is an interesting problem, don't miss it!
  • It's about scalability. We engineers care about that.
  • It's about frugality. We engineers like efficiency.

How does it scale?

Track the creation rate over time.

bskyb-repo-activity.lst

github-creation.ts.png
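
As a rough illustration of tracking the creation rate, the sketch below counts repositories per creation month. It assumes each input line starts with an ISO 8601 creation date, as in the output of the collection script shown later. The helper name `creations_per_month` and the sample lines are made up for this example, and `tally` needs Ruby 2.7 or later.

```ruby
# Count repositories per creation month (hypothetical helper).
# Assumes each line starts with an ISO 8601 creation date (YYYY-MM-DD).
def creations_per_month(lines)
  lines.map { |line| line.split.first[0, 7] } # keep only "YYYY-MM"
       .tally
       .sort
       .to_h
end

# Made-up sample data, for illustration only.
sample = [
  '2013-01-15 2014-02-01 alpha 12',
  '2013-01-20 2013-06-30 beta 0',
  '2013-03-02 2014-03-10 gamma 57'
]

creations_per_month(sample).each do |month, count|
  puts format('%s %d', month, count)
end
```

Plotting these monthly counts (or their running total) against time gives a creation-rate curve.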

So what?

  • Soon enough managing our repositories manually will be impossible.
  • The time has come to make sure the way we use GitHub scales well.

Challenges I see

  • sharing and reusing should be as easy as possible.
    • E.g. one should be able to say what a repository is about very quickly.
  • search should remain useful.
  • ownership should be clear.
  • dead repositories should be deleted.

Search

Identify quickly what a repository is about

  • GitHub READMEs are great and powerful, so let's use them!
  • Tell us why you created this repo and why it should matter to the rest of us (or not at all).
  • If it could be of any use to the rest of us, tell us how we should use it.

Ownership

41BKx1AxQWL.jpg

Being a pragmatic programmer

Tip #70 (last): "Sign your work".

Where to sign

  • The README
  • The commit author.
    • I would strongly advise against team-level authorship of commits.

A proposal for a minimal README template

## Motivation

## Maintainers

## Contributing (optional)
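
A template is easier to adopt when it can be checked automatically. The sketch below reports which of the non-optional sections are missing from a README. The helper `missing_sections` and the sample README are hypothetical, and it assumes the sections are written as level-2 Markdown headings.

```ruby
# Section headings from the proposed template that are not optional.
REQUIRED_SECTIONS = ['Motivation', 'Maintainers'].freeze

# Return the required sections missing from a README's text
# (hypothetical helper; only level-2 Markdown headings are considered).
def missing_sections(readme_text)
  headings = readme_text.scan(/^##[ \t]+(.+?)[ \t]*$/).flatten
  REQUIRED_SECTIONS.reject do |section|
    headings.any? { |heading| heading.start_with?(section) }
  end
end

# Made-up README, for illustration only.
readme = <<~README
  ## Motivation
  Shared build scripts.

  ## Contributing
  Pull requests welcome.
README

puts missing_sections(readme).inspect
```

For this sample the only missing section is "Maintainers".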

How to track dead repositories

Collecting Data

require 'octokit'
require 'date'

class RepoActivity
  def initialize
    @client = Octokit::Client.new(auto_paginate: true,
                                  access_token: ENV['MY_GITHUB_TOKEN'])
  end

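  # Total number of commits over the last year, or -1 when the statistics
  # are not available yet (GitHub answers 202 while it computes them).
  def liveliness(repo)
    stats = @client.commit_activity_stats(repo.full_name)
    cached = @client.last_response.status == 200
    return -1 unless stats && cached

    stats.sum { |week| week[:total] }
  end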

  def print_activity
    repos = @client.org_repos('sky-uk', type: 'private')
    h = repos.map { |r| [r, liveliness(r)] }.to_h
    # Retry once: the first request warms up GitHub's stats cache.
    # (Looping while h.value?(-1) would be more thorough.)
    h.each do |k, v|
      h[k] = liveliness(k) if v == -1
    end
    # Columns: creation date, last-push date, name, commits over the last year.
    h.each do |k, v|
      puts format('%s %s %-32s %d',
                  k.created_at.to_date.iso8601, k.pushed_at.to_date.iso8601,
                  k.name, v)
    end
  end

end

RepoActivity.new.print_activity
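
The script above retries only once when the statistics are not ready. A bounded retry is one possible generalisation, sketched here under assumptions: the helper `retry_until_ready` and its `attempts` and `pause` parameters are hypothetical, and the stats call is replaced by a stand-in so the example runs without network access.

```ruby
# Call the block up to `attempts` times, until it returns something other
# than -1 (the "not ready yet" marker returned by liveliness).
def retry_until_ready(attempts: 3, pause: 0)
  attempts.times do |i|
    result = yield
    return result unless result == -1

    # Wait between attempts, but not after the last one.
    sleep(pause) if pause.positive? && i < attempts - 1
  end
  -1
end

# Stand-in for a stats call that only succeeds on the third request.
responses = [-1, -1, 42]
puts retry_until_ready(attempts: 3) { responses.shift }
```

Inside `print_activity`, this could wrap the call as `retry_until_ready(pause: 2) { liveliness(r) }`, at the cost of a slower run while GitHub warms its cache.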

Classifying Data

BEGIN {
    cmd="date +%s"
    cmd | getline today
    close(cmd)
}
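# Input columns: $1 creation date, $2 last-push date, $3 name, $4 commits over the last year.
{
    # BSD/macOS date syntax; with GNU date use: date -d "$2" +%s
    cmd="date -jf \"%Y-%m-%d\" \""$2"\" +%s"
    cmd | getline d
    close(cmd)
    age = (today - d) / 3600 / 24   # days since the last push
    if ($4 == 0)                    # no commit in the last year
        print $3 > "not-pushed-for-1y.lst"
    else if (age > 182 && $4 < 50)  # quiet for about 6 months, few commits
        print $3 > "old-and-low-activity.lst"
    else                            # still active: name, -age, commits
        printf "%-32s\t%4d\t%5d\n", $3, -age, $4
}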
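
The same classification can be sketched in Ruby, which avoids shelling out to a platform-specific `date`. The helper `classify` and the `:active` label are hypothetical; the thresholds (0 commits, 182 days, 50 commits) are copied from the AWK script, and ages are counted in whole days here.

```ruby
require 'date'

# Classify one repository with the same rules as the AWK script
# (hypothetical helper; thresholds copied from that script).
def classify(pushed_on, commits, today: Date.today)
  age_in_days = (today - Date.iso8601(pushed_on)).to_i
  if commits.zero?
    :not_pushed_for_1y
  elsif age_in_days > 182 && commits < 50
    :old_and_low_activity
  else
    :active
  end
end

today = Date.new(2015, 6, 1) # fixed date so the example is reproducible
puts classify('2014-01-10', 0, today: today)
puts classify('2014-09-01', 12, today: today)
puts classify('2015-05-20', 12, today: today)
```

Passing `today` explicitly keeps the function easy to test.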

Visualising The Data

[Interactive SVG chart]

(right click > open image to make it interactive)

A proposal to clean up

github-cleanup.png

Thanks!

The making of this presentation is potentially more interesting than the result. If you are interested, I can show it to you.