E_NO_SUCH_BLOG

Programming, philosophy, pedaling.


Keybase on the CI

Oct 26, 2017

Tags: ruby, programming, devblog

This is a post-mortem analysis of my work on getting the Keybase client running on Travis CI.

Hopefully it will help the half dozen other people running Keybase in a CI setting.

Background

I maintain an open source secret manager, KBSecret, that uses Keybase and the Keybase filesystem (KBFS) for encryption and synchronization.

KBSecret is written in Ruby and interacts with Keybase through a collection of unofficial libraries I wrote.

It exposes both a public API (in Ruby, of course) and a well-featured CLI via the kbsecret command. Both the API and CLI have unit tests to prevent regressions and provide a semi-rigorous specification of the overall program.

First problem: testing at all

Since KBSecret integrates tightly with Keybase and KBFS, its unit tests need to make several assumptions about the test host in order to do any meaningful testing:

As a developer testing on my own system, these are mostly reasonable assumptions to make. I can automate the suspension of my own (real) KBSecret state, and I certainly have Keybase and KBFS running.

However, I really like farming out the work of testing to a CI service: CIs make it impossible to forget to run your tests,[1] and they give new contributors immediate and automated feedback on their changes. Auto-generated coverage reports are also nice, as they provide a metric for the project’s overall health.

First solution: stubbing and praying

My first approach to running the KBSecret tests on a CI was to abandon Keybase entirely, and attempt to stub a subset of Keybase functionality.[2] At a high level, the stubbing looked like this:

I completed my first version of this on August 1, and you can see the changes here.

Thus, there were two ways to run the unit tests:

# run tests as normal, assuming a real Keybase installation
$ rake test

# run tests with Keybase stubs
$ TEST_NO_KEYBASE=1 rake test

Both of these were placed under the make test target, to prevent me from accidentally introducing changes that broke one or the other while developing locally. Meanwhile, the .travis.yml file only contained the second invocation.
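For illustration, running both modes back to back could be wrapped in a small shell helper. This is only a sketch: run_both_modes is a made-up name (nothing like it is claimed to exist in the KBSecret repository), and it takes the test command as arguments rather than hard-coding rake.

```shell
# run_both_modes (hypothetical helper): run a test command twice, once
# against the real Keybase installation and once with the stubs enabled
# via TEST_NO_KEYBASE=1. Stops at the first failing run.
run_both_modes() {
  # First pass: real Keybase. Unset the variable in a subshell so a value
  # inherited from the environment can't silently enable the stubs.
  ( unset TEST_NO_KEYBASE; "$@" ) || return 1

  # Second pass: stubbed Keybase.
  ( TEST_NO_KEYBASE=1; export TEST_NO_KEYBASE; "$@" ) || return 1
}
```

Locally this would be invoked as `run_both_modes rake test`, while the CI keeps calling only the stubbed variant.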

Voila:

First CI Tests (The 2.3 failure was a simple type error.)

Second problem: testing the CLI

The stubbing setup actually worked pretty well for a while, despite its hackiness. With the exception of a few changes caused by ongoing development, it required no maintenance or afterthought. It was also fast, with test and coverage results across two separate machines taking just over a minute on average to complete.

Everything was dandy…until I wanted to add the CLI tests to the CI. I quickly realized that continuing with the stubbing approach would require considerable effort — I would have to mock a great deal of Keybase and KBFS behavior (like user and team validation), and layer on even more require muckery to avoid throwing exceptions due to the process barrier between the keybase command and its subcommands. The result would be a half-functional Keybase mock that still wouldn’t cover the corners needed to test the CLI satisfactorily.

Second solution: Keybase on the CI

Given the problems with testing the CLI above, I decided to throw my stubbing approach out entirely and try running the real Keybase client on the CI.

Can the Keybase client even run on a headless machine? Some quick searches confirmed that it could.

Installing the client turned out to be relatively easy:

# download the .deb from Keybase's servers
$ curl -O https://prerelease.keybase.io/keybase_amd64.deb

# install it directly
$ sudo dpkg -i keybase_amd64.deb

# ...and then fix all the broken dependencies it expects
$ sudo apt-get install -f

run_keybase then starts the Keybase service and KBFS daemon correctly, and we’re left with the task of automating the log-in process. This is where it gets tricky.

Fun with interactive automation

Keybase’s CLI is heavily interactive — most commands prompt the user for input, and assume that the user is on a TTY. ANSI colors and effects abound.

None of that is bad (it actually makes keybase very pleasant to use), but it poses a challenge when trying to automate things like keybase login and keybase deprovision (more on those below).

I spent a lot of time fiddling with different ways to do interactive automation, but I ended up going with good old expect, using autoexpect to generate the scripts automatically:

# `kbsecretci` is the name of the KBSecret CI account on Keybase
$ autoexpect -c -f setup.expect keybase login kbsecretci
$ autoexpect -c -f teardown.expect keybase deprovision

This ended up working way better than I had any right to expect (no pun intended). You can see the generated scripts (with some manual parameterization and fixups) here and here. Note the KBSECRETCI_PASSWORD environment variable: it contains the account’s password, and was configured directly in Travis.
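Since the scripts depend on a secret coming from the CI environment, it can help to fail early when the variable is missing and to keep the command's output out of the build log. A minimal sketch, assuming a wrapper named login_quietly (hypothetical, not part of the scripts linked above):

```shell
# login_quietly (hypothetical wrapper): refuse to run unless the secret is
# present, then run the given command with stdout and stderr discarded so
# that nothing it echoes ends up in a public CI log.
login_quietly() {
  if [ -z "${KBSECRETCI_PASSWORD:-}" ]; then
    # Only the variable's name is printed, never its value.
    echo "error: KBSECRETCI_PASSWORD is not set" >&2
    return 1
  fi
  "$@" > /dev/null 2>&1
}

# Assumed usage (the script name comes from the autoexpect call above):
#   login_quietly expect setup.expect
```

The wrapper returns the wrapped command's exit status, so a failed login still fails the build even though its output is hidden.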

Device hell

Keybase keeps track of the list of “provisioned” devices associated with an account. When a new device (like, say, a new CI instance) sends a log-in request, an existing device must be used to (interactively!) confirm the validity of the request and provision the new device. This is great for security, but awful for automation.

There are two exceptions to these requirements: the first device on a Keybase account, and provisioning via a “paperkey” device.

The first device exception is what I tried first: by provisioning the CI instance as the first device and then deprovisioning it once the tests ended, I could functionally avoid the device confirmation step indefinitely. In order to prevent multiple CI test jobs from competing to become the “first” device (and thereby clobbering each other), I also had to configure Travis to only run one job at once.
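The provision, test, deprovision cycle only holds up if the deprovision step runs even when the tests fail. One way to sketch that in shell is below. The names run_with_teardown and deprovision are hypothetical, and the path to the teardown script is an assumption.

```shell
# run_with_teardown (hypothetical helper): run the main command, then always
# run the teardown command, and report the main command's exit status.
#   $1   = teardown command (a function or executable name)
#   rest = main command and its arguments
run_with_teardown() {
  local teardown
  local status
  teardown=$1
  shift
  status=0

  # Record a failure instead of aborting, so the teardown still runs.
  "$@" || status=$?

  # A failed teardown is reported but does not mask the test result.
  "$teardown" || echo "warning: teardown failed" >&2

  return "$status"
}

# Assumed usage on the CI (the script path is a guess):
#   deprovision() { expect ./test/ci/teardown.expect > /dev/null 2>&1; }
#   run_with_teardown deprovision rake test
```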

This worked really well for a while:

Keybase on the CI, take 1

…and then broke fabulously:

Keybase on the CI, utterly broken

I still don’t know fully why this approach started failing (I have some guesses involving PGP keys and some persistent bad state), but it did so in myriad ways:

Keybase on the CI, failure 2

Keybase on the CI, failure 1

Keybase on the CI, failure 3

Paperkeys to the rescue

Since repeatedly provisioning and deprovisioning just one “first” device on the account wasn’t reliable, I switched to the other exception to the device confirmation rule: paperkeys.

Keybase paperkeys are a lot like normal cryptographic paperkeys, except that they’re human-readable (rather than just machine-readable). They also function as devices, allowing a user to authenticate new devices by selecting their paperkey from the device list and typing it in. That means we can use one to provision our CI instances!

Keybase paperkey

Just as with the first-device (passphrase) method, we’ll keep the CI limited to one job at a time and still deprovision the device at the end of the run. This prevents the list of devices presented during keybase login from growing indefinitely, which in turn keeps the expect script for the paperkey method relatively simple:

set device_name [lindex $argv 0]
set paperkey $::env(KBSECRETCI_PAPERKEY)
set timeout -1
set send_slow {1 .1}
spawn keybase login kbsecretci
match_max 100000
expect -exact "\r
The device you are currently using needs to be provisioned.\r
Which one of your existing devices would you like to use\r
to provision this new device?\r
\r
    1. \[paper key\]    upgrade canal\r
\r
Choose a device: "
sleep .1
send -s -- "1\r"
expect -exact "1\r\r
Please enter a paper key for your account: "
sleep .1
send -s -- "${paperkey}\r"
sleep .1
expect -exact "\r
\r
\r
\033\[35m************************************************************\r
\033\[39m\033\[35m* Name your new device!                                    *\r
\033\[39m\033\[35m************************************************************\r
\033\[39m\r
\r
\r
Enter a public name for this device: "
sleep .1
send -s -- "${device_name}\r"
expect eof

We also (empirically) need to give keybase some time to start up, so the whole setup process for Keybase on Travis looks something like this:

set -ev

sudo apt-get -qq update

curl -O https://prerelease.keybase.io/keybase_amd64.deb

set +e
# this command will exit with 1, so don't let it take down the job with it
sudo dpkg -i keybase_amd64.deb
set -e

sudo apt-get install -f
sudo apt-get install expect

run_keybase

sleep 3

# the device name here is just the current timestamp, down to the milliseconds.
# this is sufficient, since the CI is configured to only run one process at a time,
# and devices are deprovisioned immediately after all tests complete.
device_name=$(date +%s%3N)

# NOTE: it's VERY IMPORTANT that no output from this command appear in public logs,
# since `keybase login` echoes the paperkey back to the terminal. If the paperkey gets leaked,
# anybody can fiddle with the CI account.
expect ./test/ci/setup.expect "${device_name}" > /dev/null 2>&1

sleep 3

(Note the redirection of expect’s output, since paperkeys get echoed by keybase login, unlike passphrases.)
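The fixed sleeps work, but a retry loop is a little more forgiving on slow CI machines. Here is a minimal sketch of one. The helper wait_for is hypothetical, and the idea that keybase status exits non-zero until the service is reachable is an assumption, not something confirmed here.

```shell
# wait_for (hypothetical helper): retry a command until it succeeds or the
# attempts run out. The command's own output is discarded.
#   $1 = maximum number of attempts
#   $2 = delay in seconds between attempts
#   rest = command to run
wait_for() {
  local attempts
  local delay
  local i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    # No point in sleeping after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumed usage in place of a fixed `sleep 3`:
#   wait_for 10 1 keybase status
```

With 10 attempts and a 1-second delay, the worst case waits about 9 seconds before giving up, while a fast machine moves on as soon as the first check passes.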

Hacky, but now we have real Keybase working reliably on a headless CI!

CI Tests with Keybase installed

With both API and CLI tests:

API and CLI test results

Wrapup

This post was pretty scatterbrained, so I’ll just list everything you need to replicate my setup down here:

Thanks for reading!

- William

  1. Some people use commit hooks for this, but I’ve never really gotten them to do exactly what I want. 

  2. More precisely, functionality from my wrapper libraries.