<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://blog.yossarian.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.yossarian.net/" rel="alternate" type="text/html" /><updated>2024-01-23T15:29:55+00:00</updated><id>https://blog.yossarian.net/feed.xml</id><title type="html">ENOSUCHBLOG</title><subtitle>Programming, philosophy, pedaling.</subtitle><author><name>William Woodruff</name></author><entry><title type="html">A new release of ff2mpv</title><link href="https://blog.yossarian.net/2024/01/23/A-new-release-of-ff2mpv" rel="alternate" type="text/html" title="A new release of ff2mpv" /><published>2024-01-23T00:00:00+00:00</published><updated>2024-01-23T00:00:00+00:00</updated><id>https://blog.yossarian.net/2024/01/23/A-new-release-of-ff2mpv</id><content type="html" xml:base="https://blog.yossarian.net/2024/01/23/A-new-release-of-ff2mpv"><![CDATA[<p>This is a short announcement post for the 5.x series of
<a href="https://github.com/woodruffw/ff2mpv">ff2mpv</a>.</p>

<p>The last release (the 4.x series) was
<a href="https://github.com/woodruffw/ff2mpv/releases/tag/v4.0.0">exactly 2 years ago</a>,
so it’s been a little while.</p>

<h2 id="big-new-things">Big new things</h2>

<p>This release series has two primary features:</p>

<ul>
  <li>
    <p>ff2mpv now supports configurable profiles, which can be created and
modified directly from the extension. The <a href="https://github.com/woodruffw/ff2mpv/pull/108">associated pull request</a> has
all of the details, but the most relevant bit is this
(under <code class="language-plaintext highlighter-rouge">about:addons &gt; ff2mpv &gt; Preferences</code>):</p>

    <p><img src="/assets/ff2mpv-profiles.png" alt="" /></p>

    <p>As the big scary message indicates, you <strong>must</strong> be careful about
what you put in your MPV profiles: these are arbitrary flags that
are passed to the underlying <code class="language-plaintext highlighter-rouge">mpv</code> invocation, and MPV has ample
ways to execute arbitrary code. Caveat emptor.</p>

    <p>Many thanks to <a href="https://github.com/DanSM-5">@DanSM-5</a> for his development efforts on this feature.</p>
  </li>
  <li>
    <p>External extensions <a href="https://github.com/woodruffw/ff2mpv/pull/113">can now invoke ff2mpv</a> via the
<code class="language-plaintext highlighter-rouge">browser.runtime.onMessageExternal</code> API, by user demand. The change
itself was small (just 23 additional lines), but will apparently
make ff2mpv <a href="https://github.com/woodruffw/ff2mpv/issues/112">compose with other extensions</a>.</p>

    <p>Thanks again to <a href="https://github.com/DanSM-5">@DanSM-5</a> for this feature enhancement as well.</p>
  </li>
</ul>

<h2 id="smaller-new-things">Smaller new things</h2>

<ul>
  <li>
    <p>Outside of the extension itself, support for other browsers (beyond
Firefox and mainline Chrome/Chromium) has continued to improve. Or
at least I think it has, because people only sporadically file bugs
for Brave, LibreWolf, &amp;c. Some relevant changes over the last two years:</p>

    <ul>
      <li><a href="https://github.com/woodruffw/ff2mpv/pull/72">Support for more browsers in the installation scripts</a> by <a href="https://github.com/eNV25">@eNV25</a></li>
      <li><a href="https://github.com/woodruffw/ff2mpv/pull/77">An install script for Windows</a> by <a href="https://github.com/DanSM-5">@DanSM-5</a></li>
      <li><a href="https://github.com/ryze312/ff2mpv-rust">A new Rust native host implementation</a> by <a href="https://github.com/ryze312">@ryze312</a></li>
      <li><a href="https://github.com/woodruffw/ff2mpv/pull/102">Support for Brave in the Windows installation script</a> by <a href="https://github.com/DanSM-5">@DanSM-5</a></li>
      <li><a href="https://github.com/woodruffw/ff2mpv/pull/111">A fix for installations on arm64 macOS</a> by <a href="https://github.com/claudiofreitas">@claudiofreitas</a></li>
      <li>…and a whole bunch of miscellaneous documentation fixes and improvements
by various contributors. Thank you all!</li>
    </ul>
  </li>
</ul>

<h2 id="upcoming-things">Upcoming things</h2>

<p>ff2mpv continues to be <em>very</em> conservatively developed: support for MPV profiles
is the biggest feature addition in years, and is likely to remain the biggest
for a long time.</p>

<p>The biggest upcoming thing is the move to Manifest V3, which
<a href="https://developer.chrome.com/docs/extensions/develop/migrate/checklist">Chrome is making us do</a>. There’s a <a href="https://github.com/woodruffw/ff2mpv/pull/70">work in progress PR</a> for it,
but we still have a few months until anything <em>needs</em> to change
(provided Chrome doesn’t push the migration back again).</p>

<p>When MV3 comes out, I will likely begin the 6.x series. Until then, enjoy
the 5.x series and please report any bugs you run into!</p>]]></content><author><name>William Woodruff</name></author><category term="devblog" /><category term="programming" /><summary type="html"><![CDATA[This is a short announcement post for the 5.x series of ff2mpv.]]></summary></entry><entry><title type="html">You don’t need analytics on your blog</title><link href="https://blog.yossarian.net/2023/12/24/You-dont-need-analytics-on-your-blog" rel="alternate" type="text/html" title="You don’t need analytics on your blog" /><published>2023-12-24T00:00:00+00:00</published><updated>2023-12-24T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/12/24/You-dont-need-analytics-on-your-blog</id><content type="html" xml:base="https://blog.yossarian.net/2023/12/24/You-dont-need-analytics-on-your-blog"><![CDATA[<p>From the “blog post ideas for when I have no other ideas” file.</p>

<p>I think most (personal) blogs don’t need browser analytics<sup id="fnref:browser" role="doc-noteref"><a href="#fn:browser" class="footnote" rel="footnote">1</a></sup>.
This blog has no browser analytics; I removed them
<a href="/2017/01/22/New-Domain-and-More">back in 2017</a>,
with no discernible negative impact on my writing motivation, my ability
to observe general traffic trends, or the blog’s popularity.</p>

<p>TL;DR for everything below: the only way I track traffic trends on this
blog is with <a href="https://yossarian.net/snippets#vbnla"><code class="language-plaintext highlighter-rouge">vbnla</code></a>, a hacky little
Ruby script that summarizes my Nginx access log. This has been more than
sufficient for assessing post popularity and the general geographic and link
origins for incoming traffic.</p>

<h2 id="writing-motivation">Writing motivation</h2>

<p>I originally added analytics (in the form of Google Analytics) to the blog
<a href="/2015/04/26/Blog-Changes-RSS-Analytics">back in 2015</a>, when
I was still in college.</p>

<p>My original rationale for adding analytics was to motivate myself to
write posts on a regular schedule: I thought that having a pretty dashboard
of (ever increasing, in my natural human desire for attention) visitors would
induce additional writing from me each month.</p>

<p>The opposite turned out to be true: like most blogs, nobody really read mine,
and my visibility into that <em>demotivated</em> me and made me feel like giving up
on a blog entirely. Looking back, this was inevitable, but having analytics
<em>made it worse</em>: instead of only evaluating my own desire to write and improve
my blogging, I became (resentfully) dependent on a small trickle of statistics.</p>

<p>Ultimately, there’s nothing wrong with deriving some motivation from the
attention that can come with blogging (I still do, as noted below).
But it’s also the cheapest source of motivation, in the senses of being the
easiest to track, least reliable, and ultimately not tied to a passion for the
act of writing <em>itself</em>. Turning off the blog’s browser analytics helped
me decrease the overall degree of motivation that came from external attention,
while increasing the overall degree I felt from both habituation (writing at
least one post a month) and post quality (feeling that I was actually
expressing something interesting to myself, rather than attempting to track
readers’ interests).</p>

<h2 id="analytics-when-i-need-them">Analytics when I need them</h2>

<p>Like any good blog post title, there is a kernel of untruth in this one:
analytics <em>are</em> useful for debugging purposes<sup id="fnref:debug" role="doc-noteref"><a href="#fn:debug" class="footnote" rel="footnote">2</a></sup>, tracking overall
server load<sup id="fnref:load" role="doc-noteref"><a href="#fn:load" class="footnote" rel="footnote">3</a></sup> and, more selfishly, understanding when a particular
post is getting attention.</p>

<p>I still have a little bit of insight into my blog’s traffic for these things,
via my web server’s <a href="https://en.wikipedia.org/wiki/Common_Log_Format">access log</a>. Each log entry looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>XXX.YYY.AAA.BBB - - [24/Dec/2023:00:10:20 -0500] "GET /2022/12/28/ReDoS-vulnerabilities-and-misaligned-incentives HTTP/1.1" 200 9654 "https://some-referrer.example.com" "Mozilla/5.0"
</pre></td></tr></tbody></table></code></pre></div></div>

<p>(where <code class="language-plaintext highlighter-rouge">XXX.YYY.AAA.BBB</code> is the IP address that made the request).</p>

<p>This tells me just about everything I <em>could</em> need to know for the aforementioned
debugging, load monitoring, and selfish attention purposes. It also gives
me essentially the same things that I <em>would</em> want from browser analytics,
without injecting third-party code into each reader’s browser:</p>

<ul>
  <li>IP addresses give me rough geolocation information, for understanding where
readers are coming from (or sources of significant automated traffic, like
RSS pollers and search crawlers);</li>
  <li>User agents and the HTTP <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer">Referer</a> give me rough readership demographics;</li>
  <li>The accessed URL tells me what’s actually being read.</li>
</ul>

<p>In practice, I rarely use this information. But it <em>does</em> exist,
and it gives me everything that I <em>would have</em> wanted from browser analytics.</p>

<p>When I do use it, I use it through <a href="https://yossarian.net/snippets#vbnla"><code class="language-plaintext highlighter-rouge">vbnla</code></a>, a hacky little script that
collates the log’s recorded requests by top IPs, user agents, requests,
and referrers. Here’s what that looks like on today’s log, as of
this morning (with IPs redacted):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
</pre></td><td class="rouge-code"><pre>Total requests: 5195
Unique IPs: 878
Top 10 IPs:
	REDACTED -&gt; 160
	REDACTED -&gt; 92
	REDACTED -&gt; 72
	REDACTED -&gt; 71
	REDACTED -&gt; 63
	REDACTED -&gt; 52
	REDACTED -&gt; 51
	REDACTED -&gt; 50
	REDACTED -&gt; 49
	REDACTED -&gt; 47

Top 10 UAs:
	NetNewsWire (RSS Reader; https://netnewswire.com/) -&gt; 241
	FreshRSS/1.22.1 (Linux; https://freshrss.org) -&gt; 223
	- -&gt; 200
	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 -&gt; 159
	Yarr/1.0 -&gt; 121
	Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0 -&gt; 96
	Readdigbot (+https://www.readdig.com) -&gt; 92
	FreshRSS/1.21.0 (Linux; https://freshrss.org) -&gt; 90
	Bifolia:0.1.1 -&gt; 71
	Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0 -&gt; 70

Top 10 requests:
	/.env -&gt; 22
	/2020/12/24/A-few-HiDPI-tricks-for-Linux -&gt; 20
	/2020/09/19/LLVMs-getelementptr-by-example -&gt; 12
	/2021/07/19/LLVM-internals-part-1-bitcode-format -&gt; 11
	/2022/02/02/Setting-up-Navidrome-with-Nginx-as-a-reverse-proxy -&gt; 8
	/2023/10/18/Some-concerns-with-OpenPubKey -&gt; 8
	/login.php?s=Admin/login -&gt; 8
	/2023/09/22/GitHub-Actions-could-be-so-much-better -&gt; 8
	/2020/11/30/How-many-registers-does-an-x86-64-cpu-have -&gt; 7
	/2020/12/16/Static-calls-in-Linux-5-10 -&gt; 6

Top 10 referrers:
	- -&gt; 4522
	https://blog.yossarian.net/feed.xml -&gt; 443
	https://blog.yossarian.net/ -&gt; 59
	https://www.google.com/ -&gt; 22
	https://blog-yossarian-net.translate.goog/ -&gt; 16
	http://blog.yossarian.net/feed.xml -&gt; 12
	https://yandex.ru/ -&gt; 7
	https://duckduckgo.com/ -&gt; 6
	http://yossarian.net -&gt; 5
	https://yossarian.net/docs/libmsr/ -&gt; 3
</pre></td></tr></tbody></table></code></pre></div></div>
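<p>The core of such a summarizer is tiny. <code class="language-plaintext highlighter-rouge">vbnla</code> itself is a Ruby script; as a rough illustration (a hypothetical sketch of the same idea, not the actual tool), here is a minimal Rust version: parse each Common Log Format line, then tally and rank the extracted fields. The parsing is deliberately naive, and all the names are mine.</p>

```rust
use std::collections::HashMap;

/// Pull the client IP and the request path out of one Common Log Format line.
/// Returns None for lines that don't look like CLF.
fn parse_clf(line: &str) -> Option<(&str, &str)> {
    // The IP is the first whitespace-separated field.
    let ip = line.split_whitespace().next()?;
    // The request ("GET /path HTTP/1.1") is the first double-quoted field.
    let request = line.split('"').nth(1)?;
    let path = request.split_whitespace().nth(1)?;
    Some((ip, path))
}

/// Count occurrences and return the top `n` entries, most frequent first.
fn top_n<'a>(items: impl Iterator<Item = &'a str>, n: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for item in items {
        *counts.entry(item).or_insert(0) += 1;
    }
    let mut sorted: Vec<_> = counts
        .into_iter()
        .map(|(k, v)| (k.to_string(), v))
        .collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted.truncate(n);
    sorted
}

fn main() {
    // In real use this would be the access log streamed from stdin or a file.
    let log = r#"1.2.3.4 - - [24/Dec/2023:00:10:20 -0500] "GET /feed.xml HTTP/1.1" 200 9654 "-" "Mozilla/5.0"
1.2.3.4 - - [24/Dec/2023:00:11:02 -0500] "GET /feed.xml HTTP/1.1" 200 9654 "-" "Mozilla/5.0"
5.6.7.8 - - [24/Dec/2023:00:12:41 -0500] "GET / HTTP/1.1" 200 1234 "-" "curl/8.0""#;

    let parsed: Vec<_> = log.lines().filter_map(parse_clf).collect();
    println!("Top IPs: {:?}", top_n(parsed.iter().map(|&(ip, _)| ip), 10));
    println!("Top requests: {:?}", top_n(parsed.iter().map(|&(_, path)| path), 10));
}
```

<p>Extending this to user agents and referrers is just more calls to the same tally function over the remaining quoted fields.</p>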

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:browser" role="doc-endnote">
      <p>Better known as “client-side analytics” or similar. I didn’t know what to call these; thanks to <a href="https://news.ycombinator.com/item?id=38755247">‘michaelsalem</a> for pointing this out. <a href="#fnref:browser" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:debug" role="doc-endnote">
      <p>For example, diagnosing reader reports of IPv6 not working, HTTPS not working, rendering problems on some browsers, RSS not working, etc. <a href="#fnref:debug" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:load" role="doc-endnote">
      <p>This blog has never overwhelmed the pathetically small VPS that it runs on, but it has sometimes come close. When I see that a post is getting attention, I measure the overall volume of traffic to get a sense of if/when I will ever need to upgrade to a larger VPS. <a href="#fnref:load" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="meta" /><summary type="html"><![CDATA[From the “blog post ideas for when I have no other ideas” file.]]></summary></entry><entry><title type="html">Function interposition in Rust with upgrayedd</title><link href="https://blog.yossarian.net/2023/11/19/Function-interposition-in-Rust-with-upgrayedd" rel="alternate" type="text/html" title="Function interposition in Rust with upgrayedd" /><published>2023-11-19T00:00:00+00:00</published><updated>2023-11-19T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/11/19/Function-interposition-in-Rust-with-upgrayedd</id><content type="html" xml:base="https://blog.yossarian.net/2023/11/19/Function-interposition-in-Rust-with-upgrayedd"><![CDATA[<p>Yet another announcement-type post, this time for a small Rust library I hacked
up while trying to deduplicate some boilerplate in another project:
<a href="https://github.com/woodruffw/upgrayedd"><code class="language-plaintext highlighter-rouge">upgrayedd</code></a>.</p>

<p>This is what using <code class="language-plaintext highlighter-rouge">upgrayedd</code> looks like:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="k">use</span> <span class="nn">upgrayedd</span><span class="p">::</span><span class="n">upgrayedd</span><span class="p">;</span>

<span class="nd">#[upgrayedd]</span>
<span class="k">fn</span> <span class="nf">X509_VERIFY_PARAM_set_auth_level</span><span class="p">(</span><span class="n">param</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span> <span class="n">level</span><span class="p">:</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span><span class="p">)</span> <span class="p">{</span>
    <span class="nd">eprintln!</span><span class="p">(</span><span class="s">"before!"</span><span class="p">);</span>
    <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">upgrayedd</span><span class="p">(</span><span class="n">param</span><span class="p">,</span> <span class="n">level</span><span class="p">)</span> <span class="p">};</span>
    <span class="nd">eprintln!</span><span class="p">(</span><span class="s">"after!"</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>If you build that in a crate with <code class="language-plaintext highlighter-rouge">crate-type = ["cdylib"]</code>, then you can
do this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="c"># libfunkycrypto is whatever your crate's shared object build target is</span>
<span class="nv">LD_PRELOAD</span><span class="o">=</span>./libfunkycrypto.so <span class="se">\</span>
    curl <span class="nt">--silent</span> https://example.com <span class="nt">--ciphers</span> DEFAULT@SECLEVEL<span class="o">=</span>2
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…to run custom code before and after each call to OpenSSL’s
<a href="https://www.openssl.org/docs/manmaster/man3/X509_VERIFY_PARAM_set_auth_level.html"><code class="language-plaintext highlighter-rouge">X509_VERIFY_PARAM_set_auth_level</code></a>.</p>

<p>The rest of this post will be a brief introduction to (dynamic) function
interposition and how it works, <code class="language-plaintext highlighter-rouge">upgrayedd</code>’s implementation details,
and what it can (and can’t) be used for.</p>

<h2 id="dynamic-function-interposition">(Dynamic) function interposition</h2>

<p><a href="https://jayconrod.com/posts/23/tutorial-function-interposition-in-linux">Function interposition</a> is a basic program instrumentation technique: to
measure, detect, or modify the use of a function in a program, we
<em>replace</em> calls to that function with calls to a function that we
(the instrumenter) control. The interposed (“wrapper”) function
can then monitor (and rewrite) the program’s state, including:</p>

<ul>
  <li>Changing the parameters passed into the target (“wrapped”) function,
causing it to behave differently;</li>
  <li>Changing the return value from the target, causing the caller to behave
differently;</li>
  <li>Skipping the target entirely (stubbing or emulating it);</li>
  <li>Modifying the program’s larger state, including globals, handles on system
resources, &amp;c.</li>
</ul>

<p>There are many different ways to interpose on a program’s functions, but one
of the simplest is the <a href="/2017/05/05/Cleaner-Function-Wrapping-in-C"><code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> trick</a>:<sup id="fnref:ldpreload" role="doc-noteref"><a href="#fn:ldpreload" class="footnote" rel="footnote">1</a></sup> when set, the dynamic
linker/loader will give precedence to the symbols defined in the specified
shared object.</p>

<p>The wrapper can then access the underlying replaced function
via <code class="language-plaintext highlighter-rouge">real_func = dlsym(RTLD_NEXT, ...)</code>, where <code class="language-plaintext highlighter-rouge">RTLD_NEXT</code> is a special
pseudo-handle that tells <a href="https://man7.org/linux/man-pages/man3/dlsym.3.html"><code class="language-plaintext highlighter-rouge">dlsym(3)</code></a> to retrieve the next occurrence
of the symbol in the dynamic linker’s search order.</p>

<p>In practice, this means that <em>any</em> <em>dynamic</em><sup id="fnref:dynamic" role="doc-noteref"><a href="#fn:dynamic" class="footnote" rel="footnote">2</a></sup> function call can be interposed
with <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>, including “basic” routines like <code class="language-plaintext highlighter-rouge">malloc</code><sup id="fnref:malloc" role="doc-noteref"><a href="#fn:malloc" class="footnote" rel="footnote">3</a></sup>.</p>
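<p>To make that concrete, here is a self-contained Rust sketch (mine, hypothetical, and Linux/glibc-specific; it is not <code class="language-plaintext highlighter-rouge">upgrayedd</code>’s code) of the lookup half of the trick: resolving the next occurrence of <code class="language-plaintext highlighter-rouge">rand</code> via <code class="language-plaintext highlighter-rouge">dlsym(RTLD_NEXT, ...)</code> and casting the result to a function pointer, with no external crates:</p>

```rust
use std::ffi::{c_char, c_int, c_void};

// Minimal dlfcn binding, so this sketch needs no external crates.
#[link(name = "dl")]
extern "C" {
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

// glibc's RTLD_NEXT pseudo-handle is (void *)-1.
const RTLD_NEXT: *mut c_void = -1isize as *mut c_void;

fn main() {
    // Resolve the *next* occurrence of `rand` in the dynamic linker's
    // search order (here: libc's, since the main executable defines none).
    let sym = unsafe { dlsym(RTLD_NEXT, b"rand\0".as_ptr() as *const c_char) };
    assert!(!sym.is_null(), "dlsym could not find rand");

    // Nothing checks this signature for us; getting it wrong is undefined
    // behavior, which is the fundamental unsafety of interposition.
    let real_rand: unsafe extern "C" fn() -> c_int =
        unsafe { std::mem::transmute(sym) };

    println!("libc's rand() returned {}", unsafe { real_rand() });
}
```

<p>Inside an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>ed wrapper, the same lookup skips the wrapper’s own definition and lands on libc’s.</p>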

<h2 id="how-upgrayedd-works">How <code class="language-plaintext highlighter-rouge">upgrayedd</code> works</h2>

<p>As hinted above: <code class="language-plaintext highlighter-rouge">upgrayedd</code> works through <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>, which isn’t <em>that</em>
special.</p>

<p>What <em>does</em> make <code class="language-plaintext highlighter-rouge">upgrayedd</code> special is its ability to abstract
much of the (error-prone) boilerplate that comes with writing function
instrumentation and interposition code.</p>

<p>Compare, for example, the following <code class="language-plaintext highlighter-rouge">upgrayedd</code> wrapper on <a href="https://linux.die.net/man/3/rand"><code class="language-plaintext highlighter-rouge">rand(3)</code></a>:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="k">use</span> <span class="nn">upgrayedd</span><span class="p">::</span><span class="n">upgrayedd</span><span class="p">;</span>

<span class="nd">#[upgrayedd(real_rand)]</span>
<span class="k">fn</span> <span class="nf">rand</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="c1">// The domain of [0, 42) is random enough.</span>
    <span class="p">(</span><span class="k">unsafe</span> <span class="p">{</span> <span class="nf">real_rand</span><span class="p">()</span> <span class="p">})</span> <span class="o">%</span> <span class="mi">42</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…with its rough equivalent in C:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre><span class="cp">#include</span> <span class="cpf">&lt;stddef.h&gt;</span><span class="cp">
</span>
<span class="cp">#define _GNU_SOURCE  // for RTLD_NEXT
#include</span> <span class="cpf">&lt;dlfcn.h&gt;</span><span class="cp">
</span>
<span class="k">static</span> <span class="kt">int</span> <span class="p">(</span><span class="o">*</span><span class="n">_real_rand</span><span class="p">)()</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="kt">int</span> <span class="nf">rand</span><span class="p">()</span> <span class="p">{</span>
  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">_real_rand</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">_real_rand</span> <span class="o">=</span> <span class="n">dlsym</span><span class="p">(</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="s">"rand"</span><span class="p">);</span>
  <span class="p">}</span>

  <span class="c1">// The domain of [0, 42) is random enough.</span>
  <span class="k">return</span> <span class="n">_real_rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">42</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The C version is slightly more verbose and significantly more error-prone:
the function signature is written twice (once for
the wrapper, and once for the function pointer containing the target),
and the C compiler won’t catch any errors in either.</p>

<p>A naive Rust implementation would have similar problems; <code class="language-plaintext highlighter-rouge">upgrayedd</code> mostly
sidesteps these by abstracting away each individual step behind its
<a href="https://doc.rust-lang.org/beta/reference/procedural-macros.html">procedural macro</a>.</p>

<p>When expanded, the <code class="language-plaintext highlighter-rouge">rand</code> example above looks something like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
</pre></td><td class="rouge-code"><pre><span class="k">static</span> <span class="k">mut</span> <span class="n">__upgrayedd_target_rand</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span><span class="o">&gt;</span> <span class="o">=</span> <span class="nb">None</span><span class="p">;</span>

<span class="nd">#[no_mangle]</span>
<span class="nd">#[doc(hidden)]</span>
<span class="nd">#[allow(non_snake_case)]</span>
<span class="nd">#[export_name</span> <span class="nd">=</span> <span class="s">"rand"</span><span class="nd">]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">__upgrayedd_inner_wrapper_rand</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">__upgrayedd_target_rand</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
        <span class="n">__upgrayedd_target_rand</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">transmute</span><span class="p">(</span>
            <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nf">dlsym</span><span class="p">(::</span><span class="nn">libc</span><span class="p">::</span><span class="n">RTLD_NEXT</span><span class="p">,</span> <span class="nn">std</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nf">transmute</span><span class="p">(</span><span class="s">b"rand</span><span class="se">\x00</span><span class="s">"</span><span class="nf">.as_ptr</span><span class="p">())),</span>
        <span class="p">);</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="n">__upgrayedd_target_rand</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">let</span> <span class="n">msg</span> <span class="o">=</span> <span class="s">b"barf: upgrayedd tried to hook something that broke rust's runtime: "</span><span class="p">;</span>
        <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nf">write</span><span class="p">(</span>
            <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="n">STDERR_FILENO</span><span class="p">,</span>
            <span class="n">msg</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span>
            <span class="n">msg</span><span class="nf">.len</span><span class="p">(),</span>
        <span class="p">);</span>
        <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nf">write</span><span class="p">(</span>
            <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="n">STDERR_FILENO</span><span class="p">,</span>
            <span class="s">b"rand"</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span>
            <span class="s">b"rand"</span><span class="nf">.len</span><span class="p">(),</span>
        <span class="p">);</span>
        <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nf">write</span><span class="p">(::</span><span class="nn">libc</span><span class="p">::</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="s">b"</span><span class="se">\n</span><span class="s">"</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="p">::</span><span class="nn">libc</span><span class="p">::</span><span class="nb">c_void</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
        <span class="nn">std</span><span class="p">::</span><span class="nn">process</span><span class="p">::</span><span class="nf">abort</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="nf">rand</span><span class="p">()</span>
<span class="p">}</span>

<span class="nd">#[allow(non_snake_case)]</span>
<span class="k">fn</span> <span class="nf">rand</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ffi</span><span class="p">::</span><span class="nb">c_int</span> <span class="p">{</span>
    <span class="nd">#[allow(unused_variables)]</span>
    <span class="k">let</span> <span class="n">real_rand</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="n">__upgrayedd_target_rand</span><span class="nf">.unwrap_unchecked</span><span class="p">()</span> <span class="p">};</span>
    <span class="p">(</span><span class="k">unsafe</span> <span class="p">{</span> <span class="nf">real_rand</span><span class="p">()</span> <span class="p">})</span> <span class="o">%</span> <span class="mi">42</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…which is a mess to read, but is <em>essentially</em> the same thing as the C
version, with a few extra checks (most notably, a hard abort if
<code class="language-plaintext highlighter-rouge">dlsym</code> fails to retrieve the target function<sup id="fnref:never" role="doc-noteref"><a href="#fn:never" class="footnote" rel="footnote">4</a></sup>).</p>

<h2 id="limitations">Limitations</h2>

<p><code class="language-plaintext highlighter-rouge">upgrayedd</code> comes with a few caveats beyond the normal limitations of
<code class="language-plaintext highlighter-rouge">LD_PRELOAD</code>:</p>

<ul>
  <li>
    <p>For the time being, it uses Rust’s <code class="language-plaintext highlighter-rouge">std</code> (and, more generally, users
may reasonably expect to be able to use crates or logic that
needs <code class="language-plaintext highlighter-rouge">std</code> in their interposed code). As a result, <code class="language-plaintext highlighter-rouge">upgrayedd</code>
may not always hook where you expect <em>if</em> mingled into a larger
Rust codebase.</p>

    <p>Bottom line: use it in small codebases, with minimal
dependencies<sup id="fnref:deps" role="doc-noteref"><a href="#fn:deps" class="footnote" rel="footnote">5</a></sup>.</p>
  </li>
  <li>
    <p>Despite being written in Rust (tm) and exposing a “safe” wrapper
(until you choose to call the underlying target), <code class="language-plaintext highlighter-rouge">upgrayedd</code> is
<em>fundamentally unsafe</em> and <em>cannot be made safe</em>: interposition inherently
involves unsafe casts of the pointers behind symbols, with no guarantee that
the signature is correct (or that the symbol is even a function<sup id="fnref:symbol" role="doc-noteref"><a href="#fn:symbol" class="footnote" rel="footnote">6</a></sup>).</p>

    <p>Even just <em>declaring</em> an <code class="language-plaintext highlighter-rouge">upgrayedd</code> hook can cause (silent!) memory corruption
if the type or signature behind the symbol is not the expected one. You
<strong>should not</strong> use it in anything that needs to be stable or secure; its
primary value is in writing instrumentation tooling that stays on a
developer’s workbench.</p>
  </li>
  <li>
    <p>It’s Linux-only for the moment, for the simple reason of “I haven’t bothered
to make it work anywhere else.” It would probably work well on the BSDs
with minimal changes, and should work on macOS with SIP disabled;
I don’t know if Windows supports this kind of thing<sup id="fnref:dllinjection" role="doc-noteref"><a href="#fn:dllinjection" class="footnote" rel="footnote">7</a></sup>.</p>
  </li>
</ul>

<h2 id="wrapup">Wrapup</h2>

<p>Ultimately, <code class="language-plaintext highlighter-rouge">upgrayedd</code> is a bikeshed/waypoint towards a larger tool that I’m
working on for detecting API misuse. Stay tuned for more news on that.</p>

<p>I’ve been writing Rust for ~5 years now, but this was my first time <em>creating</em>
(rather than just modifying) a <code class="language-plaintext highlighter-rouge">proc-macro</code> crate.</p>

<p>Developing a procedural macro was an interesting experience: being able
to control the generated syntax was both very freeing and <em>also</em> very
constraining (the complexity of Rust’s AST is on full display in <code class="language-plaintext highlighter-rouge">syn</code>’s APIs,
which makes even “trivial” looking transformations<sup id="fnref:xform" role="doc-noteref"><a href="#fn:xform" class="footnote" rel="footnote">8</a></sup> somewhat complicated).
I once again found myself wishing for a <a href="https://crystal-lang.org/reference/1.10/syntax_and_semantics/macros/index.html">macro system like Crystal’s</a>,
where a small amount of flexibility is exchanged for substantially simpler
“template” style transformations.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:ldpreload" role="doc-endnote">
      <p>On Linux and (most?) BSDs. The same technique also works macOS via <code class="language-plaintext highlighter-rouge">DYLD_LIBRARY_PATH</code>, although not with <a href="https://en.wikipedia.org/wiki/System_Integrity_Protection">SIP</a> enabled. <a href="#fnref:ldpreload" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:dynamic" role="doc-endnote">
      <p>Emphasis on <em>dynamic</em>: this specific technique doesn’t work on static calls, e.g. routines compiled directly into the binary through static linkage. Static function interposition is outside of the scope of this post, but can be done with techniques like static or dynamic binary translation, instruction level tracing, or, for syscalls, something like <a href="https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/">KRF</a>. <a href="#fnref:dynamic" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:malloc" role="doc-endnote">
      <p>Some allocators actually suggest this technique as a relatively simple integration strategy; see for example <a href="https://github.com/jemalloc/jemalloc/wiki/Getting-Started">jemalloc</a>. <a href="#fnref:malloc" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:never" role="doc-endnote">
      <p>In principle, this should never really happen. In practice, one of the ways I intend to extend <code class="language-plaintext highlighter-rouge">upgrayedd</code> is by allowing it to pre-collect targets in <a href="https://internals.rust-lang.org/t/from-life-before-main-to-common-life-in-main/16006">“life before main”</a>, which in turn is less reliable in terms of pre-loaded shared objects and runtime initialization dependencies. <a href="#fnref:never" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:deps" role="doc-endnote">
      <p>This is just as true for C; the only real difference is that Rust makes it <em>way</em> easier to introduce a bunch of dependencies that may interfere with your expected interpositions. <a href="#fnref:deps" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:symbol" role="doc-endnote">
      <p>Lots of other things show up in the symbol table: locals, globals, GNUisms like <a href="https://sourceware.org/glibc/wiki/GNU_IFUNC"><code class="language-plaintext highlighter-rouge">IFUNC</code></a>. <a href="#fnref:symbol" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:dllinjection" role="doc-endnote">
      <p>I know about DLL injection; I don’t know if there’s a “blessed correct” way to do it. <a href="#fnref:dllinjection" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:xform" role="doc-endnote">
      <p>For example: transforming a parameter sequence like <code class="language-plaintext highlighter-rouge">foo: u8, bar: u32</code> into just <code class="language-plaintext highlighter-rouge">u8, u32</code> requires a <a href="https://github.com/woodruffw/upgrayedd/blob/bd5878aa972c6ddf5cfee0a7e2ac5a0c29657aa6/upgrayedd-macros/src/lib.rs#L12-L30">remarkably large and lossy helper function</a>. <a href="#fnref:xform" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="devblog" /><category term="rust" /><category term="programming" /><category term="security" /><summary type="html"><![CDATA[Yet another announcement-type post, this time for a small Rust library I hacked up while trying to deduplicate some boilerplate in another project: upgrayedd.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.yossarian.net/assets/upgrayedd.png" /><media:content medium="image" url="https://blog.yossarian.net/assets/upgrayedd.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Some concerns with OpenPubKey</title><link href="https://blog.yossarian.net/2023/10/18/Some-concerns-with-OpenPubKey" rel="alternate" type="text/html" title="Some concerns with OpenPubKey" /><published>2023-10-18T00:00:00+00:00</published><updated>2023-10-18T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/10/18/Some-concerns-with-OpenPubKey</id><content type="html" xml:base="https://blog.yossarian.net/2023/10/18/Some-concerns-with-OpenPubKey"><![CDATA[<h2 id="preword">Preword</h2>

<p><em>This post was mostly written from an airplane. Please forgive any typos or small
errors; I appreciate being notified of them and will make an effort to fix them.</em></p>

<p>Disclosure: I am professionally involved in the Sigstore community,
and currently maintain Sigstore’s <a href="https://github.com/sigstore/sigstore-python">Python client</a>. While I work on Sigstore
professionally, the content below is my <em>personal</em> opinion and not necessarily
the opinion of my employer, the Sigstore project, or anyone else in this general
space.</p>

<hr />

<p>Earlier this month, the Linux Foundation announced OpenPubKey as a new member
project. You can read their announcement <a href="https://www.linuxfoundation.org/press/announcing-openpubkey-project">here</a>. Some other resources,
which have been around for a few weeks to months, came to my attention
at around the same time:</p>

<ul>
  <li><a href="https://github.com/openpubkey/openpubkey">OpenPubKey’s reference implementation</a></li>
  <li><a href="https://eprint.iacr.org/2023/296.pdf">A pre-print on IACR ePrint for OpenPubKey</a></li>
  <li>BastionZero’s OpenPubKey <a href="https://www.bastionzero.com/openpubkey">explainer</a> and <a href="https://www.bastionzero.com/blog/bastionzeros-openpubkey-why-i-think-it-is-the-most-important-security-research-ive-done">blog post</a></li>
</ul>

<p>I’ll start by saying that I think the <em>idea</em> behind OpenPubKey is
<strong>extremely cool</strong> and demonstrates the (basic) workability
of a technique (binding an ephemeral signing key to a semi-permanent identity
in a globally verifiable way <em>without</em> additional trusted services) that
I think is both extremely useful and powerful.</p>

<p>At the same time, I have concerns about OpenPubKey’s privacy
properties, its actual ability to provide reliable “keyless” signatures,
and its compatibility with (and implications for) OIDC practices within IdPs. This
post is an attempt to elaborate on those concerns.</p>

<p>Finally: I don’t believe these concerns are uniquely mine. There’s
<a href="https://blog.sigstore.dev/openpubkey-and-sigstore/">a post on the Sigstore blog</a> that in particular covers some of the same
privacy concerns, and probably others that I’ve missed.</p>

<h2 id="quick-background">Quick background</h2>

<p>At a very high level: Sigstore and OpenPubKey attempt to solve the same problem,
namely: giving users the ability to produce signatures bound to a publicly
verifiable identity. Doing this reliably solves two <em>huge</em> problems in
“ordinary” distributed<sup id="fnref:distributed" role="doc-noteref"><a href="#fn:distributed" class="footnote" rel="footnote">1</a></sup> signing applications:</p>

<ol>
  <li>
    <p><em>Identity establishment</em>. Traditional signing schemes punt identity
verification to the end user, e.g. by expecting them to confirm that a
certificate claiming to represent an entity is in fact controlled
by that entity.</p>

    <p>Binding key material directly to an identity credential
sidesteps this: the user still needs to determine <em>if</em> they trust a
particular identity, but an attacker can no longer easily<sup id="fnref:easily" role="doc-noteref"><a href="#fn:easily" class="footnote" rel="footnote">2</a></sup>
<em>impersonate</em> that identity with key material that they wholly control.</p>
  </li>
  <li>
    <p><em>Key management</em>. Even when traditional signing schemes produce identity
mappings that they are confident in, they must still perform manual
key management: rotating out expired keys, periodically retrieving
new keys for identities (and verifying them either in-band or out-of-band),
&amp;c. These tasks are manual, operationally complex, and error prone.</p>

    <p>Using verifiable identities rather than the keys themselves sidesteps
<em>all</em> of this and allows all parties to effectively be “keyless” at the
end signature level: all private key material for signing operations can be
discarded immediately, since binding a new ephemeral key to the same identity
is relatively inexpensive.</p>
  </li>
</ol>

<p>Both Sigstore and OpenPubKey use <a href="https://openid.net/developers/how-connect-works/">OpenID Connect</a> as their underlying source
of identity and identity credentials, but their <em>use</em> of those credentials
differs significantly:</p>

<ul>
  <li>
    <p>Sigstore maintains a set of services, which it calls the
“public good” instance. These services include a Certificate Authority
(<a href="https://github.com/sigstore/fulcio">Fulcio</a>) that’s responsible for accepting identity tokens, binding
them to signing keys, and publishing the bound result as an X.509 certificate.</p>

    <p>Sigstore <em>also</em> includes transparency services (a CT log for Fulcio itself,
plus <a href="https://github.com/sigstore/rekor">Rekor</a> for artifact transparency) that are intended to keep Sigstore’s
infrastructure auditable and honest, but using Sigstore <em>does</em> fundamentally
involve placing trust in its CA.</p>
  </li>
  <li>
    <p>OpenPubKey does not use a Certificate Authority, or any other external
services besides the identity provider itself. Instead, it performs a trick
(more on this in a bit) to bind a signing key <em>directly</em> to the identity
credential produced by the identity provider.</p>
  </li>
</ul>

<p>Or, as a TL;DR: Sigstore’s design involves trusted (but transparent and
publicly auditable) services that <em>transform</em> an identity credential and a
public key into a new certificate, while OpenPubKey’s design involves
<em>producing</em> an identity credential with a public key already bound to it.</p>

<h2 id="privacy">Privacy</h2>

<p>As mentioned, OpenPubKey does not use a Certificate Authority or other
form of indirection to bind a signing key to an identity token. Instead,
the identity token <em>itself</em> is generated by the identity provider in a way that
binds it to the key, and is <em>directly</em> shared with verifying parties (including
potentially the public Internet).</p>

<p>This presents a problem: OIDC is intended to be a <em>private</em> credential format,
with the expectation that only a few designated parties<sup id="fnref:audience" role="doc-noteref"><a href="#fn:audience" class="footnote" rel="footnote">3</a></sup> are trusted
to receive and verify it.</p>

<p>Consequently, OIDC identity providers may include default claim values in
their identity tokens that are neither required nor expected in a public
key binding: things like secondary email addresses, service account identifiers,
account statuses (e.g. whether using 2FA, whether suspended), gender,
birth date, &amp;c.
OpenID Connect even includes a <a href="https://openid.net/specs/openid-connect-core-1_0.html#AddressClaim">standard claim for mailing addresses</a>!</p>

<p>More generally (and this will be a recurring theme), OpenID Connect providers
are <strong>not</strong> bound to only exposing a particular set of claims over time:
providers <em>may</em> (and, in practice, regularly do) choose to change their claim
sets and claim contents over time. In other words: an identity credential
that previously only exposed an email address could suddenly begin exposing
DOB and gender claims, with no advance notice (or particular visibility,
unless the user is manually inspecting the JWT claims on each signing
operation).</p>
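<p>To make the exposure concrete, here’s a minimal sketch (with a wholly hypothetical token payload) of how anyone holding a shared ID token sees <em>every</em> claim in it, including any the IdP quietly added:</p>

```python
import base64
import json

def jwt_claims(token):
    """Decode the payload segment of a JWT *without* verifying it."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# A hypothetical ID token payload: the signer believes they're exposing
# only `email`, but the IdP has begun including other default claims.
claims = {
    "iss": "https://idp.example.com",
    "sub": "1234567890",
    "email": "signer@example.com",
    # Surprise additions, made by the IdP with no advance notice:
    "birthdate": "1990-01-01",
    "phone_number": "+1-555-0100",
}

def b64url(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

token = ".".join(
    [b64url(b'{"alg":"RS256"}'), b64url(json.dumps(claims).encode()), "sig"]
)
print(sorted(jwt_claims(token)))
```

<p>Every claim the IdP chooses to emit becomes part of the (potentially public) signature, whether or not the signer intended that.</p>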

<p>At best, this is extremely surprising: OpenPubKey users <em>may</em> believe that
they’re only exposing a “sufficient” amount of identity information for
verification purposes, when in reality they’re at the mercy of the (largely
default) claim behavior of third-party IdPs.</p>

<p>At worst, this can be a serious privacy and security concern, <em>especially</em> if
composed with a key transparency or logging scheme: a user seeking to make public
signatures may accidentally end up permanently disclosing personal details
(or account security posture information).</p>

<p>How does Sigstore fare, by contrast? It too exposes some OIDC claim values,
but in a much more controlled manner: claims are filtered through per-IdP
allowlists before being embedded in issued certificates. As a result, neither
the entire claim set nor any surprise claim additions by an IdP ultimately
surface in Sigstore-issued certificates. Critically, the identity token
<em>itself</em> is never shared beyond Sigstore’s services.</p>
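<p>A minimal sketch of that allowlisting approach (the issuers and claim sets below are illustrative, not Fulcio’s actual configuration):</p>

```python
# Per-IdP claim allowlists: only listed claims survive into the issued
# certificate; everything else the IdP happens to send is dropped.
CLAIM_ALLOWLISTS = {
    "https://accounts.example.com": {"iss", "sub", "email", "email_verified"},
    "https://ci.example.com": {"iss", "sub", "aud", "workflow_ref"},
}

def filter_claims(claims):
    allowed = CLAIM_ALLOWLISTS.get(claims.get("iss"), set())
    return {k: v for k, v in claims.items() if k in allowed}

claims = {
    "iss": "https://accounts.example.com",
    "sub": "1234567890",
    "email": "signer@example.com",
    "birthdate": "1990-01-01",  # never allowlisted, so never surfaced
}
print(filter_claims(claims))
```

<p>A surprise claim added by an IdP tomorrow simply fails the allowlist check, rather than leaking into issued certificates.</p>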

<h2 id="idp-behavior-and-expectations">IdP behavior and expectations</h2>

<h3 id="key-binding-and-compatibility">Key binding and compatibility</h3>

<p>As mentioned above: OpenPubKey essentially performs a hack to “trick” an
identity provider into binding an identity credential to a user-controlled
public key.</p>

<p>That “trick” is to stuff a public key identifier along with some other
fields<sup id="fnref:cic" role="doc-noteref"><a href="#fn:cic" class="footnote" rel="footnote">4</a></sup> into a pre-existing user-controllable claim. The whitepaper
mentions using the <code class="language-plaintext highlighter-rouge">nonce</code> claim for this purpose, but the reference
implementation <a href="https://github.com/openpubkey/openpubkey/blob/1ff9431d1079460eda1025ea3d71a67d7a5e25ae/parties/githubclient.go#L61-L75">appears to use <code class="language-plaintext highlighter-rouge">aud</code> instead</a>.</p>

<p>This discrepancy isn’t particularly noteworthy, except for what it implies:
that the OpenPubKey developers discovered that many OIDC IdPs don’t actually
support a user-controlled <code class="language-plaintext highlighter-rouge">nonce</code> at all, frequently because identity token
retrieval is done through a dedicated REST API endpoint rather than a full
OAuth2 flow.</p>
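<p>The shape of the trick looks roughly like this (a simplified sketch; the real construction in the paper commits to more fields and uses a different encoding):</p>

```python
import hashlib
import json

def binding_nonce(public_key_pem, client_claims):
    """Commit to a signing key (plus client-chosen claims) inside the
    OIDC `nonce`, so that the IdP's signature over the ID token
    transitively covers the key. Simplified relative to the actual
    PK Token construction."""
    commitment = json.dumps(
        {"upk": public_key_pem, **client_claims}, sort_keys=True
    ).encode()
    return hashlib.sha256(commitment).hexdigest()

# The client sends this as `nonce` (or stuffs it into `aud`) during the
# OIDC flow; the resulting ID token echoes it back under the IdP's key.
nonce = binding_nonce("-----BEGIN PUBLIC KEY-----\n...", {"alg": "ES256"})
print(nonce)
```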

<p>More generally, however:</p>

<ol>
  <li>Support for these kinds of claim shenanigans is a mixed bag, and again
falls into the gray space of “might be arbitrarily changed by an IdP
tomorrow, breaking signature generation for everyone.”</li>
  <li>The <code class="language-plaintext highlighter-rouge">aud</code> claim, in particular, is not always exposed as a fully
user-controlled claim by many IdPs: it may instead be subject to an allowlist
of audiences known to the OIDC provider, may have length or format
restrictions, or may not even be a string (OIDC does not require this,
and some IdPs prefer to encode it as an array of audiences).</li>
</ol>

<p>Combined, these make for a somewhat shaky basis for a distributed verification
system. Sigstore sidesteps these issues again through its use of trusted
services: discrepancies between claim formats are handled by Fulcio’s use of
<a href="https://github.com/dexidp/dex">Dex</a>, and Sigstore currently only requires <code class="language-plaintext highlighter-rouge">aud="sigstore"</code>,
denoting the public good instance as the intended audience.</p>

<h3 id="expiration">Expiration</h3>

<p>I don’t have much to say about this, except for highlighting this section
as probably a bad idea:</p>

<blockquote>
  <p>To enable verifying parties to enforce expiration if they so wish
while avoiding the bad user experience of forcing the user to
run the Authorization Code Flow every time the ID Token
in their PK Token expires, we use the following alternative
expiration mechanism. We do not expire a PK Token when
the underlying ID Token expires according to its exp claim.
Instead verifiers wishing to enforce expiration inspect the iat
(issued at) claim, which specifies when the OP issued the ID
Token, and reject the PK Token if it is older than two weeks.
That is, if expiration is enforced, a PK Token expires two
weeks after it is issued. (Page 9, S. 3.5.2)</p>
</blockquote>

<p>Implicit expiration times like this are pretty dangerous, especially when
described in “may” language. See below for further thoughts on how this is
likely to interact poorly with JWKS rotation practices, as well as how difficult
it is to reconcile with revocation.</p>
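<p>Concretely, the quoted scheme amounts to a verifier-side check like this sketch:</p>

```python
import time

TWO_WEEKS = 14 * 24 * 60 * 60

def pk_token_expired(claims, now=None):
    """Ignore the ID token's `exp` entirely; if the verifier opts into
    expiration at all, the PK Token lapses two weeks after `iat`."""
    now = time.time() if now is None else now
    return (now - claims["iat"]) > TWO_WEEKS

claims = {"iat": 1_700_000_000, "exp": 1_700_003_600}  # `exp` long past
print(pk_token_expired(claims, now=1_700_000_000 + TWO_WEEKS - 1))  # False
print(pk_token_expired(claims, now=1_700_000_000 + TWO_WEEKS + 1))  # True
```

<p>Note that a token the IdP considers expired still verifies for up to two weeks, and only if the verifier bothers to check at all.</p>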

<h2 id="long-term-signatures">Long-term signatures</h2>

<p>I’ve put this section last because it isn’t <em>fully</em> clear to me that
long-term signatures are an actual goal of the OpenPubKey design<sup id="fnref:lts" role="doc-noteref"><a href="#fn:lts" class="footnote" rel="footnote">5</a></sup>. As such,
the notes and concerns here <em>may</em> not be applicable. However, because the
preprint mentions an “archival log” for handling verifications of older
signatures, I felt that it’s appropriate to include my thoughts as well<sup id="fnref:sjwks" role="doc-noteref"><a href="#fn:sjwks" class="footnote" rel="footnote">6</a></sup>.</p>

<p>One of the building blocks for OIDC is the JSON Web Key Set (JWKS) for a given
service; this set is (typically) found through a
<a href="https://openid.net/specs/openid-connect-discovery-1_0.html"><code class="language-plaintext highlighter-rouge">.well-known</code> discovery procedure</a>
and is comprised of individual <a href="https://datatracker.ietf.org/doc/html/rfc7517">JSON Web Key</a>-formatted public keys.</p>
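<p>Mechanically, a verifier derives the discovery location from the issuer and selects a key by its <code class="language-plaintext highlighter-rouge">kid</code>; a sketch (with a literal JWKS standing in for the fetched document):</p>

```python
def discovery_url(issuer):
    """The OIDC discovery document lives at a well-known path under the
    issuer; its `jwks_uri` field points at the actual key set."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"

def key_for_kid(jwks, kid):
    """Select the JWK whose `kid` matches the token header's `kid`."""
    return next((k for k in jwks["keys"] if k.get("kid") == kid), None)

jwks = {
    "keys": [
        {"kty": "RSA", "kid": "2024-01", "use": "sig", "e": "AQAB", "n": "..."},
        {"kty": "RSA", "kid": "2023-10", "use": "sig", "e": "AQAB", "n": "..."},
    ]
}
print(discovery_url("https://idp.example.com"))
print(key_for_kid(jwks, "2024-01")["kid"])
```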

<p>JWKS allows OIDC to sidestep the hard problem of timely,
reliable key rotation: if an IdP discovers that their signing material
is compromised, they can remove the corresponding public key from their
JWKS and <em>effectively</em> revoke it for all current users<sup id="fnref:revoke" role="doc-noteref"><a href="#fn:revoke" class="footnote" rel="footnote">7</a></sup>.</p>

<p>Notably, JSON Web Keys do not carry their own expiries, revealing
the other edge of the sword: an IdP may choose to rotate its
keys at <em>any</em> point, and is not required to adhere to any particular
schedule or sensible cadence.</p>

<p>Given that an OIDC IdP can choose to rotate their keys at any time
and thereby render the current “batch” of OpenPubKey-bound tokens
invalid, <strong>how does OpenPubKey ensure that signatures remain verifiable</strong>?
This is what the preprint says:</p>

<blockquote>
  <p>The main challenge facing archival verifiers is that the OP
public key necessary to verify the PK Token may not longer
be available at the OP’s JWKS endpoint. Most OP’s (sic) rotate
their signing keys out of JWKS endpoint between every two
weeks to a four times a year. To ensure the
verifier has the OP’s public key for the PK Token sent with the
OSMs, the verifier must create and maintain an archival log of
OP public keys. This archival log must cover the time period
that verifier wishes to verify PK Tokens over. The verifier
builds this archival log by regularly downloading the public
keys from the OP’s JWKS endpoint. The verifier not only
records the public keys but also the time at which the public
keys were downloaded. (Page 9, S. 3.5.3)</p>
</blockquote>

<p>This scheme is concerning for three reasons:</p>

<ol>
  <li>The “archival log” in question is underspecified. If you squint at it,
it sounds a bit like a transparency log<sup id="fnref:rekor" role="doc-noteref"><a href="#fn:rekor" class="footnote" rel="footnote">8</a></sup>, but inclusion enforcement
and verifiable lookups are not mentioned directly<sup id="fnref:appendix-c" role="doc-noteref"><a href="#fn:appendix-c" class="footnote" rel="footnote">9</a></sup>. In other words:
if the archival log is itself a third-party service, then consumers
of that service must effectively trust it even more than they would
a verifiable transparency log (since the archival log can lie about
its contents at any point).</li>
  <li>(Seemingly) no concern is given to revocation: neither OIDC nor OIDC discovery
provide a mechanism for clarifying that a key that has been rotated out of
the JWKS has been rotated for reasons of compromise; it’s unclear
how (if?) the archival log maintainer is intended to detect that
case and express it within the log.</li>
  <li>Completeness is not addressed: a user maintaining an archival log will
presumably need to periodically poll their salient IdPs for new JWKS.
Given that an IdP is perfectly within its rights to rotate its JWKS
members at any time without warning, it’s unclear how an archival log
maintainer can be confident that their log contains all of the keys
needed to verify a given IdP’s identity tokens.</li>
</ol>
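<p>A toy version of such an archival log makes the completeness problem in (3) visible: any key rotated in <em>and</em> out between two polls is simply lost.</p>

```python
import time

class ArchivalLog:
    """A naive JWKS archival log, per the preprint's description: record
    each observed key along with when it was observed. Nothing here can
    distinguish routine rotation from compromise-driven revocation."""

    def __init__(self):
        self.entries = []  # (kid, jwk, observed_at) tuples

    def poll(self, jwks, observed_at=None):
        observed_at = time.time() if observed_at is None else observed_at
        for jwk in jwks["keys"]:
            self.entries.append((jwk["kid"], jwk, observed_at))

    def keys_seen(self, kid):
        return [e for e in self.entries if e[0] == kid]

log = ArchivalLog()
log.poll({"keys": [{"kid": "A"}]}, observed_at=1000)
# The IdP rotates "A" out, then "B" in *and* out, between our polls:
log.poll({"keys": [{"kid": "C"}]}, observed_at=2000)
print(len(log.keys_seen("B")))  # 0: signatures bound to "B" are unverifiable
```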

<p>Of these concerns, (2) is (in my opinion) the most serious: OIDC itself
contains no real mechanisms for long-term revocation, because its underlying key
establishment mechanism is intended to avoid those problems in the first place.
By reintroducing long-term public key handling in the form of an
(unverifiable?) archival log, <strong>OpenPubKey effectively subverts
OIDC’s primary mechanism for handling key material compromise.</strong></p>

<h2 id="summary">Summary</h2>

<p>Again, I’d like to emphasize that I think the idea behind OpenPubKey is
extremely interesting and <strong>fundamentally valuable</strong><sup id="fnref:valuable" role="doc-noteref"><a href="#fn:valuable" class="footnote" rel="footnote">10</a></sup>.
Moreover, I believe it deserves further attention (and development).</p>

<p>At the same time, it’s my <em>current</em> opinion that the <em>current</em> design of
OpenPubKey does not inspire confidence, <em>particularly</em> with respect
to long-term verification: there appear to be a <em>lot</em> of hacks
that assume that arbitrary third-party OIDC IdPs are relatively static,
predictable, and permissive of claim shenanigans, <em>hacks</em> that OIDC
IdPs themselves are unlikely to be willing to ossify into guarantees
(or, more precisely, <strong>cannot</strong> ossify without compromising their
security practices).</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:distributed" role="doc-endnote">
      <p>“Distributed” here just meaning “many independent signers, many independent verifiers.” <a href="#fnref:distributed" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:easily" role="doc-endnote">
      <p>Generally speaking, the attacker would need to (1) either control the identity themselves, or (2) compromise the identity provider for that identity. <a href="#fnref:easily" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:audience" role="doc-endnote">
      <p>The audience, in OAuth and OIDC parlance. <a href="#fnref:audience" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:cic" role="doc-endnote">
      <p>The paper refers to these as the “Client Instance Claims,” reflecting the fact that they’re effectively subject claims made by the key holder and not the identity provider. <a href="#fnref:cic" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:lts" role="doc-endnote">
      <p>Long-term verification is <em>mentioned</em> in the preprint, but is also contrasted with other use cases (like short-term verification, assuming that the window of verification does not overlap with JWKS rotation). <a href="#fnref:lts" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:sjwks" role="doc-endnote">
      <p>The preprint also mentions the <a href="https://openid.net/specs/openid-connect-userinfo-vc-1_0.html">OpenID Connect UserInfo Verifiable Credentials</a> draft standard, which would (as stated) eliminate the need for an archival log by verifying “signed” JWKs instead. This is conceptually appealing (since the “signed” JWKs are chained back up to the Web PKI), but raises separate logistical questions (the IdP must now use their end-entity certificate to sign JWKS, effectively an off-label purpose). <a href="#fnref:sjwks" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:revoke" role="doc-endnote">
      <p>In practice, some users may cache the JWKS, e.g. for as long as the HTTP caching headers dictate. But OIDC tokens themselves are short-lived, so this does not impede timely revocation substantially. <a href="#fnref:revoke" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:rekor" role="doc-endnote">
      <p>And, if they wanted, OpenPubKey <em>could</em> use Rekor (or any other transparency log service) for exactly this purpose. But this would require reintroducing one of Sigstore’s largest online dependencies, raising the question of why not just use Sigstore outright. <a href="#fnref:rekor" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:appendix-c" role="doc-endnote">
      <p>On another read-through, they receive a short mention in Appendix C (page 18). This appendix says that a verifier “can” use this approach, but doesn’t (to my reading) do a good job of explaining the security benefits (and disadvantages) to doing so. The appendix also mentions that “OP public keys are available on the JWKS URI for at least one week,” which (to my knowledge) is not a guarantee that any OIDC IdP will make (or <em>should</em> make, because it would hamper post-compromise recovery). <a href="#fnref:appendix-c" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:valuable" role="doc-endnote">
      <p>In the sense that exists in contrast to a scheme like Sigstore, which contains multiple online parties with subtle trust relationships between them. <a href="#fnref:valuable" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="cryptography" /><category term="security" /><category term="oss" /><summary type="html"><![CDATA[Preword]]></summary></entry><entry><title type="html">GitHub Actions could be so much better</title><link href="https://blog.yossarian.net/2023/09/22/GitHub-Actions-could-be-so-much-better" rel="alternate" type="text/html" title="GitHub Actions could be so much better" /><published>2023-09-22T00:00:00+00:00</published><updated>2023-09-22T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/09/22/GitHub-Actions-could-be-so-much-better</id><content type="html" xml:base="https://blog.yossarian.net/2023/09/22/GitHub-Actions-could-be-so-much-better"><![CDATA[<p>I <em>love</em> GitHub Actions: I’ve been a daily user of it since 2019 for both professional
and hobbyist projects, and have found it invaluable to both my overall productivity
and peace of mind. I’m <em>just</em> old enough to have used <a href="https://www.travis-ci.com/">Travis CI</a>
et al. professionally before moving to GitHub Actions, and I do not look back with joy<sup id="fnref:back" role="doc-noteref"><a href="#fn:back" class="footnote" rel="footnote">1</a></sup>.</p>

<p>By and large, GitHub Actions continues to delight me and grow new features that I
appreciate: <a href="https://github.blog/2021-11-29-github-actions-reusable-workflows-is-generally-available/">reusable workflows</a>, <a href="https://github.blog/changelog/2021-10-27-github-actions-secure-cloud-deployments-with-openid-connect/">OpenID Connect</a>, <a href="https://github.blog/changelog/2022-05-09-github-actions-enhance-your-actions-with-job-summaries/">job summaries</a>, <a href="https://github.blog/changelog/2023-05-09-introducing-actions-on-the-repository-view-on-github-mobile/">integrations into GitHub Mobile</a>,
and so forth.</p>

<p>At the same time, GitHub Actions is a regular source of <em>profound</em> frustration and time loss<sup id="fnref:loss" role="doc-noteref"><a href="#fn:loss" class="footnote" rel="footnote">2</a></sup>
in my development processes. This post lists some of those frustrations, and how I think GitHub
could selfishly<sup id="fnref:selfishly" role="doc-noteref"><a href="#fn:selfishly" class="footnote" rel="footnote">3</a></sup> improve on them (or even fix them outright)<sup id="fnref:roadmap" role="doc-noteref"><a href="#fn:roadmap" class="footnote" rel="footnote">4</a></sup>.</p>

<hr />

<h2 id="debugging-like-im-15-again">Debugging like I’m 15 again</h2>

<p>Here’s a pretty typical session of me trying to set up a release workflow on GitHub Actions:</p>

<p><img src="/assets/github-actions-fails.png" alt="" /></p>

<p>In this particular case, it took me 4 separate commits (and 4 failed releases) to debug
the various small errors I made: not using <code class="language-plaintext highlighter-rouge">${{ ... }}</code><sup id="fnref:jekyll" role="doc-noteref"><a href="#fn:jekyll" class="footnote" rel="footnote">5</a></sup> where I needed to, forgetting
a <code class="language-plaintext highlighter-rouge">needs:</code> relationship, &amp;c.</p>

<p>Here’s another (this time of a PR-creating workflow), from a few weeks later:</p>

<p><img src="/assets/github-actions-fails-2.png" alt="" /></p>

<p>I am not the world’s most incredible programmer; like many (most?), I program intuitively
and follow the error messages until they stop happening.</p>

<p>GitHub Actions is <strong>not</strong> responsible for catching every possible error I could make,
or for ensuring that every workflow I write will run successfully on the first try.</p>

<p>At the same time, the current debugging cycle in GitHub Actions is <em>ridiculous</em>:
even the smallest change on the most trivial workflow is a 30+ second process
of tabbing out of my development environment (context switch #1), digging through
my browser for the right tab (context switch #2), clicking through the infernal
nest of actions summaries, statuses, &amp;c. (context switch #3), and impatiently
refreshing a buffered console log to figure out which error I need to fix next
(context switch #4). Rinse and repeat.</p>

<h3 id="fixing-this">Fixing this</h3>

<ul>
  <li>
    <p>Give us an interactive debugging shell, or (at least) let us re-run workflows
with small changes <em>without</em> having to go through a <code class="language-plaintext highlighter-rouge">git add; git commit; git push</code> cycle<sup id="fnref:breaks" role="doc-noteref"><a href="#fn:breaks" class="footnote" rel="footnote">6</a></sup>.</p>
  </li>
  <li>
    <p>Give us a repository setting to reject commits with obviously invalid workflows (things
like syntax that can’t possibly work, or references to jobs/steps that don’t exist).
It’s <em>infuriating</em> when I <code class="language-plaintext highlighter-rouge">git push</code> a workflow that silently fails because of invalid YAML;
<em>especially</em> when I then merge that workflow’s branch under the mistaken impression
that the workflow is <em>passing</em>, rather than not running at all.</p>
  </li>
</ul>
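<p>Until then, the closest workaround is linting workflows in CI itself. A sketch of a lint job using <code>rhysd/actionlint</code> (the download-script URL follows its README and is an assumption on my part):</p>

```yaml
lint-workflows:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: lint everything under .github/workflows
      run: |
        bash <(curl -sSL https://raw.githubusercontent.com/rhysd/actionlint/main/scripts/download-actionlint.bash)
        ./actionlint -color
```

<p>This still only catches the error <em>after</em> a push, but at least it fails loudly instead of letting an invalid workflow silently not run.</p>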

<h2 id="security-woes">Security woes</h2>

<p>Speaking from experience: it’s <em>shockingly</em> easy to wreck yourself with GitHub Actions. <em>Way</em> easier
than it should be.</p>

<p>Here is just a small handful of the ways in which I have <em>personally</em> written potentially vulnerable
workflows over the past few years:</p>

<ol>
  <li>
    <p>Using the <code class="language-plaintext highlighter-rouge">${{ ... }}</code> expansion syntax in a shell or other context where a
(potentially malicious) user controls the expansion’s contents. The following, for example, would
allow a user to inject code that could then exfiltrate <code class="language-plaintext highlighter-rouge">$MY_IMPORTANT_SECRET</code>:</p>

    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">do something serious</span>
  <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
   <span class="s">something-serious "${{ inputs.frob }}"</span>
  <span class="na">env</span><span class="pi">:</span>
    <span class="na">MY_IMPORTANT_SECRET</span><span class="pi">:</span> <span class="s">${{ secrets.MY_IMPORTANT_SECRET }}</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>

    <p>Some among you will observe that a ✨good✨ programmer would simply know
not to do this, and that a bad programmer would eventually learn their
(painful) lesson. This might be an acceptable position for a niche
piece of software to hold; it is <strong>not</strong> an acceptable position
for the CI/CD platform that, to a first approximation, hosts the entire
open source ecosystem.</p>
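<p>For reference, the standard mitigation (per GitHub's own security hardening documentation) is to pass the untrusted value through an environment variable, so that the shell — rather than the workflow template engine — expands it:</p>

```yaml
- name: do something serious
  run: |
    # "$FROB" is expanded by the shell as data, not spliced into the
    # script's source text, so it can't inject commands.
    something-serious "$FROB"
  env:
    FROB: ${{ inputs.frob }}
    MY_IMPORTANT_SECRET: ${{ secrets.MY_IMPORTANT_SECRET }}
```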
  </li>
  <li>
    <p>Using <code class="language-plaintext highlighter-rouge">pull_request_target</code>. As far as I can tell, it’s <em>practically</em>
impossible to use this event safely in a non-trivial workflow<sup id="fnref:trivial" role="doc-noteref"><a href="#fn:trivial" class="footnote" rel="footnote">7</a></sup>.</p>

    <p>This event appears to exist for an <em>extremely</em> narrow intended use case, i.e.
labeling or commenting on PRs that come from forks. I don’t understand
why GitHub Actions chooses to expose such a (relatively) simple operation
through as massive a foot-gun as <code class="language-plaintext highlighter-rouge">pull_request_target</code>.</p>
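<p>For what it's worth, the pattern usually recommended instead (e.g. by GitHub's Security Lab) is to split the work: an unprivileged <code>pull_request</code> workflow runs against the untrusted code, and a separate privileged <code>workflow_run</code> workflow does the labeling or commenting afterwards, without ever checking out the PR's code. A sketch of the privileged half (the <code>"CI"</code> name is a placeholder for the unprivileged workflow):</p>

```yaml
on:
  workflow_run:
    workflows: ["CI"]   # the unprivileged pull_request workflow
    types: [completed]

permissions:
  pull-requests: write

jobs:
  comment:
    runs-on: ubuntu-latest
    steps:
      # Deliberately no actions/checkout of the PR head here: this
      # workflow only runs trusted code from the default branch.
      - run: echo "label/comment on the PR via the API here"
```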
  </li>
  <li>
    <p>Over-scoping my workflow and job-level permissions.</p>

    <p>The default access set for Actions’ ordinary <code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> is
<a href="https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token"><em>very</em> permissive</a>:
the only thing it <em>doesn’t</em> provide access to is the workflow’s OpenID Connect token.</p>

    <p>This consistently bites me in two different ways:</p>

    <ol>
      <li>I consistently forget to down-scope the default token, especially when
working with repositories under my personal account (rather than under an org,
where the default scope can be reduced across all repositories).</li>
      <li>
        <p>I consistently <em>over-scope</em> my tokens because I don’t know exactly
how much access my workflow will need.</p>

        <p>This is further complicated by the messy ways in which GitHub’s permission
model gets shoehorned into a single permissions dimension of <code class="language-plaintext highlighter-rouge">read/write/none</code>:
why does <code class="language-plaintext highlighter-rouge">id-token: write</code> grant me the ability to <strong>read</strong> the workflow’s OpenID Connect
token? Why do
<a href="https://docs.github.com/en/rest/overview/permissions-required-for-github-apps?apiVersion=2022-11-28#repository-permissions-for-repository-security-advisories">some <code class="language-plaintext highlighter-rouge">GET</code> operations</a>
on security advisories require <code class="language-plaintext highlighter-rouge">write</code>, while others only require <code class="language-plaintext highlighter-rouge">read</code>?</p>
      </li>
    </ol>
  </li>
</ol>
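<p>For completeness, the down-scoping itself is simple once remembered — the chore is knowing which scopes each job actually needs. A minimal sketch:</p>

```yaml
# Workflow level: start from nothing...
permissions: {}

jobs:
  release:
    runs-on: ubuntu-latest
    # ...and re-grant per job, as narrowly as the job allows.
    permissions:
      contents: write   # e.g. for creating a release
    steps:
      - run: echo "release steps here"
```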

<p>There are also a few things that I <em>haven’t</em> done<sup id="fnref:donetome" role="doc-noteref"><a href="#fn:donetome" class="footnote" rel="footnote">8</a></sup>, but are scary enough that I think they’re worth
mentioning.</p>

<p>For example, can you see what’s wrong with this workflow step?</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre><span class="na">steps</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Despite all appearances, SHA ref
<a href="https://github.com/actions/checkout/commit/c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e"><code class="language-plaintext highlighter-rouge">c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e</code></a>
is <strong>not</strong> a commit on the <code class="language-plaintext highlighter-rouge">actions/checkout</code> repository! It’s actually a commit on a fork in
<code class="language-plaintext highlighter-rouge">actions/checkout</code>’s network which, thanks to GitHub’s use of
<a href="https://github.blog/2015-09-22-counting-objects/#your-very-own-fork-of-rails">alternates</a>,
<em>appears</em> to belong to the parent repository.</p>

<p><a href="https://www.chainguard.dev/unchained/what-the-fork-imposter-commits-in-github-actions-and-ci-cd">Chainguard has an excellent post on this</a><sup id="fnref:stole" role="doc-noteref"><a href="#fn:stole" class="footnote" rel="footnote">9</a></sup>,
but to summarize:</p>

<ol>
  <li>SHA references from forks are visually indistinguishable from SHA references
in the intended target repository. The only way to tell the two apart is
to manually inspect each reference and confirm that it appears on the expected
repository, and not one of its forks.</li>
  <li>GitHub’s own REST API makes no distinction between SHA references in a repository
graph — <code class="language-plaintext highlighter-rouge">/repos/{user}/{repo}/commits/{ref}</code> returns a JSON response that <em>only</em> references
<code class="language-plaintext highlighter-rouge">{user}/{repo}</code>, even if <code class="language-plaintext highlighter-rouge">{ref}</code> is only on a fork.</li>
  <li>Because GitHub fails to distinguish between fork and non-fork SHA references, forks
can bypass security settings on GitHub Actions that would otherwise restrict
actions to only “trusted” sources (such as GitHub themselves or the repository’s
own organization).</li>
</ol>

<p>GitHub’s response to this (so far) has been to add
<a href="https://docs.github.com/en/actions/learn-github-actions/finding-and-customizing-actions#using-shas">a little bit of additional language</a>
to their documentation, rather than to forbid misleading SHA references outright.</p>

<h3 id="fixing-this-1">Fixing this</h3>

<ul>
  <li>
    <p>Give us push-time rejection of obviously insecure workflows. In other words:
let us toggle<sup id="fnref:default" role="doc-noteref"><a href="#fn:default" class="footnote" rel="footnote">10</a></sup> a “paranoid workflow security” mode that, when enabled,
causes <code class="language-plaintext highlighter-rouge">git push</code> to fail with an explanation of what I’m doing wrong. Essentially
the same thing as the debugging request above, but for security!</p>
  </li>
  <li>
    <p>Give us runtime checks on our workflows, analogous to runtime instrumentation like
<a href="https://clang.llvm.org/docs/AddressSanitizer.html">AddressSanitizer</a>
in the world of compiled languages. There are <em>so many</em> things that could
be turned into hard failures for security wins without breaking 99.9% of legitimate
users, like failing any attempt to use <code class="language-plaintext highlighter-rouge">actions/checkout</code> on a <code class="language-plaintext highlighter-rouge">pull_request_target</code>
with a ref that isn’t from the targeted repository.</p>
  </li>
  <li>
    <p>Maybe just deprecate and remove <code class="language-plaintext highlighter-rouge">pull_request_target</code> entirely.
<a href="https://securitylab.github.com/research/github-actions-preventing-pwn-requests/">GitHub’s own Security Lab</a>
has been aware of how dangerous this event is for years; maybe it’s time to get rid of it
entirely.</p>
  </li>
  <li>
    <p>Allow us to set a more restrictive default token scope on our personal repositories,
similar to how organizations and enterprises can restrict their default
<code class="language-plaintext highlighter-rouge">GITHUB_TOKEN</code> scopes across all repositories at once.</p>
  </li>
  <li>
    <p>By default, reject any SHA-pinned action for which the SHA only appears
on a fork and not the referenced repository. It’s hard to imagine a
<em>legitimate</em> reason to ever need to do this!</p>
  </li>
</ul>

<h2 id="real-types-would-be-nice">Real types would be nice</h2>

<p>When writing a custom GitHub Action, you can specify the action’s inputs
using a mapping under the <code class="language-plaintext highlighter-rouge">inputs:</code> key. For example, the following
defines a <code class="language-plaintext highlighter-rouge">frobulation-level</code> input with a description (used for tooltips
in many IDEs) and a default value:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="na">inputs</span><span class="pi">:</span>
  <span class="na">frobulation-level</span><span class="pi">:</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">the</span><span class="nv"> </span><span class="s">level</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">frobulate</span><span class="nv"> </span><span class="s">at"</span>
    <span class="na">default</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Notably, this syntax does <strong>not</strong> allow for type enforcement; the following
<strong>does not work</strong>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="na">inputs</span><span class="pi">:</span>
  <span class="na">frobulation-level</span><span class="pi">:</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s2">"</span><span class="s">the</span><span class="nv"> </span><span class="s">level</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">frobulate</span><span class="nv"> </span><span class="s">to"</span>
    <span class="na">default</span><span class="pi">:</span> <span class="m">1</span>
    <span class="c1"># NOTE: this SHOULD cause a workflow failure if the input</span>
    <span class="c1"># isn't a valid number, but doesn't</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">number</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This absence is strange, but what makes it <em>bizarre</em> is that GitHub is <strong>inconsistent</strong>
about where types can appear in actions and workflows:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">workflow_call</code> supports <code class="language-plaintext highlighter-rouge">type</code> with <code class="language-plaintext highlighter-rouge">boolean</code>, <code class="language-plaintext highlighter-rouge">number</code>, or <code class="language-plaintext highlighter-rouge">string</code></li>
  <li><code class="language-plaintext highlighter-rouge">workflow_dispatch</code> supports <code class="language-plaintext highlighter-rouge">type</code> with <code class="language-plaintext highlighter-rouge">boolean</code>, <code class="language-plaintext highlighter-rouge">choice</code>, <code class="language-plaintext highlighter-rouge">number</code>, or <code class="language-plaintext highlighter-rouge">string</code></li>
  <li>Action inputs: no types at all</li>
</ul>

<p>Unfortunately, this is only the first level: even inputs that <em>do</em> support
typing don’t support compound data structures, like lists or objects.
For example, neither of the following works:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">example/example</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="c1"># INVALID: can't use arrays as inputs</span>
    <span class="na">paths</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">foo</span><span class="pi">,</span> <span class="nv">bar</span><span class="pi">,</span> <span class="nv">baz</span><span class="pi">]</span>
    <span class="c1"># INVALID: can't use objects as inputs</span>
    <span class="na">headers</span><span class="pi">:</span>
      <span class="na">foo</span><span class="pi">:</span> <span class="s">bar</span>
      <span class="na">baz</span><span class="pi">:</span> <span class="s">quux</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…which means that action writers end up requiring users to do silly things like these:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">example/example</span>
  <span class="na">with</span><span class="pi">:</span>
    <span class="c1"># SILLY: action does ad-hoc CSV-ish parsing</span>
    <span class="na">paths</span><span class="pi">:</span> <span class="s">foo,bar,baz</span>
    <span class="c1"># SILLY: action forcefully flattens a natural hierarchy</span>
    <span class="na">header-foo</span><span class="pi">:</span> <span class="s">bar</span>
    <span class="na">header-baz</span><span class="pi">:</span> <span class="s">quux</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This is bad for maintainability, and bad for security: maintainability
because actions must carefully manage a single flat namespace of inputs
(with no types!), and security because both action writer and workflow writer
are forced into <a href="https://langsec.org/occupy/">ad-hoc, unspecified languages</a>
for complex inputs.</p>
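<p>The least-bad workaround I'm aware of is to make the ad-hoc language at least a <em>specified</em> one: pass complex inputs as JSON strings and parse them with the built-in <code>fromJSON(...)</code> expression function. Reusing the hypothetical <code>example/example</code> action from above:</p>

```yaml
- uses: example/example
  with:
    # Still a string as far as Actions is concerned, but at least the
    # grammar is JSON rather than something invented per-action.
    paths: '["foo", "bar", "baz"]'

# ...and on the consuming side, e.g. in a reusable workflow:
#   ${{ fromJSON(inputs.paths)[0] }}   # yields 'foo'
```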

<h3 id="fixing-this-2">Fixing this</h3>

<ul>
  <li>
    <p>Let action and workflow writers use <code class="language-plaintext highlighter-rouge">type:</code> everywhere, and let
us use <code class="language-plaintext highlighter-rouge">choice</code> everywhere — not just in <code class="language-plaintext highlighter-rouge">workflow_dispatch</code>!</p>
  </li>
  <li>
    <p>Give us stricter type-checking. Where action and workflow types
can be inferred statically, detect errors and reject incorrectly typed
workflow changes at <code class="language-plaintext highlighter-rouge">push</code> time, rather than waiting for the workflow
to inevitably fail.</p>
  </li>
  <li>
    <p>Give us <code class="language-plaintext highlighter-rouge">type: object</code> and <code class="language-plaintext highlighter-rouge">type: array</code> types. These won’t be perfect
to start with (thanks to potentially heterogeneous interior types),
but they’ll be a significant improvement over the status quo. Implementation-wise,
forward these as JSON-serialized strings or something similar<sup id="fnref:json" role="doc-noteref"><a href="#fn:json" class="footnote" rel="footnote">11</a></sup> where
appropriate (such as in auto-created <code class="language-plaintext highlighter-rouge">INPUT_{WHATEVER}</code> environment variables).</p>
  </li>
</ul>

<h2 id="more-official-actions-would-be-nice">(More) official actions would be nice</h2>

<p>The third-party ecosystem on GitHub Actions is great: there are a <em>lot</em>
of high-quality, easy-to-use actions being maintained by open source contributors.
I maintain a <a href="https://github.com/pypa/gh-action-pip-audit">handful</a>
<a href="https://github.com/sigstore/gh-action-sigstore-python">of them</a>!</p>

<p>Beneath the surface of these excellent third-party actions is a substrate
of <em>official</em>, GitHub-maintained actions. These actions primarily address
three classes of fundamental CI/CD activities:</p>

<ol>
  <li>Core <code class="language-plaintext highlighter-rouge">git</code> operations: <code class="language-plaintext highlighter-rouge">actions/checkout</code></li>
  <li>Core GitHub operations and repository housekeeping: <code class="language-plaintext highlighter-rouge">actions/{upload,download}-artifact</code>,
<code class="language-plaintext highlighter-rouge">actions/cache</code>, <code class="language-plaintext highlighter-rouge">actions/stale</code></li>
  <li>General (but essential) configuration: <code class="language-plaintext highlighter-rouge">actions/setup-python</code>, <code class="language-plaintext highlighter-rouge">actions/setup-node</code></li>
</ol>

<p>These classes are somewhat distinct from “higher-level” workflows (like the kind
I write): because of their centrality and universal demand, they benefit from
singular, high-quality, <em>officially maintained</em> implementations.</p>

<p>And so, the question: <strong><em>why are there so few of them</em></strong>?</p>

<p>Here is just a smattering of the official actions that <em>don’t</em> exist:</p>

<ol>
  <li><em>Programmatically adding a pull request to a merge queue</em>. GitHub <em>has</em> the machinery to
support this: <a href="https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request-with-a-merge-queue?tool=cli"><code class="language-plaintext highlighter-rouge">gh pr merge</code> already exists</a>.
It just isn’t exposed as an action; users are (presumably) expected
to piece it together themselves.</li>
</ol>
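<p>The manual piecing-together looks roughly like this (a sketch; my understanding is that with a merge queue required by branch protection, <code>gh pr merge</code> enqueues the PR rather than merging it directly):</p>

```yaml
- name: add PR to the merge queue
  run: gh pr merge --squash "$PR_URL"
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    PR_URL: ${{ github.event.pull_request.html_url }}
```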

<p>Even worse, there are actions that <em>did</em> exist but were deprecated (generally
for unclear reasons<sup id="fnref:time" role="doc-noteref"><a href="#fn:time" class="footnote" rel="footnote">12</a></sup>):</p>

<ol>
  <li><a href="https://github.com/actions/create-release"><code class="language-plaintext highlighter-rouge">actions/create-release</code></a>:
<a href="https://github.com/actions/create-release/issues/119">unmaintained as of March 2021</a>. Users
encouraged to switch to various community maintained workflows, most notably<sup id="fnref:notably" role="doc-noteref"><a href="#fn:notably" class="footnote" rel="footnote">13</a></sup>
<a href="https://github.com/softprops/action-gh-release"><code class="language-plaintext highlighter-rouge">softprops/action-gh-release</code></a>.</li>
  <li><a href="https://github.com/actions/upload-release-asset"><code class="language-plaintext highlighter-rouge">actions/upload-release-asset</code></a>: marked
as unmaintained at the same time as <code class="language-plaintext highlighter-rouge">actions/create-release</code>.</li>
  <li><a href="https://github.com/actions/setup-ruby"><code class="language-plaintext highlighter-rouge">actions/setup-ruby</code></a>:
<a href="https://github.com/actions/setup-ruby/issues/97">unmaintained as of February 2021</a>. Users
encouraged to switch to <a href="https://github.com/ruby/setup-ruby"><code class="language-plaintext highlighter-rouge">ruby/setup-ruby</code></a>.</li>
</ol>

<p>I’m sympathetic to the individual maintainers here and, in each case, the transition
to a “recommended” third-party action was relatively painless.</p>

<p>Still, the overall impression given here is unmistakable: that GitHub does not see <em>official</em>
actions for its own platform features (or key ecosystem users, like Ruby) as priorities<sup id="fnref:priorities" role="doc-noteref"><a href="#fn:priorities" class="footnote" rel="footnote">14</a></sup>,
and would rather have the community develop and choose unofficial favorites. This is
<strong>not unreasonable</strong> on a strategic level (it induces third-party development
in their ecosystem), but has a <em>deleterious effect</em> on trust in the platform. I’d like
to be able to write workflows and know that they’ll run (with minimal changes) 5 years from
now, and not worry that GitHub has abandoned core pieces underneath me!</p>

<p>Apart from imparting a general feeling of shabbiness, this compounds with GitHub Actions’
poor security story (<a href="#security-woes">per above</a>): not providing official high-quality actions for their own
API surfaces means that users will <em>continue</em> to make exploitable security mistakes in
their workflows. Nobody wins<sup id="fnref:pentesting" role="doc-noteref"><a href="#fn:pentesting" class="footnote" rel="footnote">15</a></sup>.</p>

<h3 id="fixing-this-3">Fixing this</h3>

<ul>
  <li>
    <p>Give us more official actions. As a <em>very</em> rough rule of thumb: if a thing
directly ties different pieces of GitHub infrastructure together <em>and</em> currently
needs to be done manually (with REST API calls, <code class="language-plaintext highlighter-rouge">gh</code> invocations, or whatever else),
it probably deserves a full official action!</p>
  </li>
  <li>
    <p>Give us more <em>pseudo-official</em> actions. Work with the biggest third-party actions<sup id="fnref:biggest" role="doc-noteref"><a href="#fn:biggest" class="footnote" rel="footnote">16</a></sup>
to form a <code class="language-plaintext highlighter-rouge">community-actions</code> (or whatever) org, with the expectation that actions homed under
that org have been reviewed (at some point) by GitHub, are forced to adhere to best practices
for repository security, receive semantically versioned updates, &amp;c &amp;c.</p>
  </li>
</ul>

<h2 id="wrap-up">Wrap-up</h2>

<p>This is a long and meandering post, and many of its asks pull against each other: security and stability
(in the form of more official actions that break less often), for example, are in eternal
tension.</p>

<p>I’m just one user, and I don’t expect my interests or frustrations to be overriding ones.
Still, I hope that the problems (and potential fixes) above aren’t unique to me, and that there are
engineers at GitHub who (again, selfishly!) share these concerns and would like to see
them fixed.</p>

<h3 id="addendum-2023-09-26">Addendum (2023-09-26)</h3>

<p>I got quite a few emails and private messages over this post with similar (excellent!)
points and observations. This is my attempt to collect them:</p>

<ul>
  <li>
    <p>Several people recommended that I try <a href="https://github.com/nektos/act"><code class="language-plaintext highlighter-rouge">nektos/act</code></a> for local
GitHub Actions emulation. Reader, let me tell you: I have! I’ve been using it for a few years,
and it’s a nice tool for rapidly iterating on and debugging <em>simple</em> workflows. In my experience,
however, it’s a lossy tool: I regularly run into situations where <code class="language-plaintext highlighter-rouge">act</code>’s images aren’t <em>quite</em>
like GitHub’s, where events aren’t exactly the same as they appear on GitHub, &amp;c. This isn’t
the fault of the <code class="language-plaintext highlighter-rouge">act</code> maintainers: they’re a third-party trying to follow a largely proprietary
ecosystem that has its own development pace. As a third-party tool that can’t be expected to
obtain perfect compatibility, I didn’t want this post to attract unnecessary complaints (or
misunderstandings) their way.</p>
  </li>
  <li>
    <p>Several people also recommended that I try <a href="https://github.com/rhysd/actionlint"><code class="language-plaintext highlighter-rouge">rhysd/actionlint</code></a>
for local linting (including security linting) of GitHub Actions workflows. Like <code class="language-plaintext highlighter-rouge">act</code>, I have!
I actually use <code class="language-plaintext highlighter-rouge">actionlint</code> much more regularly than I do <code class="language-plaintext highlighter-rouge">act</code>, both for professional development
and my personal projects/contributions. Unfortunately, it has its own deficiencies (for my
purposes, the biggest one is its inability to lint <em>actions</em> instead of just <em>workflows</em>).</p>
  </li>
  <li>
    <p>I was also recommended the <a href="https://marketplace.visualstudio.com/items?itemName=GitHub.vscode-github-actions">GitHub Actions extension for VS Code</a>, which provides additional
in-editor linting (built on top of <a href="https://json.schemastore.org/github-workflow.json">public JSON schemas for GitHub Actions</a>) as well as
integrations with a specific repository’s workflows (showing runs in progress, completing
variables, &amp;c). I’ve been using this extension for a few months, and it’s a
significant improvement over writing workflows blindly. Still, I find that many of my
workflows need a decent amount of add-commit-push debugging, due to things that the extension
can’t detect (typos in secrets, typos in dynamically generated output names, &amp;c).</p>
  </li>
</ul>

<p>Additionally, <a href="https://infosec.exchange/@omglolwtfbbq@hachyderm.io">Julian Dunn, head of product management for GitHub Actions</a>, reached out to me and pointed
out some important features in the works, as well as some clarifications on their internal
priorities:</p>

<ul>
  <li>Interactive debugging is a part of GitHub Actions’
<a href="https://github.com/github/roadmap/issues/637">public roadmap</a>.</li>
  <li>Similarly: GitHub Actions is actively looking to move away from tag- and SHA-based workflow
pinning, and towards something more immutable. This is also on their
<a href="https://github.com/github/roadmap/issues/592">public roadmap</a>.</li>
  <li>Additionally, I was told that GitHub <em>does</em> see the health and quality of the official actions
ecosystem as a priority, and that part of their (current) strategy for doing so is to keep the
overall number of official actions small (to ensure that each receives sufficient maintenance
and attention). This is a very understandable position, and I appreciate the candor
in discussing it!</li>
</ul>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:back" role="doc-endnote">
      <p>In a large part because, at GitHub’s size, I worry much less about <a href="https://techcrunch.com/2019/01/23/idera-acquires-travis-ci/">private equity enshittifying it</a>. <a href="#fnref:back" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:loss" role="doc-endnote">
      <p>Just enough for it to <em>really</em> hurt, against the backdrop of GitHub Actions’ overall productivity benefits. <a href="#fnref:loss" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:selfishly" role="doc-endnote">
      <p>In the sense that these things would be in GitHub’s own self-interest, making GHA <em>even more</em> appealing to developers, further cement its dominance in the CI/CD space, &amp;c. They should do these things for their own sake! <a href="#fnref:selfishly" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:roadmap" role="doc-endnote">
      <p>After finishing this post, I discovered that GitHub has a <a href="https://github.com/orgs/github/projects/4247/views/11">public roadmap for Actions features</a>. Maybe some of my grievances are already known and listed here; it’s a big roadmap! <a href="#fnref:roadmap" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:jekyll" role="doc-endnote">
      <p>Completely unrelated to this post: writing <code class="language-plaintext highlighter-rouge">${{ ... }}</code> is <a href="https://infosec.exchange/@yossarian/110863233787991472">remarkably painful</a> in a Liquid-rendered Jekyll blog. <a href="#fnref:jekyll" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:breaks" role="doc-endnote">
      <p>Yes, I know this fundamentally breaks the GitHub Actions data model; I didn’t say it would be easy! <a href="#fnref:breaks" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:trivial" role="doc-endnote">
      <p>In the sense that “using <code class="language-plaintext highlighter-rouge">pull_request_target</code> safely” means being confident that you <em>never</em> accidentally run <em>anything</em> from the pull request that just triggered your workflow. <a href="#fnref:trivial" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:donetome" role="doc-endnote">
      <p>And I <em>think</em> haven’t been done <em>to</em> me. <a href="#fnref:donetome" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:stole" role="doc-endnote">
      <p>Which I stole the <code class="language-plaintext highlighter-rouge">actions/checkout</code> example from, since I was too lazy to make my own. <a href="#fnref:stole" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:default" role="doc-endnote">
      <p>Even better, make it the default, and require people to click through a “destructive action” modal similar to the ones for other dangerous user or repository setting changes. <a href="#fnref:default" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:json" role="doc-endnote">
      <p>JSON is a semi-obvious choice here, since GitHub Actions already has a <a href="https://docs.github.com/en/actions/learn-github-actions/expressions#fromjson"><code class="language-plaintext highlighter-rouge">fromJSON(...)</code> function</a> and maps cleanly from YAML. <a href="#fnref:json" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:time" role="doc-endnote">
      <p>The primary stated reason is <a href="https://github.com/actions/create-release/issues/119">time</a>, leading to the revelation that these <em>critical</em> actions were <a href="https://github.com/actions/create-release/pull/32#issuecomment-579774032">side projects</a>. That isn’t these engineers’ fault; they seem to have been making the best out of a bad situation! But it’s incredible to see GitHub, organizationally, squander so much value and community goodwill here. <a href="#fnref:time" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:notably" role="doc-endnote">
      <p>In my opinion. It seems to have the most users and most activity, although it’s <em>bonkers</em> that I’m evaluating something as critical as this based on those kinds of weak proxy signals. <a href="#fnref:notably" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:priorities" role="doc-endnote">
      <p>See the <a href="#addendum-2023-09-26">addendum</a>: this impression is incorrect. <a href="#fnref:priorities" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pentesting" role="doc-endnote">
      <p>Except for the pentesting industrial complex. <a href="#fnref:pentesting" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:biggest" role="doc-endnote">
      <p>Off the top of my head: actions like <code class="language-plaintext highlighter-rouge">ruby/setup-ruby</code>, <code class="language-plaintext highlighter-rouge">shivammathur/setup-php</code>, and <code class="language-plaintext highlighter-rouge">peaceiris/actions-gh-pages</code> (among others) have hundreds of thousands of active users, and form a critical part of the Actions ecosystem. They should be treated as such! <a href="#fnref:biggest" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="workflow" /><category term="programming" /><category term="rant" /><summary type="html"><![CDATA[I love GitHub Actions: I’ve been a daily user of it since 2019 for both professional and hobbyist projects, and have found it invaluable to both my overall productivity and peace of mind. I’m just old enough to have used Travis CI et al. professionally before moving to GitHub Actions, and I do not look back with joy1. In a large part because, at GitHub’s size, I worry much less about private equity enshittifying it. &#8617;]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.yossarian.net/assets/github-actions-fails-2.png" /><media:content medium="image" url="https://blog.yossarian.net/assets/github-actions-fails-2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Two quick hacks for laptop in-flight Delta Wi-Fi with T-Mobile</title><link href="https://blog.yossarian.net/2023/08/20/Two-quick-hacks-for-inflight-delta-wifi-with-tmobile" rel="alternate" type="text/html" title="Two quick hacks for laptop in-flight Delta Wi-Fi with T-Mobile" /><published>2023-08-20T00:00:00+00:00</published><updated>2023-08-20T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/08/20/Two-quick-hacks-for-inflight-delta-wifi-with-tmobile</id><content type="html" xml:base="https://blog.yossarian.net/2023/08/20/Two-quick-hacks-for-inflight-delta-wifi-with-tmobile"><![CDATA[<p><em>This post was written on a plane, using one of the tricks below.
Please forgive any small typographical errors; I’ll try and fix them later.</em></p>

<h2 id="quick-context">Quick context</h2>

<p>Most Delta flights have in-flight Wi-Fi, including international flights,
including the one I’m on right now.</p>

<p>Normally you’d have to pay for it (on international flights), which I refuse to
do on principle.</p>

<p>However, Delta <em>currently</em> has an agreement with T-Mobile, giving T-Mobile
customers <a href="https://www.t-mobile.com/news/un-carrier/free-in-flight-wi-fi-on-delta-air-lines">free in-flight Wi-Fi</a>.</p>

<p>Very cool, thanks Delta and T-Mobile! Except for the fine print: while it’s a normal
unrestricted<sup id="fnref:seemingly" role="doc-noteref"><a href="#fn:seemingly" class="footnote" rel="footnote">1</a></sup> connection, it <em>only</em> works on mobile devices,
<a href="https://www.t-mobile.com/support/coverage/t-mobile-in-flight-connections-on-us">not on laptops</a>.</p>

<p>The justification for this is (probably) that Delta and T-Mobile can’t perform
their little SMS verification flow from my laptop, but this is bull: iMessage can forward
their SMS verification codes just fine, and besides there are other ways for the two
to verify that I’m a T-Mobile subscriber (such as asking me for my T-Mobile
credentials).</p>

<p>This wasn’t satisfying, so I came up with two quick workarounds. The first
<em>definitely</em> works (it’s what I’m using to write this), and the second
<em>probably</em> works (it’s what I was originally planning on using because
it’s a little simpler, but it requires that you haven’t previously connected
to the in-flight Wi-Fi on your phone).</p>

<p>Everything below assumes iOS and macOS, since it’s what I’m on right now,
but nothing about either technique should require either OS.</p>

<h2 id="workaround-1-mac-spoofing">Workaround 1: MAC spoofing</h2>

<p>It’s <em>deeply</em> funny to me that this still works in 2023.</p>

<p>The steps here are pretty simple:</p>

<ol>
  <li>Connect to Delta’s inflight Wi-Fi on your phone and go through the T-Mobile SMS flow;</li>
  <li>Confirm that you’re connected to the interwebz;</li>
  <li>
    <p>Open up your phone’s Wi-Fi settings and make a note of your Wi-Fi interface’s MAC</p>

    <p>I have no idea where this information appears on Android. On iOS, it’s
 under the little “info” button for the Wi-Fi network you’re currently on.</p>
  </li>
  <li>
    <p>Turn off your phone’s Wi-Fi</p>

    <p>We don’t want to confuse the inflight Wi-Fi more than necessary by having
 two devices with the same MAC appear simultaneously.</p>
  </li>
  <li>
    <p>On macOS, make sure that any previous configuration for Delta’s inflight
Wi-Fi has been deleted.</p>
  </li>
  <li>
    <p>Force macOS to disassociate from any networks:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre> <span class="nb">sudo</span> /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport <span class="nt">-z</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </li>
  <li>
    <p>Set the MAC for your Wi-Fi interface to the MAC that your phone was using<sup id="fnref:private" role="doc-noteref"><a href="#fn:private" class="footnote" rel="footnote">2</a></sup>:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre> <span class="nb">sudo </span>ifconfig en0 ether AA:BB:CC:AA:BB:CC
 <span class="nb">sudo </span>ifconfig en0 lladdr AA:BB:CC:AA:BB:CC
</pre></td></tr></tbody></table></code></pre></div>    </div>

    <p>…where <code class="language-plaintext highlighter-rouge">en0</code> is your laptop’s Wi-Fi interface and <code class="language-plaintext highlighter-rouge">AA:BB:CC:AA:BB:CC</code> is
 your phone’s Wi-Fi interface’s MAC.</p>

    <p>I’m not sure whether <em>both</em> of these are required, but experimentally,
 using both didn’t hurt things.</p>
  </li>
  <li>
    <p>Connect to Delta’s inflight Wi-Fi again, and enjoy your internet:</p>

    <p><img src="/assets/delta-inflight.png" alt="" /></p>
  </li>
</ol>
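<p>For reference, the last few steps condense into a small script. This is my own sketch rather than part of the original workaround: <code>en0</code> and the MAC are placeholders you must replace, and it only <em>prints</em> the commands so you can review them before piping the output to <code>sh</code>.</p>

```shell
#!/bin/sh
# Sketch of the disassociate-and-spoof steps above. Nothing here needs sudo
# until you actually run the printed commands; "en0" and the example MAC are
# placeholders for your laptop's Wi-Fi interface and your phone's MAC.
AIRPORT=/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport

spoof_cmds() {
  iface="$1"
  mac="$2"
  # Refuse obviously malformed MACs before generating anything.
  if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
    echo "invalid MAC: $mac" >&2
    return 1
  fi
  echo "sudo $AIRPORT -z"
  echo "sudo ifconfig $iface ether $mac"
  echo "sudo ifconfig $iface lladdr $mac"
}

spoof_cmds en0 AA:BB:CC:AA:BB:CC
# review the output, then: spoof_cmds en0 <your phone MAC> | sh
```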

<h2 id="workaround-2-user-agent-spoofing">Workaround 2: User agent spoofing</h2>

<p>I’m including this one for good measure, since I’m <em>pretty</em> sure it would work
and it’s slightly easier (when you do it right). It has a few constraints:</p>

<ol>
  <li>You can’t have previously connected to Delta’s inflight Wi-Fi with your
phone, since we aren’t spoofing the MAC and T-Mobile only seems to allow
one MAC per flight.</li>
  <li>Your laptop needs to be able to receive SMS messages somehow (e.g. via
iMessage with your provider’s support for SMS over Wi-Fi).</li>
</ol>

<p>With this technique, you just change your browser’s user agent to something
that Delta’s interstitial recognizes as a mobile phone to get it to show
you the T-Mobile option.</p>

<p>For Safari, this can be done by enabling Developer Mode
(Settings &gt; Advanced &gt; Show Develop menu in menu bar) and setting the user
agent to an iPhone agent:</p>

<p><img src="/assets/safari-useragents.png" alt="" /></p>

<p>From there, navigate to Delta’s inflight interstitial
(<code class="language-plaintext highlighter-rouge">https://wifi.inflightinternet.com</code>) and use the newly visible T-Mobile
option, going through SMS verification as normal.</p>

<p>I’m <em>pretty</em> sure this will work, assuming you have Wi-Fi Calling enabled
and have iMessage on your Mac linked to your phone.</p>
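<p>If you’d rather poke at the interstitial from a terminal first, the same user-agent trick works with <code>curl</code>. A small sketch: the iPhone UA string below is just one plausible example (not anything Delta specifically checks for), and the portal may treat non-browser clients differently.</p>

```shell
# Fetch the captive-portal page with a spoofed mobile user agent. Building
# the command as a string first lets you inspect it before running anything
# on the actual in-flight network.
UA='Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1'
PORTAL='https://wifi.inflightinternet.com'

portal_fetch_cmd() {
  printf "curl -s -A '%s' '%s'\n" "$UA" "$PORTAL"
}

portal_fetch_cmd
# on the plane: eval "$(portal_fetch_cmd)" | less
```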

<h2 id="summary">Summary</h2>

<p>Neither of these tricks is new; they’re both very old!</p>

<p>That being said, I was amused (and a little endeared) by the fact that
I could use them to get online with inflight Wi-Fi in 2023 — I guess
things on the device sniffing/doing-better-than-trusting-MAC front haven’t
improved much over the last decade (or, equally likely, Delta and T-Mobile
just don’t care about the tiny fraction of users who can figure this out).</p>

<p>On the off chance they do care about this and haven’t intentionally left this
gap open for more motivated users, I welcome whatever trick they use to stop
me from doing this. It’ll make for an interesting challenge on my next flight.</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:seemingly" role="doc-endnote">
      <p>Seemingly: I’ve been able to use SSH and SMTP(S) on it, so it’s not just HTTP(S). The bandwidth is restricted, obviously. <a href="#fnref:seemingly" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:private" role="doc-endnote">
      <p>By default iOS uses private MACs for each Wi-Fi network you join, so I’m not too worried about forgetting to reset this later. It should also not persist between reboots, anyways. <a href="#fnref:private" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="howto" /><summary type="html"><![CDATA[This post was written on a plane, using one of the tricks below. Please forgive any small typographical errors; I’ll try and fix them later.]]></summary></entry><entry><title type="html">Introducing shaq, a CLI for Shazam</title><link href="https://blog.yossarian.net/2023/07/27/Introducing-shaq-a-CLI-for-shazam" rel="alternate" type="text/html" title="Introducing shaq, a CLI for Shazam" /><published>2023-07-27T00:00:00+00:00</published><updated>2023-07-27T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/07/27/Introducing-shaq-a-CLI-for-shazam</id><content type="html" xml:base="https://blog.yossarian.net/2023/07/27/Introducing-shaq-a-CLI-for-shazam"><![CDATA[<p>This is another tool announcement post, this time for <a href="https://github.com/woodruffw/shaq"><code class="language-plaintext highlighter-rouge">shaq</code></a>.</p>

<p><img src="/assets/shaq.gif" alt="A demonstration animation of shaq, detecting a song." /></p>

<p>As the demo implies, <code class="language-plaintext highlighter-rouge">shaq</code> does just one thing: it listens to an audio source and sends the results
to <a href="https://www.shazam.com/">Shazam</a> for fingerprinting. If a match is found, it prints it.</p>

<p>As usual, you can install it from <a href="https://pypi.org/project/shaq/">PyPI</a> using <code class="language-plaintext highlighter-rouge">pip</code> or
<a href="https://pypa.github.io/pipx/"><code class="language-plaintext highlighter-rouge">pipx</code></a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre>pip <span class="nb">install </span>shaq
<span class="c"># or:</span>
pipx <span class="nb">install </span>shaq

shaq <span class="nt">--help</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="usage">Usage</h2>

<p>By default <code class="language-plaintext highlighter-rouge">shaq</code> listens to the system microphone (helpfully supplied through
<a href="http://www.portaudio.com/"><code class="language-plaintext highlighter-rouge">portaudio</code></a>) and writes its findings as plain text, but you can tell
it to detect from an arbitrary input instead:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="rouge-code"><pre><span class="c"># shaq analyzes the first 10 seconds by default</span>
shaq <span class="nt">--input</span> mystery.mp3

<span class="c"># analyze a longer segment</span>
shaq <span class="nt">--input</span> mystery.mp3 <span class="nt">--duration</span> 15

<span class="c"># anything with an audio track that ffmpeg can handle can be an input</span>
shaq <span class="nt">--input</span> another-mystery.mp4
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…as well as to emit JSON:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>shaq <span class="nt">--listen</span> <span class="nt">--duration</span> 5 <span class="nt">--json</span> | jq <span class="s1">'[.track.title, .track.subtitle]'</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>produces:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="p">[</span><span class="w">
  </span><span class="s2">"Mendacity"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"Max Roach"</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></pre></td></tr></tbody></table></code></pre></div></div>

<p>Under the hood, <code class="language-plaintext highlighter-rouge">shaq</code> is a relatively thin wrapper around
<a href="https://pypi.org/project/PyAudio/">PyAudio</a>, <a href="https://pypi.org/project/pydub/">pydub</a>,
and <a href="https://pypi.org/project/shazamio/">shazamio</a>.</p>

<p>For the moment, <code class="language-plaintext highlighter-rouge">shaq</code> only supports fixed durations. Once I get bored again,
I plan to add:</p>

<ul>
  <li>Support for “rolling” detections, i.e. “listen until you get a match”;</li>
  <li>Support for time ranges in file inputs, i.e., “15 seconds starting at 00:12:13”.</li>
</ul>

<h2 id="but-why">But why?</h2>

<p>I am a <a href="https://www.last.fm/user/yossarian_flew">compulsive scrobbler</a>;
I record almost everything I listen to, including phonograph records.</p>

<p>For records, I currently use <a href="https://vinylscrobbler.com/">Vinyl Scrobbler</a>
to scrobble records as I play them and it works very nicely (it even puts scrobbles
in the future so that they end up mostly synchronized with the actual record!).</p>

<p>At the same time, it’s <em>slightly</em> more manual than I’d prefer: I’d like to be able
to put a record on and have scrobbles come through <em>automatically</em> as each track starts. This
is why I made <code class="language-plaintext highlighter-rouge">shaq</code>: my plan is to hook the (unused) headphone channel on my receiver into
a Raspberry Pi, which will then run <code class="language-plaintext highlighter-rouge">shaq</code> either continuously or on a transition trigger
(i.e., noise drop between tracks).</p>
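<p>Until proper rolling detection lands, a shell loop can approximate the continuous mode. This is a sketch of my own, not shipped tooling: the scrobble step is a placeholder <code>echo</code>, and the flags match the ones shown above.</p>

```shell
#!/bin/sh
# Listen in short windows and only act when the detected track changes.
# Requires shaq and jq on the PATH when actually run.

# Decide whether a detection warrants a scrobble: non-empty and different
# from the previous one.
should_scrobble() {
  [ -n "$1" ] && [ "$1" != "$2" ]
}

# Pass "run" to start listening; without it nothing happens, so the helper
# above can be exercised without a microphone.
if [ "${1:-}" = "run" ]; then
  last=""
  while :; do
    track="$(shaq --listen --duration 10 --json 2>/dev/null |
      jq -r 'select(.track != null) | "\(.track.title) - \(.track.subtitle)"')"
    if should_scrobble "$track" "$last"; then
      echo "scrobble: $track"   # replace with a real scrobbling call
      last="$track"
    fi
    sleep 5
  done
fi
```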

<p>The Pi will then scrobble based on <code class="language-plaintext highlighter-rouge">shaq</code>’s results,
satisfying my desire to be simultaneously too lazy to click a single button on my computer while
also physically flipping big chunks of vinyl every 20 minutes.</p>]]></content><author><name>William Woodruff</name></author><category term="devblog" /><category term="programming" /><category term="python" /><category term="music" /><summary type="html"><![CDATA[This is another tool announcement post, this time for shaq.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.yossarian.net/assets/shaq.gif" /><media:content medium="image" url="https://blog.yossarian.net/assets/shaq.gif" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Looking for additional maintainers on a few projects</title><link href="https://blog.yossarian.net/2023/07/16/Looking-for-additional-maintainers-on-a-few-projects" rel="alternate" type="text/html" title="Looking for additional maintainers on a few projects" /><published>2023-07-16T00:00:00+00:00</published><updated>2023-07-16T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/07/16/Looking-for-additional-maintainers-on-a-few-projects</id><content type="html" xml:base="https://blog.yossarian.net/2023/07/16/Looking-for-additional-maintainers-on-a-few-projects"><![CDATA[<p>I was recently reminded by an
<a href="https://shkspr.mobi/blog/2023/07/the-pull-request-hack-is-fucking-magic/">excellent post</a>
on Terence Eden’s blog that I have a few projects that could use additional maintainers
besides me. While I do my best to keep them up-to-date and address reports
that come in, I believe that they could benefit from more active maintainership
by others with greater stakes in each project’s continued stability and growth.</p>

<p>I’ve summarized each below; if you’re a user or previous contributor to one,
please <a href="https://yossarian.net">get in touch</a> with me! I’m also happy to consider
non-users and entirely new maintainers, although I’d like to understand your
potential interest and/or stake in the project if you aren’t already a user
or contributor.</p>

<h2 id="ruby-mpv">ruby-mpv</h2>

<p><a href="https://github.com/woodruffw/ruby-mpv"><em>ruby-mpv</em></a> is a Ruby library for controlling
<a href="https://mpv.io/">MPV</a> processes through its JSON IPC protocol. I created this library a few years
ago for a side project that has now been abandoned and memory holed, but the library itself is
general purpose and already covers a large chunk of MPV’s IPC API.</p>
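<p>For context, the JSON IPC protocol that <em>ruby-mpv</em> wraps is simple enough to drive by hand. A sketch: the socket path is arbitrary, and the commented lines assume <code>mpv</code> and <code>socat</code> are installed.</p>

```shell
# Build a newline-delimited JSON command in the shape mpv's IPC expects.
ipc_cmd() {
  printf '{"command": ["get_property", "%s"]}\n' "$1"
}

ipc_cmd volume
# prints: {"command": ["get_property", "volume"]}

# Against a real player (socket path is just an example):
#   mpv --idle=yes --input-ipc-server=/tmp/mpvsocket &
#   ipc_cmd volume | socat - /tmp/mpvsocket
```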

<h2 id="qrencodecr">qrencode.cr</h2>

<p><a href="https://github.com/woodruffw/qrencode.cr"><em>qrencode.cr</em></a> is a Crystal library that exposes
high-level bindings for <a href="https://fukuchi.org/works/qrencode/index.html.en">libqrencode</a>.</p>

<h2 id="x_docr">x_do.cr</h2>

<p><a href="https://github.com/woodruffw/x_do.cr"><em>x_do.cr</em></a> is a Crystal library that exposes
high-level bindings for <a href="https://github.com/jordansissel/xdotool">libxdo</a>.</p>

<h2 id="notifycr">notify.cr</h2>

<p><a href="https://github.com/woodruffw/notify.cr"><em>notify.cr</em></a> is a Crystal library that exposes
high-level bindings for the D-Bus
<a href="https://specifications.freedesktop.org/notification-spec/notification-spec-latest.html">Desktop Notifications</a>
service.</p>

<h2 id="libbdiff">libbdiff</h2>

<p><a href="https://github.com/woodruffw/libbdiff"><em>libbdiff</em></a> is a linkable C library version
of Colin Percival’s <a href="https://www.daemonology.net/bsdiff/"><em>bsdiff</em></a>. It hasn’t been maintained in
nearly a decade.</p>

<h2 id="ruby-inih">ruby-inih</h2>

<p><a href="https://github.com/woodruffw/ruby-inih"><em>ruby-inih</em></a> is a native (meaning written in C)
Ruby library that exposes the <a href="https://github.com/benhoyt/inih"><code class="language-plaintext highlighter-rouge">inih</code></a> INI parser via
a Ruby API.</p>

<h2 id="simplesession">SimpleSession</h2>

<p><a href="https://github.com/woodruffw/SimpleSession"><em>SimpleSession</em></a> is a bare-bones session
manager for Sublime Text 3 and 4, intended to be a simpler and more sync-friendly
version of Sublime Text’s built-in “workspace” support. Based on
<a href="https://packagecontrol.io/packages/SimpleSession">Package Control statistics</a>, it probably
has a few dozen users.</p>

<h2 id="others">Others?</h2>

<p>These were just the few that I could remember — there are almost certainly
others. If you’ve contributed to a project of mine that <em>isn’t</em> listed above and
you’d like to help maintain it, please don’t hesitate to get in touch as well!</p>]]></content><author><name>William Woodruff</name></author><category term="devblog" /><category term="programming" /><category term="oss" /><summary type="html"><![CDATA[I was recently reminded by an excellent post on Terence Eden’s blog that I have a few projects that could use additional maintainers besides me. While I do my best to keep them up-to-date and address reports that come in, I believe that they could benefit from more active maintainership by others with greater stakes each project’s continued stability and growth.]]></summary></entry><entry><title type="html">Software-defined (Internet) radio with Liquidsoap</title><link href="https://blog.yossarian.net/2023/06/27/Software-defined-Internet-radio-with-Liquidsoap" rel="alternate" type="text/html" title="Software-defined (Internet) radio with Liquidsoap" /><published>2023-06-27T00:00:00+00:00</published><updated>2023-06-27T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/06/27/Software-defined-Internet-radio-with-Liquidsoap</id><content type="html" xml:base="https://blog.yossarian.net/2023/06/27/Software-defined-Internet-radio-with-Liquidsoap"><![CDATA[<p>This is going to be another short “how-to” blog post on music management, this time on
declarative Internet radio streaming with
<a href="https://github.com/savonet/liquidsoap">Liquidsoap</a>. I couldn’t find a ton
of great examples of Liquidsoap online while defining my own radio stream
(besides the project’s own <a href="https://www.liquidsoap.info/doc-dev/">excellent docs</a>),
so I figured I’d write one.</p>

<h2 id="background">Background</h2>

<p>I have a friend who runs an Internet radio server; I do a (mostly)
<a href="https://yossarian.net/junk/badradio/">weekly show</a> on it.</p>

<p>I’ve done Internet radio before (and ran my own personal server for a few
years), and wasn’t particularly happy with the tooling or broadcasting
flows: most flows were either extremely inflexible (fixed playlists) or brittle
(local loopbacks that sink to a tool like
<a href="https://danielnoethen.de/butt/"><code class="language-plaintext highlighter-rouge">butt</code></a>, requiring error-prone fiddling
with my local sound settings).</p>

<p>In contrast to both of those, I wanted something that could:</p>

<ul>
  <li>Handle both a “primary” playlist and live microphone break-ins
seamlessly, without requiring me to perform a hacky loopback with PulseAudio
or ALSA;</li>
  <li>Integrate into a larger station management workflow, including automatically
fetching (and streaming) the right playlist from my
<a href="https://www.navidrome.org/">Navidrome</a> instance;</li>
  <li>Be <em>maintained</em> as a program, rather than a collection of semi-reproducible
pieces of global system state tied together with shell scripts.</li>
</ul>

<p>These properties brought me to <a href="https://www.liquidsoap.info/doc-dev/">Liquidsoap</a><sup id="fnref:back" role="doc-noteref"><a href="#fn:back" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="liquidsoap-a-swiss-army-knife-for-internet-radio">Liquidsoap: a Swiss Army knife for Internet radio</h2>

<p>Liquidsoap is, among other things, an <em>entire programming language</em> dedicated
to describing and composing audio streams<sup id="fnref:video" role="doc-noteref"><a href="#fn:video" class="footnote" rel="footnote">2</a></sup>.</p>

<p>Conceptually, Liquidsoap turns <em>inputs</em> (a streaming playlist, live
microphone input, periodic jingles and announcements, &amp;c) into a
<em>stream generator</em>, which is then sunk into <em>outputs</em> (the local audio
output, an MP3 backup, an <a href="https://icecast.org/">Icecast</a> server, &amp;c).</p>

<p>This is all done with strongly typed <a href="https://en.wikipedia.org/wiki/ML_(programming_language)">ML</a>-ish<sup id="fnref:ml" role="doc-noteref"><a href="#fn:ml" class="footnote" rel="footnote">3</a></sup>
scripts, like the following:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre><span class="n">source</span> <span class="o">=</span> <span class="nf">mksafe</span><span class="p">(</span><span class="nf">playlist</span><span class="p">(</span><span class="sh">"</span><span class="s">radio.m3u</span><span class="sh">"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="sh">"</span><span class="s">normal</span><span class="sh">"</span><span class="p">))</span>

<span class="n">output</span><span class="p">.</span><span class="nf">icecast</span><span class="p">(</span>
  <span class="o">%</span><span class="nf">mp3</span><span class="p">(</span><span class="n">bitrate</span><span class="o">=</span><span class="mi">320</span><span class="p">,</span> <span class="n">samplerate</span><span class="o">=</span><span class="mi">44100</span><span class="p">,</span> <span class="n">stereo</span><span class="o">=</span><span class="n">true</span><span class="p">),</span>
  <span class="n">mount</span><span class="o">=</span><span class="sh">"</span><span class="s">/stream</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">host</span><span class="o">=</span><span class="sh">"</span><span class="s">server.example.com</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">port</span><span class="o">=</span><span class="mi">8000</span><span class="p">,</span>
  <span class="n">password</span><span class="o">=</span><span class="sh">"</span><span class="s">hunter2</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">my first radio stream</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">source</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The above is a very basic example: all it does is take an
<a href="https://en.wikipedia.org/wiki/M3U">M3U</a>-formatted
playlist and stream it to an Icecast server.</p>

<p>One of Liquidsoap’s key niceties is <em>infallibility</em>: the language won’t
let you define a stream generator that will fail to provide input when
the sink needs it. That’s what the <code class="language-plaintext highlighter-rouge">mksafe</code> function in the example above
does: it converts a <em>fallible</em> source (a playlist) into an <em>infallible</em>
one by ensuring that radio silence is streamed if the underlying
playlist is malformed or can’t be streamed in time.</p>

<p>We can demonstrate by trying to remove the <code class="language-plaintext highlighter-rouge">mksafe</code> and observing the
typechecking error Liquidsoap gives us:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="rouge-code"><pre><span class="c"># this version may be pretty old;</span>
<span class="c"># see https://www.liquidsoap.info/doc-dev/install.html</span>
<span class="nb">sudo </span>apt <span class="nb">install</span> <span class="nt">-y</span> liquidsoap

liquidsoap <span class="s1">'source = playlist("radio.m3u", mode="normal")

output.icecast(
  %mp3(bitrate=320, samplerate=44100, stereo=true),
  mount="/stream",
  host="server.example.com",
  port=8000,
  password="hunter2",
  description="my first radio stream",
  source)'</span>

At line 0, char 9-45:

Error 7: Invalid value:
That <span class="nb">source </span>is fallible
</pre></td></tr></tbody></table></code></pre></div></div>

<p>With these building blocks (fallible and infallible sources, stream generators,
and outputs) we can begin to do more complex things, like defining
crossfades between tracks:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="n">source</span> <span class="o">=</span> <span class="nf">playlist</span><span class="p">(</span><span class="sh">"</span><span class="s">radio.m3u</span><span class="sh">"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="sh">"</span><span class="s">normal</span><span class="sh">"</span><span class="p">)</span>
<span class="n">source</span> <span class="o">=</span> <span class="nf">crossfade</span><span class="p">(</span><span class="n">duration</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">source</span><span class="p">)</span>
<span class="n">source</span> <span class="o">=</span> <span class="nf">mksafe</span><span class="p">(</span><span class="n">source</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…or adding an additional input source (a microphone), and mixing it
in with a fade in/out:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="rouge-code"><pre><span class="c1"># music source: playlist
</span><span class="n">music</span> <span class="o">=</span> <span class="nf">mksafe</span><span class="p">(</span><span class="nf">playlist</span><span class="p">(</span><span class="sh">"</span><span class="s">radio.m3u</span><span class="sh">"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="sh">"</span><span class="s">normal</span><span class="sh">"</span><span class="p">))</span>

<span class="c1"># microphone source: amplify to 130% and strip blanks &gt; 0.75 secs
</span><span class="n">mic</span> <span class="o">=</span> <span class="nb">input</span><span class="p">.</span><span class="nf">alsa</span><span class="p">(</span><span class="n">bufferize</span><span class="o">=</span><span class="n">false</span><span class="p">)</span>
<span class="n">mic</span> <span class="o">=</span> <span class="nf">amplify</span><span class="p">(</span><span class="mf">1.3</span><span class="p">,</span> <span class="n">mic</span><span class="p">)</span>
<span class="n">mic</span> <span class="o">=</span> <span class="n">blank</span><span class="p">.</span><span class="nf">strip</span><span class="p">(</span><span class="n">max_blank</span><span class="o">=</span><span class="mf">0.75</span><span class="p">,</span> <span class="n">mic</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
  <span class="nf">add</span><span class="p">(</span><span class="n">normalize</span><span class="o">=</span><span class="n">false</span><span class="p">,</span> <span class="p">[</span><span class="n">fade</span><span class="p">.</span><span class="nf">out</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">fade</span><span class="p">.</span><span class="ow">in</span><span class="p">(</span><span class="n">b</span><span class="p">)])</span>
<span class="n">end</span>

<span class="c1"># fade in and out of the mic/music sources
</span><span class="n">mix</span> <span class="o">=</span> <span class="nf">fallback</span><span class="p">(</span>
  <span class="n">track_sensitive</span><span class="o">=</span><span class="n">false</span><span class="p">,</span>
  <span class="n">transition_length</span><span class="o">=</span><span class="mf">0.75</span><span class="p">,</span>
  <span class="n">transitions</span><span class="o">=</span><span class="p">[</span><span class="n">f</span><span class="p">,</span> <span class="n">f</span><span class="p">],</span>
  <span class="p">[</span><span class="n">mic</span><span class="p">,</span> <span class="n">music</span><span class="p">])</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Liquidsoap also has a mature callback system, allowing us to define
behaviors that should occur e.g. whenever the stream’s metadata changes:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre><span class="c1"># scrobble the stream's metadata whenever it changes (i.e., when a
# new song begins)
</span><span class="n">music</span><span class="p">.</span><span class="nf">on_metadata</span><span class="p">(</span><span class="nf">fun</span><span class="p">(</span><span class="n">m</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">lastfm</span><span class="p">.</span><span class="nf">submit</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="sh">"</span><span class="s">***</span><span class="sh">"</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="sh">"</span><span class="s">***</span><span class="sh">"</span><span class="p">,</span> <span class="n">m</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>These examples are just the tip of the iceberg; the
<a href="https://www.liquidsoap.info/doc-dev/reference.html">Core API</a> and
<a href="https://www.liquidsoap.info/doc-dev/reference-extras.html">Extra API</a> documentation
contains far more, including all kinds of neat signals processing and advanced
filtering functionality that I haven’t needed (yet).</p>

<h2 id="typing-it-all-together">Typing it all together</h2>

<p>Here is what my (admittedly hacky) Liquidsoap-defined Internet radio looks like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
</pre></td><td class="rouge-code"><pre><span class="c1">#!/usr/bin/liquidsoap -v
</span>
<span class="n">log</span><span class="p">.</span><span class="n">stdout</span><span class="p">.</span><span class="nf">set</span><span class="p">(</span><span class="n">true</span><span class="p">)</span>

<span class="n">test</span> <span class="o">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="sh">"</span><span class="s">TEST</span><span class="sh">"</span><span class="p">)</span> <span class="o">!=</span> <span class="sh">""</span>
<span class="n">mount</span> <span class="o">=</span> <span class="k">if</span> <span class="n">test</span> <span class="n">then</span> <span class="sh">"</span><span class="s">/test</span><span class="sh">"</span> <span class="k">else</span> <span class="sh">"</span><span class="s">/stream</span><span class="sh">"</span> <span class="n">end</span>
<span class="n">date</span> <span class="o">=</span> <span class="nf">argv</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">time</span><span class="p">.</span><span class="nf">string</span><span class="p">(</span><span class="sh">"</span><span class="s">%Y-%m-%d</span><span class="sh">"</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>

<span class="c1"># Do your own thing here.
</span><span class="n">radio_password</span> <span class="o">=</span> <span class="n">process</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="sh">"</span><span class="s">kbs2 pass badradio.biz</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># Do your own thing here.
</span><span class="n">lastfm_username</span> <span class="o">=</span> <span class="sh">"</span><span class="s">yossarian_flew</span><span class="sh">"</span>
<span class="n">lastfm_password</span> <span class="o">=</span> <span class="n">process</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="sh">"</span><span class="s">kbs2 pass last.fm</span><span class="sh">"</span><span class="p">)</span>

<span class="c1"># retrieve the playlist
</span><span class="n">playlist_name</span> <span class="o">=</span> <span class="n">date</span> <span class="o">^</span> <span class="sh">"</span><span class="s">.m3u</span><span class="sh">"</span>
<span class="n">playlist_path</span> <span class="o">=</span> <span class="sh">"</span><span class="s">/tmp/</span><span class="sh">"</span> <span class="o">^</span> <span class="n">playlist_name</span>

<span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">writing temp playlist to: </span><span class="sh">"</span> <span class="o">^</span> <span class="n">playlist_path</span><span class="p">)</span>

<span class="k">if</span> <span class="n">process</span><span class="p">.</span><span class="nf">test</span><span class="p">(</span><span class="sh">"</span><span class="s">ruby gimme-playlist </span><span class="sh">"</span> <span class="o">^</span> <span class="n">date</span> <span class="o">^</span> <span class="sh">"</span><span class="s"> &gt; </span><span class="sh">"</span> <span class="o">^</span> <span class="n">playlist_path</span><span class="p">)</span> <span class="n">then</span>
  <span class="n">log</span><span class="p">.</span><span class="nf">important</span><span class="p">(</span><span class="sh">"</span><span class="s">gimme-playlist: saved to </span><span class="sh">"</span> <span class="o">^</span> <span class="n">playlist_path</span><span class="p">)</span>

  <span class="k">def</span> <span class="nf">cleanup</span><span class="p">()</span>
    <span class="nb">file</span><span class="p">.</span><span class="nf">remove</span><span class="p">(</span><span class="n">playlist_path</span><span class="p">)</span>
  <span class="n">end</span>

  <span class="nf">on_shutdown</span><span class="p">(</span><span class="n">cleanup</span><span class="p">)</span>
<span class="k">else</span>
  <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">gimme-playlist failed for </span><span class="sh">"</span> <span class="o">^</span> <span class="n">date</span><span class="p">)</span>
  <span class="nf">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">end</span>

<span class="n">music</span> <span class="o">=</span> <span class="nf">mksafe</span><span class="p">(</span><span class="nf">crossfade</span><span class="p">(</span><span class="n">duration</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="nf">playlist</span><span class="p">(</span><span class="n">playlist_path</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="sh">"</span><span class="s">normal</span><span class="sh">"</span><span class="p">)))</span>

<span class="c1"># TODO: Use lastfm.submit.full instead, once it works with crossfades.
# See: https://github.com/savonet/liquidsoap/issues/3172
# music = lastfm.submit.full(user=lastfm_username, password=lastfm_password, music)
</span><span class="n">music</span><span class="p">.</span><span class="nf">on_metadata</span><span class="p">(</span><span class="nf">fun</span><span class="p">(</span><span class="n">m</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">lastfm</span><span class="p">.</span><span class="nf">submit</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">lastfm_username</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="n">lastfm_password</span><span class="p">,</span> <span class="n">m</span><span class="p">))</span>

<span class="c1"># microphone source: amplify to 130% and strip blanks &gt; 0.75 secs
</span><span class="n">mic</span> <span class="o">=</span> <span class="nb">input</span><span class="p">.</span><span class="nf">alsa</span><span class="p">(</span><span class="n">bufferize</span><span class="o">=</span><span class="n">false</span><span class="p">)</span>
<span class="n">mic</span> <span class="o">=</span> <span class="nf">amplify</span><span class="p">(</span><span class="mf">1.3</span><span class="p">,</span> <span class="n">mic</span><span class="p">)</span>
<span class="n">mic</span> <span class="o">=</span> <span class="n">blank</span><span class="p">.</span><span class="nf">strip</span><span class="p">(</span><span class="n">max_blank</span><span class="o">=</span><span class="mf">0.75</span><span class="p">,</span> <span class="n">mic</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
  <span class="nf">add</span><span class="p">(</span><span class="n">normalize</span><span class="o">=</span><span class="n">false</span><span class="p">,</span> <span class="p">[</span><span class="n">fade</span><span class="p">.</span><span class="nf">out</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">fade</span><span class="p">.</span><span class="ow">in</span><span class="p">(</span><span class="n">b</span><span class="p">)])</span>
<span class="n">end</span>

<span class="c1"># fade in and out of the mic/music sources
</span><span class="n">mix</span> <span class="o">=</span> <span class="nf">fallback</span><span class="p">(</span>
  <span class="n">track_sensitive</span><span class="o">=</span><span class="n">false</span><span class="p">,</span>
  <span class="n">transition_length</span><span class="o">=</span><span class="mf">0.75</span><span class="p">,</span>
  <span class="n">transitions</span><span class="o">=</span><span class="p">[</span><span class="n">f</span><span class="p">,</span> <span class="n">f</span><span class="p">],</span>
  <span class="p">[</span><span class="n">mic</span><span class="p">,</span> <span class="n">music</span><span class="p">])</span>

<span class="n">output</span><span class="p">.</span><span class="nf">alsa</span><span class="p">(</span><span class="n">bufferize</span><span class="o">=</span><span class="n">false</span><span class="p">,</span> <span class="n">mix</span><span class="p">)</span>

<span class="n">output</span><span class="p">.</span><span class="nf">icecast</span><span class="p">(</span>
  <span class="o">%</span><span class="nf">mp3</span><span class="p">(</span><span class="n">bitrate</span><span class="o">=</span><span class="mi">320</span><span class="p">,</span> <span class="n">samplerate</span><span class="o">=</span><span class="mi">44100</span><span class="p">,</span> <span class="n">stereo</span><span class="o">=</span><span class="n">true</span><span class="p">),</span>
  <span class="n">mount</span><span class="o">=</span><span class="n">mount</span><span class="p">,</span>
  <span class="n">host</span><span class="o">=</span><span class="sh">"</span><span class="s">server.badradio.biz</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">port</span><span class="o">=</span><span class="mi">8000</span><span class="p">,</span>
  <span class="n">password</span><span class="o">=</span><span class="n">radio_password</span><span class="p">,</span>
  <span class="n">url</span><span class="o">=</span><span class="sh">"</span><span class="s">https://yossarian.net/junk/badradio/</span><span class="sh">"</span> <span class="o">^</span> <span class="n">date</span><span class="p">,</span>
  <span class="n">description</span><span class="o">=</span><span class="sh">"</span><span class="s">the rummage bin</span><span class="sh">"</span><span class="p">,</span>
  <span class="n">mix</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Starting the stream is as simple as:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre><span class="c"># `radio.liq` is the source file</span>
liquidsoap radio.liq <span class="nt">--</span> <span class="s2">"2023-06-26"</span>

<span class="c"># stream to the test endpoint instead</span>
<span class="nv">TEST</span><span class="o">=</span>1 liquidsoap radio.liq <span class="nt">--</span> <span class="s2">"2023-06-26"</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This does a few things automatically for me:</p>

<ul>
  <li>It handles both test and showtime streaming; I set <code class="language-plaintext highlighter-rouge">TEST=1</code> while invoking
the script to switch to a testing endpoint.</li>
  <li>It keeps my credentials out of the program: the radio’s definition
uses <a href="https://github.com/woodruffw/kbs2"><code class="language-plaintext highlighter-rouge">kbs2</code></a> (my password manager)
to retrieve the appropriate credentials as required.</li>
  <li>It automatically sets up the right playlist: I pass in the show’s date,
and the radio’s definition shells out to a little helper (<code class="language-plaintext highlighter-rouge">gimme-playlist</code>)
that talks to my Navidrome instance and generates an M3U-formatted playlist.
It also ensures that the playlist is cleaned up automatically, via
the <code class="language-plaintext highlighter-rouge">on_shutdown</code> hook.</li>
  <li>It makes source handling “intuitive”: I don’t have to configure anything
ahead-of-time, and switching to my microphone is as simple as unmuting myself;
the stream gracefully handles the transition with a fade.</li>
</ul>
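<p>The real <code class="language-plaintext highlighter-rouge">gimme-playlist</code> helper is a Ruby script that queries my Navidrome instance, so the sketch below is purely illustrative: a minimal Python version of its output side (the radio script redirects the helper's stdout into <code class="language-plaintext highlighter-rouge">/tmp/&lt;date&gt;.m3u</code>). The function name and the stand-in track paths are assumptions, not the real helper's interface.</p>

```python
import sys

# Hypothetical sketch only: the real `gimme-playlist` is a Ruby helper that
# resolves a show date into a track list via Navidrome. This models just its
# output: a plain M3U playlist written to stdout, which the radio script
# redirects into /tmp/<date>.m3u.

def to_m3u(tracks: list[str]) -> str:
    """Render track paths/URLs as a simple (non-extended) M3U playlist."""
    # Plain M3U is one entry per line; the trailing newline keeps
    # picky players happy.
    return "\n".join(tracks) + "\n"

if __name__ == "__main__":
    # Stand-in tracks; the real helper would resolve these from the
    # show date passed on the command line.
    tracks = ["/music/a.flac", "/music/b.flac"]
    sys.stdout.write(to_m3u(tracks))
```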

<p>All told, Liquidsoap has made a normally stressful and unappealing task
into a genuinely <em>fun</em> one: it’s allowed me to spend less time thinking
about ways in which my stream can break catastrophically, and more time
actually planning my shows. It’s also given me plenty of ideas for future
improvements, including:</p>

<ul>
  <li>Further automating my pre- and post-show operations, like updating
the show’s website.</li>
  <li>Attempting some kind of call-in and/or shoutout functionality, using
the server’s chat.</li>
</ul>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:back" role="doc-endnote">
      <p>More accurately, they brought me <em>back</em> to Liquidsoap: I had discovered it years ago, but had completely forgotten about it except as “interesting looking stream management tool.” I had to search for it online to find it again, which was surprisingly hard to do. <a href="#fnref:back" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:video" role="doc-endnote">
      <p>It can apparently also do <a href="https://www.liquidsoap.info/doc-dev/video.html">video</a>, but I haven’t tried that. <a href="#fnref:video" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ml" role="doc-endnote">
      <p>Liquidsoap itself is written in OCaml; the Liquidsoap language is (to my eyes) somewhere between OCaml and Python. <a href="#fnref:ml" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="howto" /><category term="music" /><summary type="html"><![CDATA[This is going to be another short “how-to” blog post on music management, this time on declarative Internet radio streaming with Liquidsoap. I couldn’t find a ton of great examples of Liquidsoap online while defining my own radio stream (besides the project’s own excellent docs), so I figured I’d write one.]]></summary></entry><entry><title type="html">PGP signatures on PyPI: worse than useless</title><link href="https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI-worse-than-useless" rel="alternate" type="text/html" title="PGP signatures on PyPI: worse than useless" /><published>2023-05-21T00:00:00+00:00</published><updated>2023-05-21T00:00:00+00:00</updated><id>https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI-worse-than-useless</id><content type="html" xml:base="https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI-worse-than-useless"><![CDATA[<p>TL;DR: A large number of PGP signatures on PyPI <strong>can’t be correlated</strong> to any well-known
PGP key and, of the signatures that can be correlated, many are generated from <strong>weak keys or
malformed certificates</strong>. The results suggest widespread misuse of GPG and other PGP implementations by Python
packagers, with said misuse being encouraged by the PGP ecosystem’s poor defaults, opaque
and user-hostile interfaces, and
<a href="https://gnupg.org/faq/gnupg-faq.html#define_dsa">outright dangerous recommendations</a>.</p>

<h3 id="preword">Preword</h3>

<p>I’ve been sitting on this post for a few months, in part because of travel
and in part because its (intended) scope was beginning to reflect PGP’s own fractal complexity.</p>

<p>The version that I’m publishing now has been <strong>significantly pared down</strong> to remove extended
digressions on how bad PGP’s packet format is, all the different ways in which a signature or
certificate packet can be broken, incorrectly bound, &amp;c.</p>

<p>I’ve removed those things because I think the results, as present, are <strong>sufficient evidence</strong>
for the actual claims I’d like to make, namely:</p>

<ol>
  <li>
    <p>That existing PGP signatures on PyPI serve no security purpose, and that all evidence
points to <strong>nobody ever attempting to verify them</strong>;</p>
  </li>
  <li>
    <p>Even advanced technical communities, as a whole, <strong>largely fail to reduce PGP’s complexity
and unnecessary agility</strong> into a reasonable and tractable subset.</p>
  </li>
</ol>

<p>And, just in case it needs to be said:</p>

<ol>
  <li>
    <p>This post isn’t intended to disparage PyPI: PyPI has <strong>done everything right</strong>, including
<a href="https://github.com/pypi/warehouse/issues/3356">purposely removing frontend support for PGP years ago</a>.</p>
  </li>
  <li>
    <p>This post isn’t intended to disparage individual packagers and maintainers still uploading
signatures to PyPI. I suspect that much of the ongoing signature uploading is a result
of long-forgotten automation and, even when it isn’t: developers <strong>cannot</strong> be blamed for
their misuse of obtuse tools. Security tools, <em>especially</em> cryptographic ones, are
<strong>only as good as their least-informed<sup id="fnref:domain" role="doc-noteref"><a href="#fn:domain" class="footnote" rel="footnote">1</a></sup> and most distracted user</strong>.</p>
  </li>
</ol>

<hr />

<h2 id="background">Background</h2>

<p>PyPI has supported PGP signatures in some form or another for a very long time<sup id="fnref:time" role="doc-noteref"><a href="#fn:time" class="footnote" rel="footnote">2</a></sup>.</p>

<p>To this date, PGP is still (minimally) supported: package uploaders can still sign for their package
distributions and upload the resulting <code class="language-plaintext highlighter-rouge">.asc</code> to PyPI for inclusion in the index. The
<a href="https://twine.readthedocs.io/en/stable/">official uploading utility</a> even supports invoking
<code class="language-plaintext highlighter-rouge">gpg</code> directly via the <code class="language-plaintext highlighter-rouge">--sign</code> and <code class="language-plaintext highlighter-rouge">--sign-with</code> arguments!</p>

<p>To a novice Python programmer looking to publish their first package to PyPI, this might give the
following impressions:</p>

<ol>
  <li>That PGP offers secure and modern cryptographic primitives;</li>
  <li>That PyPI <em>encourages</em> users to upload PGP signatures or that doing so is <em>best practice</em>;</li>
  <li>That <em>others</em> expect PGP signatures, and that package adoption is (in part) predicated
on supplying PGP signatures.</li>
</ol>

<p>The first two are <em>just wrong</em>:</p>

<ol>
  <li>
    <p>PGP is an <a href="https://latacora.micro.blog/2019/07/16/the-pgp-problem.html">insecure</a> and
<a href="https://moxie.org/2015/02/24/gpg-and-me.html">outdated</a> ecosystem that hasn’t reflected
cryptographic best practices
<a href="https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/">in decades</a>.</p>
  </li>
  <li>
    <p>PyPI’s support is vestigial in nature: signatures are not shown as part of the web interface,
and are only obliquely referenced in the <a href="https://peps.python.org/pep-0503/">PEP 503</a> and JSON
APIs.</p>
  </li>
</ol>

<p>The third is harder to immediately refute: PyPI still hosts signatures, after all. Absent any
other information, it’s <em>entirely possible</em> that companies and end users are quietly and diligently
verifying whatever signatures are present, using trust sets, tracking revoked and expired keys,
and so forth.</p>

<p>Thus, my goal with this blog post:</p>

<ol>
  <li>Determine <em>how many</em> signatures are on PyPI;</li>
  <li>Correlate those signatures to their signing keys;</li>
  <li>Analyze those signing keys for their practical value: their strength, liveness, &amp;c.</li>
</ol>

<h2 id="methodology">Methodology</h2>

<p>Relatively early in the process I decided not to collect <em>every single</em> signature on PyPI,
for two main reasons:</p>

<ol>
  <li>
    <p>Relevance: PyPI hosts many old package distributions, including distributions
for Python 2.7 (and earlier!). Given that Python 2 has been EOL for over three years at
this point, it didn’t feel relevant (or efficient) to retrieve large quantities of
signatures for distributions that nobody is likely to ever install.</p>
  </li>
  <li>
    <p>Fairness: both PGP and Python have a lot of history, much of which predates
modern understandings around cryptographic best practices.
Given that, it didn’t feel <em>fair</em> to analyze extremely old
signatures, especially if doing so would bias the statistics away from newer users
who are doing more responsible things.</p>
  </li>
</ol>

<p>Given these considerations, I decided to limit my analysis to <em>only signatures uploaded to PyPI
on or after <strong>2020-03-27</strong></em>. I chose that date somewhat arbitrarily<sup id="fnref:arbitrary" role="doc-noteref"><a href="#fn:arbitrary" class="footnote" rel="footnote">3</a></sup> while
also satisfying a few constraints:</p>

<ul>
  <li>
    <p>It’s well after the 2018 deployment of the <a href="https://github.com/pypi/warehouse">new PyPI</a>,
which didn’t emphasize support for PGP signatures (while still retaining it). In other words:
signatures uploaded in 2020 or later were either done by automation (implying some degree
of sophistication) <em>or</em> were likely a conscious decision by a packager to continue signing
with PGP.</p>
  </li>
  <li>
    <p>It’s <em>very</em> recent, and best practices around digital signatures have not changed
substantially since 2020. In other words: a best-practices signature (and key) made in 2020
should look very similar to a best-practices signature (and key) made in 2023, and someone
signing in 2020 would have no good excuses for not making reasonable choices.</p>
  </li>
</ul>

<p>Actually retrieving the signatures was a multi-step process. To start, I used
<a href="https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/">PyPI’s BigQuery dataset</a>
to give me some basic metadata on every distribution file with an associated signature:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="rouge-code"><pre><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="k">version</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">python_version</span><span class="p">,</span> <span class="n">blake2_256_digest</span>
<span class="k">FROM</span> <span class="nv">`bigquery-public-data.pypi.distribution_metadata`</span>
<span class="k">WHERE</span> <span class="n">has_signature</span>
<span class="k">AND</span> <span class="n">upload_time</span> <span class="o">&gt;</span> <span class="nb">TIMESTAMP</span><span class="p">(</span><span class="nv">"2020-03-27 00:00:00"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This produced 52900 distributions uploaded since 2020-03-27 for which PyPI also
had a signature (subtract 1 for the CSV header):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre><span class="nv">$ </span><span class="nb">wc</span> <span class="nt">-l</span> inputs/dists-with-signatures.csv
52901 inputs/dists-with-signatures.csv

<span class="nv">$ </span><span class="nb">head</span> <span class="nt">-2</span> inputs/dists-with-signatures.csv
name,version,filename,python_version,blake2_256_digest
pantsbuild.pants.testutil,1.30.0,pantsbuild.pants.testutil-1.30.0-py36.py37.py38-none-any.whl,py36.py37.py38,7ecbe47906ddbe8a2f1ee2505c2edb7f9313348d4925855e429be1d316660a00
</pre></td></tr></tbody></table></code></pre></div></div>

<p>From here, I needed to retrieve each release distribution’s detached signature, i.e.
the adjacent <code class="language-plaintext highlighter-rouge">.asc</code> URL in PyPI’s object storage.</p>

<p>I initially did this with the “conveyor” service, which turns
<a href="https://peps.python.org/pep-0491/#file-name-convention">PEP 491</a> names into URLs like so:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>https://files.pythonhosted.org/packages/source/{version}/{name[0]}/{name}/{dist}.asc
</pre></td></tr></tbody></table></code></pre></div></div>

<p>However, this was pretty lossy: for whatever reason<sup id="fnref:version" role="doc-noteref"><a href="#fn:version" class="footnote" rel="footnote">4</a></sup> my URLs were slightly off about 20% of the
time, resulting in lots of missed signatures. I eventually realized that the BigQuery dataset
<em>also</em> includes the Blake2 digest for each distribution, meaning that I could use the <em>actual</em>
package URLs instead:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>https://files.pythonhosted.org/packages/{digest[0:2]}/{digest[2:4]}/{digest[4:]}/{dist}.asc
</pre></td></tr></tbody></table></code></pre></div></div>

<p>…and this was perfectly reliable.</p>
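<p>Concretely, the digest-based scheme can be expressed as a small helper; this is a sketch that just mirrors the URL template above, using the <code class="language-plaintext highlighter-rouge">blake2_256_digest</code> and <code class="language-plaintext highlighter-rouge">filename</code> columns from the BigQuery results:</p>

```python
def signature_url(blake2_256_digest: str, filename: str) -> str:
    """Build the files.pythonhosted.org URL for a distribution's detached
    PGP signature, using the 2/2/60 hex-digest path layout shown above."""
    d = blake2_256_digest
    return (
        "https://files.pythonhosted.org/packages/"
        f"{d[0:2]}/{d[2:4]}/{d[4:]}/{filename}.asc"
    )

# Example, using the first row of the CSV excerpt earlier in the post:
url = signature_url(
    "7ecbe47906ddbe8a2f1ee2505c2edb7f9313348d4925855e429be1d316660a00",
    "pantsbuild.pants.testutil-1.30.0-py36.py37.py38-none-any.whl",
)
```

<p>Dropping the <code class="language-plaintext highlighter-rouge">.asc</code> suffix yields the distribution's own download URL, which is why this approach is reliable: it addresses the file exactly as PyPI's object storage does.</p>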

<p>From here, I wanted to figure out (roughly) how many unique keys produced these ~50k signatures.
I decided to use PGPy<sup id="fnref:mistake" role="doc-noteref"><a href="#fn:mistake" class="footnote" rel="footnote">5</a></sup> for that; excerpted from <a href="https://github.com/woodruffw/pypi-pgp-statistics/blob/main/dists-by-keyid.py"><code class="language-plaintext highlighter-rouge">dists-by-keyid.py</code></a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="n">sig</span> <span class="o">=</span> <span class="n">pgpy</span><span class="p">.</span><span class="n">PGPSignature</span><span class="p">.</span><span class="nf">from_blob</span><span class="p">(</span><span class="n">sig_resp</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
    <span class="c1"># https://github.com/SecurityInnovation/PGPy/issues/433
</span>    <span class="n">sig</span>
    <span class="n">sig</span><span class="p">.</span><span class="n">signer</span>
<span class="k">except</span> <span class="nb">AttributeError</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">barf: couldn</span><span class="sh">'</span><span class="s">t get signer, probably ancient</span><span class="sh">"</span><span class="p">,</span> <span class="nb">file</span><span class="o">=</span><span class="n">sys</span><span class="p">.</span><span class="n">stderr</span><span class="p">)</span>
    <span class="n">_KEY_ID_MAP</span><span class="p">[</span><span class="sh">"</span><span class="s">&lt;invalid signer&gt;</span><span class="sh">"</span><span class="p">].</span><span class="nf">append</span><span class="p">(</span><span class="n">rec</span><span class="p">)</span>
    <span class="k">continue</span>

<span class="n">_KEY_ID_MAP</span><span class="p">[</span><span class="n">sig</span><span class="p">.</span><span class="n">signer</span><span class="p">].</span><span class="nf">append</span><span class="p">(</span><span class="n">rec</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This left me with a big map of PGP key IDs<sup id="fnref:keyid" role="doc-noteref"><a href="#fn:keyid" class="footnote" rel="footnote">6</a></sup> to a list of distributions
signed by them, including 26 distributions whose signatures PGPy couldn’t parse:</p>

<table>
  <tr>
    <th>Package name</th>
    <th>Distribution count</th>
  </tr>
  <tr>
    <td>agraph-python</td>
    <td>2</td>
  </tr>
  <tr>
    <td>excerpt-html</td>
    <td>4</td>
  </tr>
  <tr>
    <td>lektor-index-pages</td>
    <td>6</td>
  </tr>
  <tr>
    <td>lektor-expression-type</td>
    <td>2</td>
  </tr>
  <tr>
    <td>lektor-git-timestamp</td>
    <td>2</td>
  </tr>
  <tr>
    <td>lektor-datetime-helpers</td>
    <td>3</td>
  </tr>
  <tr>
    <td>lektor-limit-dependencies</td>
    <td>2</td>
  </tr>
  <tr>
    <td>lektorlib</td>
    <td>2</td>
  </tr>
  <tr>
    <td>lektor-polymorphic-type</td>
    <td>3</td>
  </tr>
</table>

<p>This is a tiny failure (26 distributions out of 52900, or roughly 0.05%), but it
sets the tone for the rest of the post.</p>

<p>Apart from these 26 failures, the remaining 52874 signatures were produced from
1067 “unique”<sup id="fnref:unique" role="doc-noteref"><a href="#fn:unique" class="footnote" rel="footnote">7</a></sup> PGP keys.</p>

<h2 id="results">Results</h2>

<p>At this point, I had 1067 unique <em>key IDs</em>, each of which needed to be retrieved
from a keyserver.</p>

<p>My expectation was that this wouldn’t be a significant challenge,
despite the <a href="https://gist.github.com/rjhansen/67ab921ffb4084c865b3618d6955275f">widely publicized implosion</a> of the SKS keyserver network back in
2019: there are still a few <a href="https://keys.openpgp.org/">major</a>
<a href="https://keyserver.ubuntu.com/">keyservers</a> running, and package authors
pushing to PyPI <em>should</em> have the presence of mind to upload their keys. Right?</p>

<p><img src="/assets/what-me-worry.jpeg" alt="" />
<em>Pictured: your author immediately before trying to retrieve PGP keys in 2023.</em></p>

<p><strong>Wrong</strong>. Of the 1067 key IDs collected through signatures on PyPI, a full <strong>308</strong>
(or roughly <strong>29%</strong>) had <em>no publicly discoverable key</em> on the major remaining
keyservers. In other words: nearly a third of all signatures added to PyPI <em>since 2020</em>
are bound to keys that aren’t discoverable by the PGP ecosystem’s own tooling.
They <em>might</em> exist, hidden on personal domains and documentation pages, but, for
all intents and purposes, these 29% of keys are <strong>useless</strong><sup id="fnref:easy" role="doc-noteref"><a href="#fn:easy" class="footnote" rel="footnote">8</a></sup>.</p>
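<p>The discoverability check above can be sketched as a small script. To be clear, this is a reconstruction, not my original code: it assumes the <code>keys.openpgp.org</code> VKS API (<code>/vks/v1/by-keyid</code>), and the helper names are mine.</p>

```python
# Hedged sketch of the key-discovery check: for each key ID scraped from
# PyPI signatures, ask keys.openpgp.org whether the key can be fetched.
from urllib.request import urlopen
from urllib.error import HTTPError

VKS_BASE = "https://keys.openpgp.org/vks/v1/by-keyid"


def lookup_url(key_id):
    """Build the VKS lookup URL for a long (16-hexdigit) key ID."""
    key_id = key_id.upper()
    if key_id.startswith("0X"):
        key_id = key_id[2:]
    return f"{VKS_BASE}/{key_id}"


def is_discoverable(key_id, timeout=10.0):
    """True if the keyserver has the key, False if it returns 404."""
    try:
        with urlopen(lookup_url(key_id), timeout=timeout) as resp:
            return resp.status == 200
    except HTTPError as e:
        if e.code == 404:
            return False
        raise
```

<p>A loop of <code>is_discoverable</code> over the 1067 key IDs (with a polite delay between requests) yields the kind of tally above; the other keyservers work the same way with a different base URL.</p>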

<p>So, our first graphic of the post: discoverable keys versus undiscoverable ones:</p>

<p><img src="/assets/pgp-disco.png" alt="" />
<em>Pictured: a very normal and healthy signing ecosystem.</em></p>

<p>That left 759 discovered keys to <em>actually</em> audit. To keep things
simple<sup id="fnref:pgpkeydump" role="doc-noteref"><a href="#fn:pgpkeydump" class="footnote" rel="footnote">9</a></sup>, I limited my analysis to just the following considerations:</p>

<ul>
  <li>
    <p>Does the key’s certificate have a binding signature<sup id="fnref:binding" role="doc-noteref"><a href="#fn:binding" class="footnote" rel="footnote">10</a></sup>?</p>
  </li>
  <li>
    <p>What algorithm does the key use?</p>
    <ul>
      <li>Additionally, if it’s a subkey, what algorithm does the primary key use?</li>
      <li>For RSA keys, what parameters does the key use?</li>
    </ul>
  </li>
</ul>

<p>If that seems like a limited analysis, it’s because it is: there are <em>too many
ways</em> to produce a weirdly shaped PGP certificate and/or key packet sequence,
and the existing tooling (things like <a href="https://github.com/kazu-yamamoto/pgpdump"><code class="language-plaintext highlighter-rouge">pgpdump</code></a>
and <a href="https://www.mailpile.is/blog/2014-10-07_Some_Thoughts_on_GnuPG.html"><code class="language-plaintext highlighter-rouge">gpg --with-colons</code></a>) weren’t up to the task.</p>

<p>Instead, <a href="/2023/04/14/Introducing-pgpkeydump">I wrote a little tool</a> (<a href="https://github.com/woodruffw/pgpkeydump"><code class="language-plaintext highlighter-rouge">pgpkeydump</code></a>) to give me machine-readable
dumps of PGP keys<sup id="fnref:certs" role="doc-noteref"><a href="#fn:certs" class="footnote" rel="footnote">11</a></sup>, and then wrapped it in a <a href="https://github.com/woodruffw/pypi-pgp-statistics/blob/main/key-audit.py">bulk auditing script</a>
that does some basic statistics on the results.</p>
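<p>The shape of that bulk audit can be sketched as follows. This is a reconstruction under assumptions: the input dicts mimic the kind of machine-readable dump <code>pgpkeydump</code> emits, but the field names here are mine, not its actual schema:</p>

```python
# Bucket a key into the labels used in the tables below: the algorithm
# plus, for RSA/DSA, the modulus size, and for ECDSA, the curve name.
# The dict shape here is an assumption, not pgpkeydump's actual schema.
def classify_key(key):
    algo = key["algorithm"]
    if algo in ("RSA", "DSA"):
        return f"{algo}-{key['bits']}"
    if algo == "ECDSA":
        return key["curve"]  # e.g. "NIST P-521", "brainpoolP512r1"
    return algo  # e.g. "EdDSA", which carries no parameters here

print(classify_key({"algorithm": "RSA", "bits": 2048}))  # RSA-2048
```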

<p>To summarize the results:</p>

<ul>
  <li>Of the 759 discovered keys, <strong>298</strong> (<strong>39%</strong>) had no binding signature at their specified
creation time. In other words: these keys’ certificates came with no verifiable proof for
an associated identity, expiry, or <em>any</em> of the other basic metadata conceptually associated
with a PGP key, including its intended purpose.</li>
  <li><strong>375</strong> (<strong>49%</strong>) had no binding signature at the time of the audit (2023-05-19), meaning that
any binding signature that was present had already expired. In other words: <strong>half of all
keys used to sign on PyPI since 2020 are already expired</strong>. This strongly suggests that
nobody is attempting to verify signatures from PyPI on any meaningful scale.</li>
</ul>
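<p>Concretely, both checks above amount to asking whether a binding self-signature is valid at some instant: once at the key’s stated creation time, and once at the audit date (2023-05-19). A minimal sketch, with hypothetical field names:</p>

```python
# A key counts as "bound" at time t only if some binding self-signature
# was created at or before t and has not expired by t.
from datetime import datetime


def bound_at(binding_sigs, t):
    for sig in binding_sigs:
        expires = sig.get("expires")  # None means "never expires"
        if sig["created"] <= t and (expires is None or t < expires):
            return True
    return False


sigs = [{"created": datetime(2019, 1, 1), "expires": datetime(2022, 1, 1)}]
print(bound_at(sigs, datetime(2023, 5, 19)))  # False: already expired
```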

<p>Then, on the algorithm and parameter sides<sup id="fnref:missing" role="doc-noteref"><a href="#fn:missing" class="footnote" rel="footnote">12</a></sup>:</p>

<p>Primary keys:</p>

<table>
  <tr>
    
    <th>Key type</th>
    
    <th>Count</th>
    
  </tr>
  
  <tr>
    
    <td>RSA-4096</td>
    
    <td>497</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-2048</td>
    
    <td>127</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-3072</td>
    
    <td>45</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-1024</td>
    
    <td>40</td>
    
  </tr>
  
  <tr>
    
    <td>EdDSA</td>
    
    <td>35</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-3072</td>
    
    <td>7</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-2048</td>
    
    <td>4</td>
    
  </tr>
  
  <tr>
    
    <td>NIST P-521</td>
    
    <td>1</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-4064</td>
    
    <td>1</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-4032</td>
    
    <td>1</td>
    
  </tr>
  
</table>

<p>“Effective”<sup id="fnref:effective" role="doc-noteref"><a href="#fn:effective" class="footnote" rel="footnote">13</a></sup> keys:</p>

<table>
  <tr>
    
    <th>Key type</th>
    
    <th>Count</th>
    
  </tr>
  
  <tr>
    
    <td>RSA-4096</td>
    
    <td>471</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-2048</td>
    
    <td>151</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-3072</td>
    
    <td>47</td>
    
  </tr>
  
  <tr>
    
    <td>EdDSA</td>
    
    <td>43</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-1024</td>
    
    <td>31</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-3072</td>
    
    <td>7</td>
    
  </tr>
  
  <tr>
    
    <td>DSA-2048</td>
    
    <td>5</td>
    
  </tr>
  
  <tr>
    
    <td>NIST P-521</td>
    
    <td>1</td>
    
  </tr>
  
  <tr>
    
    <td>brainpoolP512r1</td>
    
    <td>1</td>
    
  </tr>
  
  <tr>
    
    <td>RSA-4032</td>
    
    <td>1</td>
    
  </tr>
  
</table>

<p>Or again, as pretty charts:</p>

<p><img src="/assets/pgp-primary-keys-by-type.png" alt="" /></p>

<p><img src="/assets/pgp-effective-keys-by-type.png" alt="" /></p>
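<p>For the curious, the headline percentages discussed below fall straight out of the “effective” table (the total is 758, per the off-by-one footnote):</p>

```python
# Recompute the quoted shares from the "effective" key counts above.
effective = {
    "RSA-4096": 471, "RSA-2048": 151, "RSA-3072": 47, "EdDSA": 43,
    "DSA-1024": 31, "DSA-3072": 7, "DSA-2048": 5, "NIST P-521": 1,
    "brainpoolP512r1": 1, "RSA-4032": 1,
}
total = sum(effective.values())  # 758
rsa_2048_share = effective["RSA-2048"] / total
dsa_share = sum(n for k, n in effective.items() if k.startswith("DSA")) / total
print(f"RSA-2048: {rsa_2048_share:.0%}, DSA: {dsa_share:.0%}")  # RSA-2048: 20%, DSA: 6%
```

<p>(The DSA share of effective keys rounds to 6%; “roughly 5%” below is the same figure, stated loosely.)</p>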

<p>First, the “good” parts:</p>

<ol>
  <li>While <a href="https://www.youtube.com/watch?v=lElHzac8DDI">normally a bad choice</a>, RSA is <em>literally</em>
the best you can do in terms of standard<sup id="fnref:rfc4880" role="doc-noteref"><a href="#fn:rfc4880" class="footnote" rel="footnote">14</a></sup> asymmetric signing algorithms in PGP. Over
two thirds of keys used to sign on PyPI are using it, and they’re using reasonable<sup id="fnref:rsa-size" role="doc-noteref"><a href="#fn:rsa-size" class="footnote" rel="footnote">15</a></sup>
key sizes (4096 and 3072).</li>
</ol>

<p>Then, the meh:</p>

<ol>
  <li>
    <p>A sizeable minority (20% of effective keys, and 17% of primary keys) are RSA-2048.
NIST considers RSA-2048 to be equivalent to roughly 112 bits of security<sup id="fnref:discouraged" role="doc-noteref"><a href="#fn:discouraged" class="footnote" rel="footnote">16</a></sup>, and
<a href="https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf">does not recommend</a> its use on data that’s expected to have a security life
of 15 years…<strong>starting in 2015</strong>. That means that PyPI-hosted signatures against RSA-2048 keys
have roughly 7 years of “shelf life” in them. Version turnover in packaging ecosystems
has accelerated over the last decade; let’s hope that applies here too!</p>
  </li>
  <li>
    <p>Some enterprising people are on the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-openpgp-rfc4880bis-10">“bleeding edge”</a>: they’re using
EdDSA and a few different ECDSA curves. It’s hard to say whether this is good or bad: it’s
good in the sense that these are <em>almost certainly</em> better than anything offered by
strictly RFC 4880 PGP implementations, but pointless in the sense that support for verifying
these signatures is limited<sup id="fnref:limited" role="doc-noteref"><a href="#fn:limited" class="footnote" rel="footnote">17</a></sup> to just a few clients. It’s also probably
pointlessly slow (for P-521 and brainpoolP512r1 in particular).</p>
  </li>
</ol>

<p>And finally, the insane:</p>

<ul>
  <li>
    <p>Roughly 5% of all keys used to sign for packages on PyPI are DSA. The majority
of those are DSA-1024, which is roughly equivalent in strength to RSA-1024.
<a href="https://buttondown.email/cryptography-dispatches/archive/cryptography-dispatches-dsa-is-past-its-prime/">DSA of any size is already very bad</a>,
and DSA-1024 is <em>well</em> outside of any acceptable safety margin for signatures in
2023, much less 2020 or even 2010.</p>
  </li>
  <li>
    <p>RSA-4064 and RSA-4032. I have no idea why anyone would do this<sup id="fnref:checking" role="doc-noteref"><a href="#fn:checking" class="footnote" rel="footnote">18</a></sup>. Maybe some
misguided attempt to calculate a precise security margin, or a misreading of someone else’s
recommendations?</p>
  </li>
  <li>
    <p>One of the RSA-2048 keys has a public exponent of <code class="language-plaintext highlighter-rouge">41</code>, rather than <code class="language-plaintext highlighter-rouge">65537</code> (which <em>every other
RSA key in the dataset uses</em>). Again, I have no idea why anyone would do this: it breaks from a
universal convention for no benefit and invites the low-exponent padding concerns that <code class="language-plaintext highlighter-rouge">e = 65537</code> is resilient against.</p>
  </li>
</ul>

<h2 id="takeaways">Takeaways</h2>

<p>To summarize: of <em>just</em> the PGP signatures uploaded to PyPI in the last three years:</p>

<ul>
  <li>A small handful (26) are so malformed or old that they can’t be correlated to a key ID;</li>
  <li>Of those that can be correlated to a key ID, <strong>nearly a third cannot be discovered</strong>
through the PGP ecosystem’s <em>intended key discovery mechanism</em>;</li>
  <li>
    <p>Of the remaining discoverable keys:</p>

    <ul>
      <li>
        <p>Nearly half (<strong>49%</strong>) have no active binding signature when retrieved from
a keyserver, giving them indeterminate (at the absolute best) identity properties.</p>
      </li>
      <li>
        <p>A sizeable minority (<strong>20%</strong>) are using weak RSA keys with less than a decade
before NIST considers them insecure.</p>
      </li>
      <li>
        <p>A smaller but still appreciable minority (<strong>5%</strong>) are using DSA-1024 keys,
which have been considered insecure for well over a decade.</p>
      </li>
    </ul>
  </li>
</ul>

<p>By all rights, these numbers represent the <strong>best possible case</strong> for PGP signatures on
PyPI. Expanding the audit to 2015 or even earlier would likely reveal far worse practices.</p>

<p>In one sense, none of this is a problem: the breadth and depth of issues here
suggest that <strong>nobody (thankfully!) is actually relying on these signatures</strong>,
and the continued presence of new signatures on PyPI is primarily a vestige of
forgotten automation and outdated tutorials.</p>

<p>On the other hand, these results present a strong case against attempting
to “rehabilitate” PGP signatures for PyPI, or any other packaging ecosystem:
all evidence points to end users (i.e., signers) being unable<sup id="fnref:fault" role="doc-noteref"><a href="#fn:fault" class="footnote" rel="footnote">19</a></sup> to distinguish
between the “good” and “bad” parts of PGP, much less <em>use</em> them at all (e.g. keyservers).</p>

<p>So, for final conclusions:</p>

<ul>
  <li>Given how broken the PGP signatures and keys present on PyPI are, it’s <em>unlikely</em> that anybody
is currently doing wide-scale verification against them.</li>
  <li>If anybody is (and I’d be interested to hear if you are!), then it’s almost certainly
<em>inadvisable</em>: “verifying” these signatures is, on average, likely to provide a
<em>false degree</em> of confidence in their value.</li>
</ul>

<p>As with previous posts, I’ve tried to make my steps and data reproducible, and have
checked them all into <a href="https://github.com/woodruffw/pypi-pgp-statistics">this repo</a>. I welcome any discoveries of mistakes I’ve made, as
well as any attempts to improve the overall detail or fidelity of the results!</p>

<hr />

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:domain" role="doc-endnote">
      <p>In a domain-specific sense: nobody should have to be an expert in compilers to enable basic security mitigations, and nobody should have to be an expert in cryptographic protocol design to generate a good signature. <a href="#fnref:domain" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:time" role="doc-endnote">
      <p>It’s hard to tell exactly how long, but it’s potentially as old as PyPI itself: 23 year old <a href="https://mail.python.org/archives/list/distutils-sig@python.org/message/7EWDCZDW5QPPHJQFQJ7L2LXLCQQWEY6O/">design threads</a> mention PGP as an early consideration. <a href="#fnref:time" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:arbitrary" role="doc-endnote">
      <p>It’s exactly three years before the day I began this post. <a href="#fnref:arbitrary" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:version" role="doc-endnote">
      <p>I was too lazy to debug this, but it was probably because I was assuming that all distribution URLs were wheel-like, when many were source distributions. <strong>Update</strong>: <a href="https://github.com/ewdurbin">Ee</a> has informed me that this was probably because of a lack of normalization: conveyor doesn’t normalize package or version names on either end. <a href="#fnref:version" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:mistake" role="doc-endnote">
      <p>As the snippet suggests, this was probably a mistake: PGPy is <a href="https://github.com/SecurityInnovation/PGPy/issues"><em>very</em> lightly maintained</a> and appears to win the jackpot in terms of simultaneously being incompatible with old PGP signatures <em>and</em> lagging behind the rest of the PGP ecosystem. <a href="#fnref:mistake" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:keyid" role="doc-endnote">
      <p>As in, the 32-bit/8-hex-digit key IDs that everyone is used to. You know, the ones that are <a href="https://lwn.net/Articles/689792/">trivially collidable</a> and have been for years. <a href="#fnref:keyid" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:unique" role="doc-endnote">
      <p>PGP has both keys and “subkeys,” and the relationships between them are pointlessly malleable. Given that, the number is really 1067 unique <em>key IDs</em>; it’s impossible to say how many unique <em>containing certificates</em> or <em>representations</em> of each key have been made over the years. <a href="#fnref:unique" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:easy" role="doc-endnote">
      <p>I’m also giving the PGP ecosystem a break here, by acting as if a key’s presence on a keyserver somehow makes it trustworthy. This isn’t true: you still need to have a reason to trust the key, which schemes like the <a href="https://en.wikipedia.org/wiki/Web_of_trust">web of trust and strong set</a> were meant (and <a href="https://inversegravity.net/2019/web-of-trust-dead/">failed</a>) to provide. <a href="#fnref:easy" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:pgpkeydump" role="doc-endnote">
      <p>Things were originally not simple: I started out by writing a full PGP certificate and key linter. <a href="#fnref:pgpkeydump" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:binding" role="doc-endnote">
      <p>A PGP certificate that doesn’t contain a binding signature is effectively not a certificate, since it contains no positive evidence that someone actually possesses the private half of the key. <a href="#fnref:binding" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:certs" role="doc-endnote">
      <p>Really PGP “certificates” or “sequences of packets resembling PGP certificates,” but nobody uses these terms consistently in the PGP ecosystem. <a href="#fnref:certs" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:missing" role="doc-endnote">
      <p>The eagle-eyed might notice that the total key count here is off by one: 758 instead of 759. That’s because there’s one key ID, <code class="language-plaintext highlighter-rouge">CD6F6C3E0A50F73B</code>, that doesn’t even match the key <a href="https://keys.openpgp.org/search?q=CD6F6C3E0A50F73B">returned by the keyserver</a>! I have no clue how this happened, and I can’t be bothered to figure out why. <a href="#fnref:missing" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:effective" role="doc-endnote">
      <p>“Effective” means the signing key, which can either be the primary key or a subkey. I audited both (when different), under the operating theory that it’s bad to have a strong subkey bound to a weak primary key (cf. a strong TLS certificate issued by a weak CA). <a href="#fnref:effective" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:rfc4880" role="doc-endnote">
      <p>Meaning <a href="https://datatracker.ietf.org/doc/html/rfc4880">RFC 4880</a> compliant, not the miscellaneous other optional RFCs that various implementations may or may not choose to support. <a href="#fnref:rfc4880" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:rsa-size" role="doc-endnote">
      <p>In terms of cryptographic safety margins, not representation size. Representation wise, both RSA-3072 and RSA-4096 are ridiculously large and unwieldy compared to EC keys with similar or stronger margins. <a href="#fnref:rsa-size" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:discouraged" role="doc-endnote">
      <p>Which itself is discouraged: NIST’s own recommendation is to prefer a <em>minimum</em> of 128 bits of security, which would correspond (roughly) to RSA-3072. <a href="#fnref:discouraged" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:limited" role="doc-endnote">
      <p>And, if your use of PGP involves an incompatible subset, you might as well just do things right and drop PGP entirely. <a href="#fnref:limited" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:checking" role="doc-endnote">
      <p>And I didn’t bother checking. <a href="#fnref:checking" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:fault" role="doc-endnote">
      <p>Which, again, is <strong>not their fault</strong>: the system itself bears complete responsibility. <a href="#fnref:fault" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>William Woodruff</name></author><category term="programming" /><category term="devblog" /><category term="cryptography" /><category term="rant" /><category term="python" /><summary type="html"><![CDATA[TL;DR: A large number of PGP signatures on PyPI can’t be correlated to any well-known PGP key and, of the signatures that can be correlated, many are generated from weak keys or malformed certificates. The results suggest widespread misuse of GPG and other PGP implementations by Python packagers, with said misuse being encouraged by the PGP ecosystem’s poor defaults, opaque and user-hostile interfaces, and outright dangerous recommendations.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.yossarian.net/assets/what-me-worry.jpeg" /><media:content medium="image" url="https://blog.yossarian.net/assets/what-me-worry.jpeg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>