Programming, philosophy, pedaling.

Weird architectures weren't supported to begin with

Feb 28, 2021     Tags: oss, programming, rant    

This post is at least a year old.


This post contains my own opinions, not the opinions of my employer or any open source groups I belong or contribute to.

It’s also been rewritten 2½ times, and (I think) reads confusingly in places. But I promised myself that I’d get it out of the door instead of continuing to sit on it, so here we go.

There’s been a decent amount of debate in the open source community about architecture support recently, originating primarily from pyca/cryptography’s decision to use Rust for some ASN.1 parsing routines[1].

To summarize the situation: building the latest pyca/cryptography release from scratch now requires a Rust toolchain. The only current[2] Rust toolchain is built on LLVM, which supports a (relatively) limited set of architectures. Rust further whittles this set down into support tiers: some targets get official builds but no automated testing (tier 2), and others get neither (tier 3).
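To make the tier idea concrete, here is a minimal sketch of a tier policy as a lookup table. It is a toy model of the description above, not Rust's actual tooling: the `Tier` enum, the `EXAMPLE_TIERS` table, and the two `example-*` target names are assumptions made up for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Support tiers, loosely modeled on the summary above."""
    TIER_1 = 1  # official builds and automated testing
    TIER_2 = 2  # official builds, but no automated testing
    TIER_3 = 3  # neither official builds nor automated testing


# Hypothetical assignments for illustration only. The two "example-*" names
# are placeholders; consult the toolchain's own platform support list for
# real, current tier assignments.
EXAMPLE_TIERS = {
    "x86_64-unknown-linux-gnu": Tier.TIER_1,
    "example-tier2-target": Tier.TIER_2,
    "example-tier3-target": Tier.TIER_3,
}


def guarantees(target: str) -> dict:
    """Return what a user may expect for `target` under this toy policy."""
    tier = EXAMPLE_TIERS.get(target)
    if tier is None:
        # Unlisted targets get no promises at all.
        return {"tier": None, "official_builds": False, "automated_tests": False}
    return {
        "tier": int(tier),
        "official_builds": tier <= Tier.TIER_2,
        "automated_tests": tier == Tier.TIER_1,
    }
```

The point of writing it down this way is that an unlisted target gets an explicit "no promises" answer rather than an implicit "it probably works".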

By contrast, upstream[3] GCC supports a somewhat larger set of architectures. But C[4], cancer that it is, finds its way onto every architecture with or without GCC’s (or LLVM’s) help, and thereby bootstraps everything else.

Program packagers and distributors (frequently separate from project maintainers themselves) are very used to C’s universal presence. They’re so used to it that they’ve built generic mechanisms for putting entire distributions onto new architectures with only a single assumption: the presence of a serviceable C compiler.

This is the heart of the conflict: Rust (like many other modern, safe languages) uses LLVM for its relative simplicity[5], but LLVM does not support either native or cross-compilation to many less popular (read: niche) architectures. Package managers are increasingly finding that one of their oldest assumptions can be easily violated, and they’re not happy about that.

But here’s the problem: it’s a bad assumption. The fact that it’s the default represents an unmitigated security, reliability, and reproducibility disaster.

A little thought problem

Imagine, for a moment, that you’re a maintainer of a popular project.

Everything has gone right for you: you have happy users, an active development base, and maybe even corporate sponsors. You’ve also got a CI/CD pipeline that produces canonical releases of your project on tested architectures; you treat any issues with uses of those releases as a bug in the project itself, since you’ve taken responsibility for packaging it.

Because your project is popular, others also distribute it: Linux distributions, third-party package managers, and corporations seeking to deploy their own controlled builds. These others have slightly different needs and setups and, to varying degrees, will:

You don’t know about any of the above until the bug reports start rolling in: users will report bugs that have already been fixed, bugs that you explicitly document as caused by unsupported configurations, bugs that don’t make any sense whatsoever.

You struggle to debug your users’ reports, since you don’t have access to the niche hardware, environments, or corporate systems that they’re running on. You slowly burn out under an unending deluge of already fixed bugs that never seem to make it to your users. Your user base is unhappy, and you start to wonder why you’re putting all this effort into project maintenance in the first place. Open source was supposed to be fun!

What’s the point of this spiel? It’s precisely what happened to pyca/cryptography: nobody asked them whether it was a good idea to try to run their code on HPPA, much less System/390[6]; some packagers just went ahead and did it, and are frustrated that it no longer works. People just assumed that it would, because there is still a norm that everything flows from C, and that any host with a halfway-functional C compiler should have the entire open source ecosystem at its disposal.

Reflections on trusting random platforms[7]

Security-sensitive software[8][9], particularly software written in unsafe languages, is never secure in its own right.

The security of a program is a function of its own design and testing, as well as the design, testing, and basic correctness of its underlying platform: everything from the userspace, to the kernel, to the compilers themselves. The latter is an unsolved problem in the very best of cases: bugs are regularly found in even the most mature compilers (Clang, GCC) and their most mature backends (x86, ARM). Tiny changes to or differences in build systems can have profound effects at the binary level, like accidentally removing security mitigations. Seemingly innocuous patches can make otherwise safe code exploitable in the context of other vulnerabilities.
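As a rough sketch of how a dropped mitigation could be caught, the snippet below compares a build's compiler flags against a set of hardening flags that upstream expects. The flags themselves are real GCC/Clang options, but the choice of this particular set, the `EXPECTED_HARDENING` name, and the `missing_mitigations` helper are all assumptions for illustration.

```python
import shlex

# Hardening flags a hypothetical upstream expects in every build. These are
# real GCC/Clang options, but this specific set is an assumption.
EXPECTED_HARDENING = (
    "-fstack-protector-strong",
    "-D_FORTIFY_SOURCE=2",
    "-fPIE",
)


def missing_mitigations(cflags: str, expected=EXPECTED_HARDENING) -> list:
    """Return the expected hardening flags that do not appear in `cflags`.

    This only checks for presence. It does not notice a later flag that
    switches a mitigation back off (for example -fno-stack-protector).
    """
    tokens = shlex.split(cflags)
    return [flag for flag in expected if flag not in tokens]
```

For instance, `missing_mitigations("-O2 -fPIE")` reports the stack protector and `_FORTIFY_SOURCE` flags as missing. A real check would probably inspect the resulting binary rather than trust the flags, since the flags only describe intent.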

The problem gets worse as we move towards niche architectures and targets that are used primarily by small hobbyist communities. Consider m68k (one of the other architectures affected by pyca/cryptography’s move to Rust): even GCC was considering removing support due to lack of maintenance, until hobbyists stepped in. That isn’t to say that any particular niche target is full of bugs[10]; only that bugs are more likely on niche targets in general. Nobody is regularly testing the mountain of userspace code that implicitly forms an operating contract with arbitrary programs on these platforms.

Project maintainers don’t want to chase down compiler bugs on ISAs or systems that they never intended to support in the first place, and aren’t receiving any active support feedback about. They especially don’t want to have vulnerabilities associated with their projects because of buggy toolchains or tooling inertia when working on security improvements.

Some more finger-pointing

As someone who likes C: this is all C’s fault. Really.

Beyond language-level unsafety (plenty of people have covered that already), C is organizationally unsafe:

By contemporary programming language standards, these are conspicuous gaps in functionality: we’ve long since learned to bake testing, building, distribution, and sound abstract machine semantics into the standard tooling for languages (and language design itself). But their absence is doubly pernicious: they ensure that C remains a perpetually unsafe development ecosystem, and an appealing target when bootstrapping a new platform.

The life of a package maintainer is hard

The project maintainer isn’t the only person hurting in the status quo.

Everything stated above also leads to a bum job for the lowly package maintainer[11]. They’re (probably) also an unpaid open source hobbyist, and they’re operating with constraints that the upstream isn’t likely to immediately understand:

They also have to deal with users who are unsympathetic to those reports, and who:

All of this leads to package maintainer burnout[12], and an (increasingly) adversarial relationship between projects and their downstream distributors. Neither of those bodes well for projects, the health of critical packaging ecosystems, or (most importantly of all) the users themselves.

A path forwards?

I am just barely conceited enough to think that my potential solutions are worth broadcasting to the world. Here they are.

Build system and distribution transparency

Build systems are a mess; I’ve talked about their complexity in a professional setting.

A long term solution to the problem of support for platforms not originally considered by project authors is going to be two-pronged:

Support tiers

Rust certainly isn’t the first ecosystem to provide different support tiers, but they do a great job:

Give up on weird ISAs and platforms

I put this one last because it’s flippant, but it’s maybe the most important one: outside of hobbyists playing with weird architectures for fun (and accepting the overwhelming likelihood that most projects won’t immediately work for them), open source groups should not be unconditionally supporting the ecosystem for a large corporation’s hardware and/or platforms.

Companies should be paying for this directly: if pyca/cryptography actually broke on HPPA or IA-64, then HP or Intel or whoever should be forking over money to get it fixed or using their own horde of engineers to fix it themselves. No free work for platforms that only corporations are using[14]. No, this doesn’t violate the open-source ethos[15]; nothing about OSS says that you have to bend over backwards to support a corporate platform that you didn’t care about in the first place.

  1. For the unfamiliar: ASN.1 is a big, messy IDL and serialization format that has historically been a major source of easily exploitable bugs in cryptographic software. Cryptographic protocols regularly parse untrusted ASN.1; rewriting any amount of ASN.1 handling in a safe language (such as Rust) confers significant security benefits. 

  2. There’s a work-in-progress GCC frontend for Rust, but it can’t compile meaningful programs yet (as of writing). There’s also cranelift, which may break Rust’s dependency on LLVM in the future, but doesn’t support nearly as many targets yet. 

  3. There are too many vendor- and platform-specific versions of GCC in the wild to count. 

  4. Mentally substitute “C” for “C and/or C++” in various parts of this post. 

  5. It cannot be overstated just how important LLVM has been to the last decade or so of language research and development, and just how easy it’s made that work. But that’s a topic for an entirely separate post. 

  6. That’s the original S/390, mind you, not the 64-bit “s390x” (also known as z/Architecture). Think about your own C projects for a minute: are you willing to bet that they perform correctly on a 31-bit architecture that even Linux doesn’t support anymore? 

  7. With apologies to Ken Thompson.

  8. Newsflash: all software is security sensitive. 

  9. I’m also conflating security and reliability here, which is potentially contentious. Maybe a future post. 

  10. Although m68k does seem to have its fair share. In a twist of irony: the GCC maintainers can’t repro some of the reports, since Debian may have patched the compiler! 

  11. Including yours truly. 

  12. Look at the maintainer turnover rate and/or unmaintained package ratio for your package manager of choice. Either stat is probably higher than you’d expect. 

  13. And no, there’s no way to guess at the right amount of context ahead-of-time. A coredump doesn’t always cut it, and it probably wouldn’t be very cool of us to image the whole user’s machine. 

  14. In my extremely humble opinion. 

  15. Not that I care.
