Mar 10, 2022 Tags: programming, rant, rust
Two years ago, I wrote a post with a handful of grievances about Rust, a language that I then (and still) consider my favorite compiled language.
In the two years since, I’ve gone from considering myself familiar with Rust, to comfortable in it, to thinking in Rust even when writing in other languages (sometimes to my detriment). So, like two years ago, this post should be read as coming from a place of love for Rust, not as a cheap attempt to knock it.
IntoIterator is too overloaded

Here is how the IntoIterator docs explain the trait:
Conversion into an Iterator.
By implementing IntoIterator for a type, you define how it will be converted to an iterator. This is common for types which describe a collection of some kind.
If that sounds extremely generic to you, it’s because it is! Here are just a few of the ways IntoIterator is used in the wild, using a generic Container<T> for motivation:
For producing “normal” borrowing iterators: &T for T in Container<T>
For producing iterators over mutable references: &mut T for T in Container<T>
For producing “consuming” (i.e., by-value) iterators: T for T in Container<T>
For producing “owned” (i.e., copying or cloning) iterators: T for T in Container<T: Clone>1
Each of these can be a useful iterator to have, which is why container types frequently have multiple Item-variant IntoIterator implementations. Those implementations are, in turn, occasionally (optionally!) disambiguated with aliases: iter_mut(), drain()2, &c.
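To make the overload concrete, here is a minimal sketch of what those implementations tend to look like, assuming a hypothetical Container<T> that simply wraps a Vec<T> (the same pattern that std’s own Vec uses):

```rust
// A hypothetical container type with three IntoIterator implementations.
struct Container<T> {
    items: Vec<T>,
}

// By-value ("consuming") iteration: Item = T.
impl<T> IntoIterator for Container<T> {
    type Item = T;
    type IntoIter = std::vec::IntoIter<T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}

// Borrowing iteration: Item = &T.
impl<'a, T> IntoIterator for &'a Container<T> {
    type Item = &'a T;
    type IntoIter = std::slice::Iter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.iter()
    }
}

// Mutably borrowing iteration: Item = &mut T.
impl<'a, T> IntoIterator for &'a mut Container<T> {
    type Item = &'a mut T;
    type IntoIter = std::slice::IterMut<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.iter_mut()
    }
}
```

At the call site, all three resolve through the same spelling: for x in container consumes, for x in &container borrows, and for x in &mut container mutably borrows, all via a single into_iter() call.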
The downside is comprehension: absent any context, an into_iter() could be doing any of the above3, leaving it to me (or any other poor soul) to read further into the iterator’s consumer to determine what’s actually going on. It’s never ambiguous (only one selection is possible at compile time!), but it can be difficult to rapidly comprehend in the manner that Rust otherwise facilitates.
IntoIterator is already firmly baked into Rust’s core, so it’s probably too late to devolve it into the half dozen traits that it conceptually covers. But if I could turn back time:

IntoIterator itself could be spelled AsIterator or ToIterator instead, to prevent the misleading ownership connotation of Into.
OwningIterator and BorrowingIterator would solve the ownership overlap, providing iter_owned() and iter() respectively. I’m not sure how nicely this would play with the overall soundness of Rust’s traits and types, but I can dream.
Rust’s safety is a sort of inverted Faustian bargain: in exchange for a small amount of control over memory layout, we get complete spatial and temporal memory safety, automatic memory management without a garbage collector, and zero-cost abstractions that let us take full advantage of our optimizing compilers.
As such, when I say that “high-assurance” Rust is difficult, I don’t mean Safe Rust. What I mean is that we’ve made a trade: in exchange for all of this safety, we’ve accepted a certain amount of mandatory invariant enforcement — the Rust standard library will panic when an invariant would produce unsafety, and community-maintained libraries will use panic!, assert!, and the like to trade the occasional uncontrolled program termination for slightly better programming ergonomics (fewer Options and Results).
Invariant enforcement is a good thing and, by and large, both Rust’s internal and community uses of panics are judicious: by convention, panicking functions tend to have either (1) a non-panicking Result or Option alternative, or (2) failure conditions that are environmental in a way that mandates program termination anyway (e.g., stack exhaustion).
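For example, std typically pairs a panicking operation with a non-panicking sibling; a quick sketch of the convention:

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Panicking form: slice indexing panics on an out-of-bounds index...
    // let oops = v[10];

    // ...while the non-panicking alternative returns an Option instead.
    assert!(v.get(10).is_none());

    // The same convention shows up for arithmetic: `x / 0` panics,
    // while `checked_div` reports the failure as a None.
    assert_eq!(10i32.checked_div(0), None);
}
```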
The end result: the Rust standard library and ecosystem are full of panics that almost never occur, panics that are only specified informally (i.e., in human-readable documentation). But “almost never” isn’t always good enough: it’s sometimes nice to have the assurance that no code being executed can possibly panic.
To the best of my knowledge, there are only imperfect solutions to this:
You can use clippy to ban source-level panics in your own code, primarily by using the expect_used, unwrap_used, and panic lints. Each of these is disabled by default, so users need to explicitly opt into them.
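A minimal sketch of that opt-in, as crate-level attributes (the lints can also be enabled on the clippy command line):

```rust
// Crate root (e.g. main.rs): promote the allow-by-default restriction lints
// to hard errors under `cargo clippy`.
#![deny(clippy::unwrap_used)]
#![deny(clippy::expect_used)]
#![deny(clippy::panic)]

fn main() {
    // Flagged by `clippy::unwrap_used` when running `cargo clippy`.
    let port: u16 = "8080".parse().unwrap();
    println!("{port}");
}
```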
These lints work excellently for first-party code! But they can’t prevent panics in third-party code4, because clippy only analyzes the source of the active crate. In other words, clippy won’t catch the following under any circumstances:
```rust
use thirdparty;

fn foo() {
    thirdparty::calls_unwrap_internally();
}
```
Rust prides itself on its rich package ecosystem, which means that just about any third-party dependency can introduce implicit panics. Not so good.
You can use a crate like no_panic5 to catch panics by promoting them into compiler (really linker) errors. This is incredibly clever, but with a variety of downsides:
It fundamentally relies on the compiler to optimize away unreachable panics, making it unreliable at lower optimization levels (particularly, the default debug build level).
Similarly, any tweaking of Rust’s panicking behavior (e.g., a different panicking strategy like panic = "abort") can break the linker trick being used here.
It doesn’t work directly on library crates, since library crates don’t directly invoke the linker. In order to be effective, the “leaf” build needs to be something that requires the linker, like an executable or shared object.
Because the errors happen at link time instead of compile time, they’re largely stripped of their source context. no_panic is clever and uses a procedural macro to parse the function signature and present it as part of the linker error6, but that’s just about the limit of the context it can provide.
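For context, the function being checked in that README example looks roughly like this; the slice expression can panic on a too-short input, which is exactly what the attribute is meant to reject:

```rust
use no_panic::no_panic;

// The attribute asserts, at link time, that this function cannot panic.
#[no_panic]
fn demo(s: &str) -> &str {
    // Slicing can panic (out-of-bounds or non-char-boundary index), so the
    // check fails unless the optimizer can prove the panic unreachable.
    &s[1..]
}

fn main() {
    println!("{}", demo("input string"));
}
```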
The example in the README demonstrates this inscrutability:
```
   Compiling no-panic-demo v0.0.1
error: linking with `cc` failed: exit code: 1
  |
  = note: /no-panic-demo/target/release/deps/no_panic_demo-7170785b672ae322.no_p
anic_demo1-cba7f4b666ccdbcbbf02b7348e5df1b2.rs.rcgu.o: In function `_$LT$no_pani
c_demo..demo..__NoPanic$u20$as$u20$core..ops..drop..Drop$GT$::drop::h72f8f423002
b8d9f':
no_panic_demo1-cba7f4b666ccdbcbbf02b7348e5df1b2.rs:(.text._ZN72_$LT$no
_panic_demo..demo..__NoPanic$u20$as$u20$core..ops..drop..Drop$GT$4drop17h72f8f42
3002b8d9fE+0x2): undefined reference to `
ERROR[no-panic]: detected panic in function `demo`
'
collect2: error: ld returned 1 exit status
```
You can maybe see the inner callsite responsible for the panic, but not easily.
In sum, it’s very difficult to write provably non-panicking code in Rust in 2022. Avoiding explicit panics in first-party code is perfectly possible (and even ergonomic!); it’s the panics embedded in third-party dependencies and runtime code that are nearly impossible to track.
I have some ideas for improving this, ones that are outside the scope of this gripe-fest. Maybe another time.
Integration tests are one of Cargo’s more oblique features: in addition to hosting your tests in-tree (i.e., in a mod tests in each foo.rs file), you can also create a parallel tests/ tree for tests whose scope reaches beyond the unit level.
In other words, if your source tree looks like this:
```
src/
├── kbs2
│   ├── agent.rs
│   ├── backend.rs
│   ├── command.rs
│   ├── config.rs
│   ├── generator.rs
│   ├── input.rs
│   ├── mod.rs
│   ├── record.rs
│   ├── session.rs
│   └── util.rs
└── main.rs
tests/
├── common
│   └── mod.rs
├── test_kbs2_init.rs
└── test_kbs2.rs
```
…then your cargo test output might look something like this:
```
william@janus kbs2 [0:0] integration-tests $ cargo test
   Compiling kbs2 v0.6.0-rc.1 (/home/william/devel/self/kbs2)
    Finished test [unoptimized + debuginfo] target(s) in 4.25s
     Running unittests (target/debug/deps/kbs2-2dd9eb541b527992)

running XX tests
test kbs2::backend::tests::test_ragelib_create_keypair ... ok
test kbs2::config::tests::test_initialize_wrapped ... ok
test kbs2::backend::tests::test_ragelib_create_wrapped_keypair ... ok
test kbs2::backend::tests::test_ragelib_rewrap_keyfile ... ok

test result: ok. XX passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 8.75s

     Running tests/test_kbs2.rs (target/debug/deps/test_kbs2-4f1d8387af33e18c)

running 3 tests
test test_kbs2_version ... ok
test test_kbs2_help ... ok
test test_kbs2_completions ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.03s

     Running tests/test_kbs2_init.rs (target/debug/deps/test_kbs2_init-d890a2d5d4f7537d)

running 1 test
test test_kbs2_init ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
```
This is a fantastic feature: you don’t need to do anything special to do integration testing on a Rust codebase!
Except…
Cargo doesn’t understand how to run integration tests against a binary-only crate: if your src tree has only main.rs and no lib.rs, then you won’t be able to use some::mod from under tests/. This is a known issue, one without a satisfying fix or workaround that doesn’t involve turning your binary’s APIs into a public interface7.
Every file in the tests directory is a separate crate, compiled to its own executable. This is a reasonable decision, with undesirable consequences:

There is no naive way to mark a file under tests/ as not containing integration tests. As the documentation notes, adding tests/common.rs to manage shared helpers will add a common section to your cargo test output. The “official” workaround is to make common.rs into a directory-style module instead (common.rs -> common/mod.rs), which cargo test then apparently ignores for test collection purposes. It’s not the end of the world, but it feels like an incidental hack (it presumably works because cargo test doesn’t recurse through tests/, which doesn’t seem to be explicitly documented anywhere).
More annoyingly: because each file under tests/ is its own binary, Rust’s otherwise excellent dead code detection does not work correctly on integration tests. This issue contains the full detail, but to summarize: if test_foo.rs and test_bar.rs make disjoint use of common/mod.rs, then rustc will see “unused” code in the compilations of both test_foo and test_bar, despite the totality of all integration tests having complete coverage for common/mod.rs.

This is again mentioned only obliquely in the documentation: you have to know that separate compilations mean that Cargo won’t track dead code in your helper modules, even though the “pattern” of submodules under tests/ is one that Cargo otherwise knows about.

The fix? I don’t think it’s a good one, but I ended up putting #![allow(dead_code)] at the top of my common integration test module.
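Concretely, the workaround ends up looking something like this (a hypothetical tests/common/mod.rs, pulled into each test file with mod common;):

```rust
// tests/common/mod.rs (hypothetical): helpers shared by the integration tests.
// Every tests/*.rs file is compiled as its own crate, so any helper a given
// test binary doesn't call is reported as dead code there; this silences that.
#![allow(dead_code)]

use std::path::{Path, PathBuf};

// Used by some test binaries but not others: exactly the case that trips the
// per-binary dead code analysis.
pub fn fixture_path(name: &str) -> PathBuf {
    Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("tests/fixtures")
        .join(name)
}

pub fn dummy_config() -> String {
    String::from("placeholder config")
}
```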
These are trivial quality-of-developer-life things, each of which has a very good reason for not being different8. But they’re still a drag!
cargo install is too eager

cargo install is the main interface for installing user-facing executables from the crates ecosystem. Because it’s built right into the Rust toolchain, lots of projects list cargo install $FOO as a recommended installation technique. So far, so good.

What’s not so good is how cargo install chooses to do builds. Unlike cargo build, cargo install ignores Cargo.lock by default, meaning that a different but “compatible” (per SemVer) version might be selected for the final compiled product.
There are (at least) two problems with this:
It violates some of the (perhaps incorrectly) presumed consistency of telling users to run cargo install to install your program: each user may have a slightly different dependency tree depending on when they ran cargo install. Debugging small compatibility errors then becomes an exercise in frustration, as users and maintainers determine the relevant differences in their dependency trees.

More perniciously: cargo’s interpretation of semantic versioning diverges from the normal interpretation:

cargo install (and other cargo subcommands?) treat 0.X.Y and 0.X.Z as compatible releases, despite the SemVer spec explicitly saying otherwise.
cargo install treats pre-release versions (e.g. 2.0.0-pre.1) as compatible with both their major release (i.e. 2.0.0) and all other pre-releases in the same range (e.g. 2.0.0-pre.2), despite the SemVer spec warning that prereleases must be treated as unstable and non-API-conforming.
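To illustrate the divergence, here is a small sketch using the semver crate (the version-matching library the Cargo ecosystem builds on); this is an illustration of the matching rules, not a claim about cargo install’s exact internals:

```rust
// Requires `semver = "1"` as a dependency to run.
use semver::{Version, VersionReq};

fn main() {
    // A bare "0.4.1" requirement is interpreted as the caret requirement
    // "^0.4.1", which accepts 0.4.2 even though SemVer reserves no
    // compatibility guarantees for 0.x releases.
    let zero_x = VersionReq::parse("^0.4.1").unwrap();
    println!("0.4.2 accepted: {}", zero_x.matches(&Version::parse("0.4.2").unwrap()));
    println!("0.5.0 accepted: {}", zero_x.matches(&Version::parse("0.5.0").unwrap()));

    // A pre-release requirement accepts later pre-releases *and* the final
    // release in the same range.
    let pre = VersionReq::parse("^2.0.0-pre.1").unwrap();
    println!("2.0.0-pre.2 accepted: {}", pre.matches(&Version::parse("2.0.0-pre.2").unwrap()));
    println!("2.0.0 accepted:       {}", pre.matches(&Version::parse("2.0.0").unwrap()));
}
```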
The former behavior can be frustrating, but is ultimately justifiable in an ecosystem that largely respects semantic versioning: it almost always makes sense to install foo 1.2.4 instead of foo 1.2.3. When a package misbehaves (i.e., fails to follow SemVer) or this behavior simply isn’t desired for whatever reason, cargo install --locked provides an escape hatch (albeit not a default one).
The latter behavior is, in my opinion, unjustifiable: it’s inconsistent with the compatibility standards established by SemVer and otherwise respected by Cargo (and the overwhelming majority of crates in the ecosystem), and directly interferes with any attempts to use pre-releases (as well as release candidates, betas, &c.) in a stable manner in programs that ordinary users are expected to install.
The umbrella issue for this has been open since 2019, and is tracked here. Prominent projects that have had cargo install failures due to it include (in no particular order):

bat (SemVer violation)
cargo-expand (SemVer violation)
xsv (Dependencies require a newer compiler9)
sqlx (Incorrect beta/rc upgrade)
cargo-deny (SemVer violation)
cargo-geiger (Dependencies require a newer compiler)
c2rust (Dependencies require a newer compiler)
rage (Incorrect beta/rc upgrade)

At the end of the day, Rust is still my preferred compiled language and development ecosystem. I see the increase in visible problems as a function of my increased familiarity with the language, not as insurmountable flaws — after all, similar problems exist in just about every language (and packaging ecosystem).
I didn’t want to bloat this post with too many grievances, so here’s a smattering of other (more minor?) things that I’ve noticed over the years:
The static analysis story for side effects and accidental data use still isn’t great in Rust — it’s remarkably easy to cause unintentional side effects by forgetting to use closures in long “fluent” method compositions, or to accidentally lose data during I/O by dropping a buffered handle that still has pending content (a short sketch of both footguns follows this list).
Pin<T> and co. aren’t very ergonomic, and self-referential structs are even less ergonomic. I would absolutely love to see a 'self lifetime that doesn’t require a third-party crate like ouroboros.
Procedural macros are hard to write, harder than they should be. Crystal has a wonderful and extremely ergonomic macro system that Rust could learn from, one that doesn’t require ad-hoc reinterpretation of language tokens and that integrates seamlessly with syntax highlighting in editors.
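A small sketch of those first two footguns, assuming a toy expensive_default helper:

```rust
use std::io::{BufWriter, Write};

fn expensive_default() -> i32 {
    println!("side effect: computing default"); // runs even when the Option is Some...
    42
}

fn main() -> std::io::Result<()> {
    // ...because `unwrap_or` evaluates its argument eagerly, while
    // `unwrap_or_else` takes a closure that only runs on the `None` path.
    let eager = Some(1).unwrap_or(expensive_default());
    let lazy = Some(1).unwrap_or_else(|| expensive_default());
    println!("{eager} {lazy}");

    // A BufWriter flushes on drop, but any error during that implicit flush
    // is silently swallowed; flushing explicitly keeps the error visible.
    let file = std::fs::File::create("/tmp/example.txt")?;
    let mut writer = BufWriter::new(file);
    writeln!(writer, "hello")?;
    writer.flush()?;
    Ok(())
}
```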
1. i.e. the same as Iterator::cloned or Iterator::copied, except Container<T>: ?Iterator. ↩
2. Vec::drain() isn’t actually an alias for a consuming IntoIterator, for some reason — they appear to have completely separate types. I’m not sure why that is. ↩
3. Or even something totally different! An IntoIterator could choose to produce Item = String for a Vec<u32>; there’s nothing stopping it. ↩
4. i.e., dependencies and the standard/core runtime. ↩
5. Itself built on dont_panic. ↩
6. This is my favorite part of this pile of hacks: it displays the error message by embedding it as the symbol name for the unresolved function. Evil! ↩
7. This is, admittedly, not a problem in many testing scenarios: if you’re building a binary, then your integration tests should be testing that binary instead of the interfaces beneath it. But there are legitimate scenarios (e.g., comparing a computed result to a constant within the private API). ↩
8. For example, doing dead code detection across all integration test crates would probably be a mess and is certainly an abstraction violation. ↩
9. I didn’t mention this as a separate problem, but it’s just as serious of one: it’s not clear whether SemVer should include MSRV changes, and so plenty of projects have caused transitive cargo install failures by bumping their MSRV in a minor release. ↩