Jul 25, 2022 Tags: cryptography, python, rust, security
This is another library announcement post: I’ve made and released pyrage, a collection of Python bindings for rage, the Rust implementation of age.
The module itself is pure Rust, with the excellent pyO3 providing the Python interface; I’ve also created a PEP 561-compatible type stubs package (pyrage-stubs) that can be used to typecheck uses of pyrage with mypy or another Python typechecker.
Read on for more context, implementation details, and some usage examples!
TL;DR: You can install it via pip and use it like any other Python package:
$ python -m pip install pyrage
$ python
>>> import pyrage
age is a file encryption tool (the age CLI), format, and Go library.
It does one thing (file encryption), and it does it well:
Limited interoperability: age has two standard identity types: x25519 for public key encryption (encrypt with the public key, decrypt with the private key), and scrypt for password-based encryption (a single shared password both encrypts and decrypts).
However, the standard also allows implementations to handle custom identity and recipient types. The reference implementation supports SSH recipients[1], meaning that a user can encrypt to another user’s SSH-formatted public key and expect that the recipient is able to decrypt it with their private key. This leads to a particularly nice user experience with GitHub users, whose public SSH keys are available:
# encrypt to each SSH recipient registered to 'woodruffw' on GitHub
$ age -R <(curl https://github.com/woodruffw.keys) \
secret.txt \
> secret.txt.age
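For readers who want to do something similar from Python, here is a minimal sketch of the key-list handling only. It assumes the .keys listing holds one OpenSSH public key per line and that only the ssh-rsa and ssh-ed25519 key types are usable as recipients; the helper name usable_ssh_keys and the sample key strings are made up for illustration (the key bodies are placeholders, not real keys).

```python
# Key types usable as SSH recipients, assuming only ssh-rsa and ssh-ed25519.
SUPPORTED_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ")


def usable_ssh_keys(keys_text: str) -> list[str]:
    """Return the lines of a GitHub-style .keys listing that look like
    supported SSH public keys, skipping blanks and unsupported key types."""
    usable = []
    for line in keys_text.splitlines():
        line = line.strip()
        if line.startswith(SUPPORTED_PREFIXES):
            usable.append(line)
    return usable


# Placeholder listing: the base64 bodies are fake and only illustrate the shape.
sample = """\
ssh-ed25519 AAAAC3FAKEFAKEFAKE
ecdsa-sha2-nistp256 AAAAE2FAKEFAKEFAKE

ssh-rsa AAAAB3FAKEFAKEFAKE
"""

keys = usable_ssh_keys(sample)
```

Each surviving line could then be handed to a loader such as ssh.Recipient.from_str, which appears in the usage examples further down.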
age does not attempt to be a general purpose cryptography toolkit, the way PGP does: it doesn’t do digital signatures, doesn’t attempt to provide a (non-functional) web of trust, and doesn’t provide a smörgåsbord of dangerous and antiquated cryptographic primitives and formats.
Oh, and the reference implementation is written in a modern, memory safe programming language (Go). There’s also an interoperable and mostly feature-compatible[2] implementation (rage) written in Rust, which is more my speed.
Another thing that’s my speed is Python[3]. But there’s no stable age implementation for Python! Someone has been working on an age package, but they’ve marked it as a “work in progress.” So: I figured I’d take an existing implementation and hammer out a Python wrapper for it, with a few goals in mind:
The pyrage Python module (which is written in Rust) is broken up by concerns:
- pyrage.passphrase: Password-based encryption and decryption
- pyrage.x25519: Routines for creating and loading x25519 recipients and identities
- pyrage.ssh: Routines for loading SSH recipients and identities[5]
- pyrage: The top-level encrypt and decrypt routines
pyrage.passphrase is a little special: passphrases don’t exactly fit into the identity/recipient model, so they have their own encryption and decryption APIs.
Fortunately, they’re very simple:
// I/O traits for write_all/read_to_end, plus the age and pyO3 types used below.
use std::io::{Read, Write};

use age::{secrecy::Secret, Decryptor, Encryptor};
use pyo3::{exceptions::PyValueError, prelude::*, types::PyBytes};

#[pyfunction]
fn encrypt<'p>(py: Python<'p>, plaintext: &[u8], passphrase: &str) -> PyResult<&'p PyBytes> {
    let encryptor = Encryptor::with_user_passphrase(Secret::new(passphrase.into()));
    let mut encrypted = vec![];
    let mut writer = encryptor
        .wrap_output(&mut encrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .write_all(plaintext)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .finish()
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    // Copy the ciphertext into a Python-owned bytes object.
    Ok(PyBytes::new(py, &encrypted))
}

#[pyfunction]
fn decrypt<'p>(py: Python<'p>, ciphertext: &[u8], passphrase: &str) -> PyResult<&'p PyBytes> {
    // Reject anything that was not encrypted with a passphrase.
    let decryptor =
        match Decryptor::new(ciphertext).map_err(|e| PyValueError::new_err(e.to_string()))? {
            Decryptor::Passphrase(d) => d,
            _ => {
                return Err(PyValueError::new_err(
                    "invalid ciphertext (not passphrase encrypted)",
                ))
            }
        };
    let mut decrypted = vec![];
    let mut reader = decryptor
        .decrypt(&Secret::new(passphrase.into()), None)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    reader
        .read_to_end(&mut decrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    // Copy the plaintext into a Python-owned bytes object.
    Ok(PyBytes::new(py, &decrypted))
}
These correspond to two Python APIs:
def encrypt(plaintext: bytes, passphrase: str) -> bytes: ...
def decrypt(ciphertext: bytes, passphrase: str) -> bytes: ...
There are only two things that are really worth commenting on here:
- passphrase.decrypt only works on passphrase-encrypted ciphertexts (duh). Passing in a ciphertext that looks encrypted to a non-passphrase recipient will raise a ValueError.
- There’s a PyBytes::new call at the end of each function; this is because Python needs to fully own the backing buffer for the bytes object that we return, so pyO3 ends up making a full copy of either the ciphertext or the plaintext before returning.
  This isn’t ideal from either a performance or a residual copy perspective, but I don’t think there’s a better way to do this at the moment. I’d love to be wrong, though!
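The residual-copy concern can be illustrated without pyO3 at all. The sketch below is only an analogy in pure Python: it treats a bytearray as a stand-in for the Rust-side buffer and shows that building a bytes object from it produces an independent copy, so wiping the original buffer does not touch what was handed back.

```python
# Stand-in for the Rust-side Vec<u8> holding the decrypted plaintext.
rust_side_buffer = bytearray(b"engage the silent drive")

# Analogous to PyBytes::new: the bytes object gets its own copy of the data.
returned_to_python = bytes(rust_side_buffer)

# Zeroing the original buffer (as a careful implementation might) ...
for i in range(len(rust_side_buffer)):
    rust_side_buffer[i] = 0

# ... leaves the copy intact, which is the residual-copy problem in miniature.
still_readable = returned_to_python == b"engage the silent drive"
```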
x25519 and SSH look pretty similar, so I’ll just highlight the former:
use std::str::FromStr;

use age::secrecy::ExposeSecret;
use pyo3::{exceptions::PyValueError, prelude::*, types::PyType};

#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::x25519::Recipient);

#[pymethods]
impl Recipient {
    #[classmethod]
    fn from_str(_cls: &PyType, v: &str) -> PyResult<Self> {
        age::x25519::Recipient::from_str(v)
            .map(Self)
            .map_err(PyValueError::new_err)
    }

    fn __str__(&self) -> String {
        self.0.to_string()
    }
}

#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Identity(pub(crate) age::x25519::Identity);

#[pymethods]
impl Identity {
    #[classmethod]
    fn generate(_cls: &PyType) -> Self {
        Self(age::x25519::Identity::generate())
    }

    #[classmethod]
    fn from_str(_cls: &PyType, v: &str) -> PyResult<Self> {
        let identity =
            age::x25519::Identity::from_str(v).map_err(|e| PyValueError::new_err(e.to_string()))?;
        Ok(Self(identity))
    }

    fn to_public(&self) -> Recipient {
        Recipient(self.0.to_public())
    }

    fn __str__(&self) -> String {
        self.0.to_string().expose_secret().into()
    }
}

pub(crate) fn module(py: Python) -> PyResult<&PyModule> {
    let module = PyModule::new(py, "x25519")?;
    module.add_class::<Recipient>()?;
    module.add_class::<Identity>()?;
    Ok(module)
}
That’s it. There’s an x25519.Identity and an x25519.Recipient; both can be loaded from strings (their serialized representations, per the age spec). Separately, an Identity can be created from scratch (x25519.Identity.generate()) and its corresponding Recipient (the public component) can be retrieved with Identity.to_public().
The only thing that’s even slightly funky here is Identity.__str__, corresponding to str(identity) on the Python side. That’s the only way to turn an x25519.Identity instance into its interior (serialized) private key. Other than that, it’s an opaque handle that the pyrage.decrypt API knows how to use (we’ll see how it achieves polymorphism between different Identity classes in a moment).
Apart from passphrases (which, as I mentioned above, muddy the water between recipients and identities), a key property of the rage implementation of age is that encryption and decryption are generic over recipients and identities, respectively. Beyond that, both encryption and decryption can take multiple recipients/identities at once, corresponding to notions of “encrypt to all of these people” and “try to decrypt with each of these,” respectively.
In other words, the ideal Python APIs for encryption and decryption look like this:
def encrypt(plaintext: bytes, recipients: Sequence[Recipient]) -> bytes: ...
def decrypt(ciphertext: bytes, identities: Sequence[Identity]) -> bytes: ...
If these were really Python APIs, this wouldn’t pose a problem: Recipient and Identity could be base classes, or ABCs, or even protocol types describing the common behavior of {x25519,ssh}.{Recipient,Identity}.
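In pure Python, the protocol-type option might look like the following sketch. Everything here is hypothetical: the wrap_file_key method name is borrowed from the Rust trait that shows up later in this post, the two toy classes are stand-ins rather than pyrage’s real types, and no actual cryptography is performed.

```python
from typing import Protocol, Sequence


class Recipient(Protocol):
    """Structural type: anything with a matching wrap_file_key() qualifies."""

    def wrap_file_key(self, file_key: bytes) -> bytes: ...


class ToyX25519Recipient:
    # Stand-in only; a real recipient would encrypt the file key.
    def wrap_file_key(self, file_key: bytes) -> bytes:
        return b"x25519:" + file_key


class ToySshRecipient:
    # Stand-in only; unrelated to ToyX25519Recipient by inheritance.
    def wrap_file_key(self, file_key: bytes) -> bytes:
        return b"ssh:" + file_key


def wrap_for_all(file_key: bytes, recipients: Sequence[Recipient]) -> list[bytes]:
    """Produce one wrapped copy of the file key per recipient ("encrypt to all")."""
    return [r.wrap_file_key(file_key) for r in recipients]


stanzas = wrap_for_all(b"k", [ToyX25519Recipient(), ToySshRecipient()])
```

Neither toy class inherits from Recipient; a typechecker accepts them purely because their method signatures match.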
But they aren’t really Python APIs; they’re Rust APIs that are exposed as Python APIs. And Rust has none of these things; it only has traits.
So: we need to convince Rust (via pyO3) that it can convert each member of each sequence (whether recipient or identity) into something that has the appropriate behavior. The types for SSH and x25519 are fundamentally heterogeneous (they’re just newtypes over the corresponding rage types), so those somethings have to be trait objects.
Unsurprisingly, rage itself had the same idea: APIs like Encryptor::with_recipients take a Vec<Box<dyn Recipient>>, meaning anything that implements the Recipient trait, which, in turn, means age::x25519::Recipient and age::ssh::Recipient[6]. The same goes for RecipientsDecryptor, which takes an impl Iterator<Item = &'a dyn Identity> in its decrypt() routine.
But not so fast: pyO3 can’t expose arbitrary Rust types to Python; it needs to wrap them in a controlled manner[7]. As a result, we use the “newtype” idiom:
#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::x25519::Recipient);
and:
#[pyclass(module = "pyrage.ssh")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::ssh::Recipient);
…both of which implement age::Recipient in their inner type. So far, so good.
Now we need something like this:
#[pyfunction]
fn encrypt<'p>(
    py: Python<'p>,
    plaintext: &[u8],
    recipients: Vec<Box<dyn Recipient>>,
) -> PyResult<&'p PyBytes> {
    unimplemented!()
}
…which doesn’t work:
error[E0277]: the trait bound `Vec<Box<dyn age::Recipient>>: pyo3::FromPyObject<'_>` is not satisfied
   --> src/lib.rs:123:17
    |
123 |     recipients: Vec<Box<dyn Recipient>>,
    |                 ^^^ the trait `pyo3::FromPyObject<'_>` is not implemented for `Vec<Box<dyn age::Recipient>>`
    |
    = help: the trait `pyo3::FromPyObject<'a>` is implemented for `Vec<T>`
note: required by a bound in `extract_argument`
   --> /home/william/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.16.5/src/impl_/extract_argument.rs:14:8
    |
14  |     T: FromPyObject<'py>,
    |        ^^^^^^^^^^^^^^^^^ required by this bound in `extract_argument`

error: aborting due to previous error; 2 warnings emitted

For more information about this error, try `rustc --explain E0277`.
error: could not compile `pyrage` due to 2 previous errors; 2 warnings emitted
The error here is (thankfully) instructive: we’re passing a Vec<Box<dyn Recipient>> as a parameter, but pyO3 doesn’t know how to marshal that from a Python object. Hence the need for FromPyObject on T: Box<dyn Recipient>[8].
So, intuitively, we do something like this:
impl<'source> FromPyObject<'source> for Box<dyn Recipient> {
    fn extract(ob: &'source PyAny) -> PyResult<Self> {
        if let Ok(recipient) = ob.extract::<x25519::Recipient>() {
            Ok(Box::new(recipient.0) as Box<dyn Recipient>)
        } else if let Ok(recipient) = ob.extract::<ssh::Recipient>() {
            Ok(Box::new(recipient.0) as Box<dyn Recipient>)
        } else {
            Err(PyTypeError::new_err(
                "invalid type (expected a recipient type)",
            ))
        }
    }
}
…which also doesn’t work: both FromPyObject and Recipient are third-party traits, so we’re violating Rust’s trait coherence rules:
error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
  --> src/lib.rs:18:1
   |
18 | impl<'source> FromPyObject<'source> for Box<dyn Recipient> {
   | ^^^^^^^^^^^^^^---------------------^^^^^------------------
   | |             |                         |
   | |             |                         `dyn age::Recipient` is not defined in the current crate
   | |             `std::alloc::Global` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead

error: aborting due to previous error

For more information about this error, try `rustc --explain E0117`.
Sigh. So, what we really need:
- Our (already created) newtypes for x25519::Recipient and ssh::Recipient
- A wrapper trait for age::Recipient that looks like this:
trait PyrageRecipient: Recipient {
    fn as_recipient(self: Box<Self>) -> Box<dyn Recipient>;
}
- impl age::Recipient and impl PyrageRecipient for each of x25519::Recipient and ssh::Recipient, effectively just plumbing the inner trait implementation through the outer newtype:
impl Recipient for x25519::Recipient {
    fn wrap_file_key(&self, file_key: &FileKey) -> Result<Vec<Stanza>, EncryptError> {
        self.0.wrap_file_key(file_key)
    }
}

impl PyrageRecipient for x25519::Recipient {
    fn as_recipient(self: Box<Self>) -> Box<dyn Recipient> {
        self as Box<dyn Recipient>
    }
}
(Note the self: Box<Self>, to assert that we really have a Box<dyn PyrageRecipient> and not some other kind of self, like a &dyn PyrageRecipient.)
- Finally, our impl FromPyObject:
impl<'source> FromPyObject<'source> for Box<dyn PyrageRecipient> {
    fn extract(ob: &'source PyAny) -> PyResult<Self> {
        if let Ok(recipient) = ob.extract::<x25519::Recipient>() {
            Ok(Box::new(recipient) as Box<dyn PyrageRecipient>)
        } else if let Ok(recipient) = ob.extract::<ssh::Recipient>() {
            Ok(Box::new(recipient) as Box<dyn PyrageRecipient>)
        } else {
            Err(PyTypeError::new_err(
                "invalid type (expected a recipient type)",
            ))
        }
    }
}
All of that, just to take some functionality that we know we have and expose it in a way that Rust understands is safe!
Fortunately, we can abbreviate a good deal of it with macros:
macro_rules! recipient_traits {
    ($($t:ty),+) => {
        $(
            impl Recipient for $t {
                fn wrap_file_key(&self, file_key: &FileKey) -> Result<Vec<Stanza>, EncryptError> {
                    self.0.wrap_file_key(file_key)
                }
            }

            impl PyrageRecipient for $t {
                fn as_recipient(self: Box<Self>) -> Box<dyn Recipient> {
                    self as Box<dyn Recipient>
                }
            }
        )*
    }
}

recipient_traits!(ssh::Recipient, x25519::Recipient);
…and repeat all of that for Identity, giving us these top-level encrypt and decrypt APIs:
#[pyfunction]
fn encrypt<'p>(
    py: Python<'p>,
    plaintext: &[u8],
    recipients: Vec<Box<dyn PyrageRecipient>>,
) -> PyResult<&'p PyBytes> {
    let recipients = recipients.into_iter().map(|pr| pr.as_recipient()).collect();
    let encryptor = Encryptor::with_recipients(recipients);
    let mut encrypted = vec![];
    let mut writer = encryptor
        .wrap_output(&mut encrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .write_all(plaintext)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .finish()
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    Ok(PyBytes::new(py, &encrypted))
}

#[pyfunction]
fn decrypt<'p>(
    py: Python<'p>,
    ciphertext: &[u8],
    identities: Vec<Box<dyn PyrageIdentity>>,
) -> PyResult<&'p PyBytes> {
    let identities = identities.iter().map(|pi| pi.as_ref().as_identity());
    let decryptor =
        match age::Decryptor::new(ciphertext).map_err(|e| PyValueError::new_err(e.to_string()))? {
            age::Decryptor::Recipients(d) => d,
            age::Decryptor::Passphrase(_) => {
                return Err(PyValueError::new_err(
                    "invalid ciphertext (encrypted with passphrase, not identities)",
                ))
            }
        };
    let mut decrypted = vec![];
    let mut reader = decryptor
        .decrypt(identities)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    reader
        .read_to_end(&mut decrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    Ok(PyBytes::new(py, &decrypted))
}
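Viewed from the Python side, the FromPyObject implementation above behaves roughly like the following sketch: check each list member against the known concrete types, and raise TypeError for anything else. The class and function names here are stand-ins invented for illustration; the real check happens in Rust.

```python
class X25519Recipient:
    """Stand-in for pyrage.x25519.Recipient."""


class SshRecipient:
    """Stand-in for pyrage.ssh.Recipient."""


KNOWN_RECIPIENT_TYPES = (X25519Recipient, SshRecipient)


def extract_recipients(objs):
    """Accept a list only if every member is one of the known recipient types."""
    extracted = []
    for obj in objs:
        if not isinstance(obj, KNOWN_RECIPIENT_TYPES):
            raise TypeError("invalid type (expected a recipient type)")
        extracted.append(obj)
    return extracted


ok = extract_recipients([X25519Recipient(), SshRecipient()])

# A bare string is not a recipient object, so the whole list is rejected.
try:
    extract_recipients([X25519Recipient(), "age1..."])
    rejected = False
except TypeError:
    rejected = True
```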
What good would an announcement-style blog post be without some (small) examples of actually using pyrage?
Here’s how two users can create x25519 identities and encrypt to each other:
from pyrage import encrypt, decrypt, x25519

alice = x25519.Identity.generate()
bob = x25519.Identity.generate()

# alice encrypts to bob
bobs_eyes_only = encrypt(
    b"give me a ping, vasily. one ping only.", [bob.to_public()]
)

# bob encrypts to alice
alices_eyes_only = encrypt(
    b"it's a long way to tipperary!", [alice.to_public()]
)

# alice decrypts
decrypt(alices_eyes_only, [alice])

# bob decrypts
decrypt(bobs_eyes_only, [bob])
Here’s how a user can encrypt to multiple recipients, including recipients of different types (x25519, ssh-rsa, and ssh-ed25519):
from pyrage import encrypt, ssh, x25519

# load a recipient from an OpenSSH-style RSA public key
recp1 = ssh.Recipient.from_str("ssh-rsa ...")

# load a recipient from an OpenSSH-style Ed25519 public key
recp2 = ssh.Recipient.from_str("ssh-ed25519 ...")

# load a recipient from an age v1 x25519 public key
recp3 = x25519.Recipient.from_str("age1...")

# encrypt to all three recipients
encrypted = encrypt(
    b"the british have stopped making mistakes.", [recp1, recp2, recp3]
)
Finally, here’s two users doing encryption and decryption with a shared password:
from pyrage import passphrase
# encrypt the cleartext with password "r4m1us"
cleartext = b"engage the silent drive"
encrypted = passphrase.encrypt(cleartext, "r4m1us")
decrypted = passphrase.decrypt(encrypted, "r4m1us")
assert cleartext == decrypted
At the moment, the latest version of pyrage published on PyPI is a release candidate. I plan on doing a full 1.0.0 “stable” release after a few small changes, to wit:
- Currently, the APIs raise ValueError when an error occurs. These should really be more specific error types, so that users of pyrage can write more precise exception handling code.
- [...] twine upload dist/* to send it to PyPI). It should all really be done in the CI.
Besides that, the API is stable and the package is ready to use.
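As a rough illustration of the point about error types above, more specific exceptions could subclass ValueError so that existing except ValueError handlers keep working. The class names below are hypothetical and are not pyrage’s actual exception hierarchy; fake_decrypt is a stand-in that only simulates a failure.

```python
class DecryptError(ValueError):
    """Hypothetical: any failure while decrypting."""


class WrongCiphertextKind(DecryptError):
    """Hypothetical: e.g. a passphrase ciphertext handed to identity-based decrypt."""


def fake_decrypt(ciphertext: bytes) -> bytes:
    # Stand-in that always fails, to exercise the handlers below.
    raise WrongCiphertextKind(
        "invalid ciphertext (encrypted with passphrase, not identities)"
    )


def classify(ciphertext: bytes) -> str:
    try:
        fake_decrypt(ciphertext)
    except WrongCiphertextKind:
        return "wrong kind"   # precise handling of one failure mode
    except ValueError:
        return "other error"  # the coarse handling available with plain ValueError
    return "ok"


outcome = classify(b"...")
```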
Overall, this was a pretty easy and pleasant set of wrappers to write. The only real hiccup was with the Python-side polymorphism, corresponding to the Recipient and Identity traits in rage.
In turn, the only reason that was hard was because of Rust’s third-party trait restrictions, which composed with the lack of newtype trait projection[9] to make conversion into the supertrait require a bunch of macro ugliness. It’s not trivial, but the Rust compiler could improve the experience here in a number of ways: allowing inner trait implementations to “puncture” the newtype via an explicit derive or other syntax, allowing third-party trait on third-party type implementations in a limited set of cases that don’t violate coherency[10], and providing more automatic boilerplate for the “newtrait” pattern[11].
[1] Specifically, it defines two extra recipient types: ssh-rsa for RSA and ssh-ed25519 for Ed25519.
[2] Including, nicely, the CLI: rage can be used the same as the Go reference implementation’s age CLI.
[3] My Ruby skills continue to atrophy pitifully.
[4] For two reasons: it reduces the likelihood of an accidental timing oracle in the Python code, and it makes future updates and maintenance easier.
[5] SSH recipients and identities can’t be created from scratch, only loaded from existing material (consistent with what rage itself supports).
[6] Among others, like plugins. But we just aren’t going to support those in pyrage.
[7] In particular: pyO3 needs to be able to apply its #[pyclass] proc macro to the type, which it can only do for first-party types. This in turn excludes types like age::ssh::Recipient, since they’re third party types in the context of the pyrage crate.
[8] This is a good example of Rust having nice error messages, even when the failure cause is complex: in this case, pyO3 knows how to create a Vec<T> (it’s just a list), but only if every member of that list object is T: FromPyObject.
[9] Also known as “generalized newtype deriving,” presumably in reference to Haskell’s GeneralizedNewtypeDeriving extension.
[10] In particular, I’m pretty sure you could solve this in at least two ways: (1) allow “first-come-first-serve” trait implementations, meaning that the current “top” crate is given priority, or (2) allow for third-party traits on third-party types only when the “top” crate is a “leaf,” i.e. an executable build. The first solution isn’t ideal (it violates the referential transparency of dependencies), but I think the second is okay.
[11] i.e., trait Foo: ThirdPartyTrait {} with no meaningful body. I just made this name up; there’s probably another phrase for this.