ENOSUCHBLOG

Programming, philosophy, pedaling.


age encryption in Python with pyrage

Jul 25, 2022     Tags: cryptography, python, rust, security    

This post is at least a year old.

This is another library announcement post: I’ve made and released pyrage, a collection of Python bindings for rage, the Rust implementation of age.

The module itself is pure Rust, with the excellent pyO3 providing the Python interface; I’ve also created a PEP 561-compatible type stubs package (pyrage-stubs) that can be used to typecheck uses of pyrage with mypy or another Python typechecker.

Read on for more context, implementation details, and some usage examples!

TL;DR: You can install it via pip and use it like any other Python package:

$ python -m pip install pyrage
$ python
>>> import pyrage

Background

age is a file encryption tool (the age CLI), format, and Go library.

It does one thing (file encryption), and it does it well:

age does not attempt to be a general purpose cryptography toolkit, the way PGP does: it doesn’t do digital signatures, doesn’t attempt to provide a (non-functional) web of trust, and doesn’t provide a smörgåsbord of dangerous and antiquated cryptographic primitives and formats.

Oh, and the reference implementation is written in a modern, memory safe programming language (Go). There’s also an interoperable and mostly feature-compatible[2] implementation (rage) written in Rust, which is more my speed.

Another thing that’s my speed is Python[3]. But there’s no stable age implementation for Python! Someone has been working on an age package, but they’ve marked it as a “work in progress.” So: I figured I’d take an existing implementation and hammer out a Python wrapper for it, with a few goals in mind:

Implementation details

The pyrage Python module (which is written in Rust) is broken up by concerns:

Passphrase-based encryption and decryption

This one is a little special: passphrases don’t exactly fit into the identity/recipient model, so they have their own encryption and decryption APIs.

Fortunately, they’re very simple:

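// Imports this excerpt relies on (age re-exports the secrecy crate).
use std::io::{Read, Write};

use age::{secrecy::Secret, Decryptor, Encryptor};
use pyo3::{exceptions::PyValueError, prelude::*, types::PyBytes};

#[pyfunction]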
fn encrypt<'p>(py: Python<'p>, plaintext: &[u8], passphrase: &str) -> PyResult<&'p PyBytes> {
    let encryptor = Encryptor::with_user_passphrase(Secret::new(passphrase.into()));
    let mut encrypted = vec![];
    let mut writer = encryptor
        .wrap_output(&mut encrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .write_all(plaintext)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .finish()
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    Ok(PyBytes::new(py, &encrypted))
}

#[pyfunction]
fn decrypt<'p>(py: Python<'p>, ciphertext: &[u8], passphrase: &str) -> PyResult<&'p PyBytes> {
    let decryptor =
        match Decryptor::new(ciphertext).map_err(|e| PyValueError::new_err(e.to_string()))? {
            Decryptor::Passphrase(d) => d,
            _ => {
                return Err(PyValueError::new_err(
                    "invalid ciphertext (not passphrase encrypted)",
                ))
            }
        };
    let mut decrypted = vec![];
    let mut reader = decryptor
        .decrypt(&Secret::new(passphrase.into()), None)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    reader
        .read_to_end(&mut decrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    Ok(PyBytes::new(py, &decrypted))
}

These correspond to two Python APIs:

def encrypt(plaintext: bytes, passphrase: str) -> bytes: ...
def decrypt(ciphertext: bytes, passphrase: str) -> bytes: ...

There are only two things that are really worth commenting on here:

Key-based recipients and identities

x25519 and SSH look pretty similar, so I’ll just highlight the former:

use std::str::FromStr;

use age::secrecy::ExposeSecret;
use pyo3::{exceptions::PyValueError, prelude::*, types::PyType};

#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::x25519::Recipient);

#[pymethods]
impl Recipient {
    #[classmethod]
    fn from_str(_cls: &PyType, v: &str) -> PyResult<Self> {
        age::x25519::Recipient::from_str(v)
            .map(Self)
            .map_err(PyValueError::new_err)
    }

    fn __str__(&self) -> String {
        self.0.to_string()
    }
}

#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Identity(pub(crate) age::x25519::Identity);

#[pymethods]
impl Identity {
    #[classmethod]
    fn generate(_cls: &PyType) -> Self {
        Self(age::x25519::Identity::generate())
    }

    #[classmethod]
    fn from_str(_cls: &PyType, v: &str) -> PyResult<Self> {
        let identity =
            age::x25519::Identity::from_str(v).map_err(|e| PyValueError::new_err(e.to_string()))?;

        Ok(Self(identity))
    }

    fn to_public(&self) -> Recipient {
        Recipient(self.0.to_public())
    }

    fn __str__(&self) -> String {
        self.0.to_string().expose_secret().into()
    }
}

pub(crate) fn module(py: Python) -> PyResult<&PyModule> {
    let module = PyModule::new(py, "x25519")?;

    module.add_class::<Recipient>()?;
    module.add_class::<Identity>()?;

    Ok(module)
}

That’s it. There’s an x25519.Identity and an x25519.Recipient; both can be loaded from strings (their serialized representations, per the age spec). Separately, an Identity can be created from scratch (x25519.Identity.generate()) and its corresponding Recipient (the public component) can be retrieved with Identity.to_public().

The only thing that’s even slightly funky here is Identity.__str__, corresponding to str(identity) on the Python side. That’s the only way to turn an x25519.Identity instance into its interior (serialized) private key. Other than that, it’s an opaque handle that the pyrage.decrypt API knows how to use (we’ll see how it achieves polymorphism between different Identity classes in a moment).

Recipient- and identity-based encryption and decryption

Apart from passphrases (which, as I mentioned above, muddy the water between recipients and identities), a key property of the rage implementation of age is that encryption and decryption are generic over recipients and identities, respectively. Beyond that, both encryption and decryption can take multiple recipients/identities at once, corresponding to notions of “encrypt to all of these people” and “try to decrypt with each of these,” respectively.

In other words, the ideal Python APIs for encryption and decryption look like this:

def encrypt(plaintext: bytes, recipients: Sequence[Recipient]) -> bytes: ...
def decrypt(ciphertext: bytes, identities: Sequence[Identity]) -> bytes: ...

If these were really Python APIs, this wouldn’t pose a problem: Recipient and Identity could be base classes, or ABCs, or even protocol types describing the common behavior of {x25519,ssh}.{Recipient,Identity}.

But they aren’t really Python APIs; they’re Rust APIs that are exposed as Python APIs. And Rust has none of these things; it only has traits.

So: we need to convince Rust (via pyO3) that it can convert each member of each sequence (whether recipient or identity) into something that has the appropriate behavior. The types for SSH and x25519 are fundamentally heterogeneous (they’re just newtypes over the corresponding rage types), so those somethings have to be trait objects.

Unsurprisingly, rage itself had the same idea: APIs like Encryptor::with_recipients take a Vec<Box<dyn Recipient>>, meaning anything that implements the Recipient trait, which, in turn, means age::x25519::Recipient and age::ssh::Recipient[6]. The same goes for RecipientsDecryptor, which takes an impl Iterator<Item = &'a dyn Identity> in its decrypt() routine.

But not so fast: pyO3 can’t expose arbitrary Rust types to Python; it needs to wrap them in a controlled manner[7]. As a result, we use the “newtype” idiom:

#[pyclass(module = "pyrage.x25519")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::x25519::Recipient);

and:

#[pyclass(module = "pyrage.ssh")]
#[derive(Clone)]
pub(crate) struct Recipient(pub(crate) age::ssh::Recipient);

…both of which implement age::Recipient in their inner type. So far, so good.

Now we need something like this:

#[pyfunction]
fn encrypt<'p>(
    py: Python<'p>,
    plaintext: &[u8],
    recipients: Vec<Box<dyn Recipient>>,
) -> PyResult<&'p PyBytes> {
    unimplemented!()
}

…which doesn’t work:

error[E0277]: the trait bound `Vec<Box<dyn age::Recipient>>: pyo3::FromPyObject<'_>` is not satisfied
   --> src/lib.rs:123:17
    |
123 |     recipients: Vec<Box<dyn Recipient>>,
    |                 ^^^ the trait `pyo3::FromPyObject<'_>` is not implemented for `Vec<Box<dyn age::Recipient>>`
    |
    = help: the trait `pyo3::FromPyObject<'a>` is implemented for `Vec<T>`
note: required by a bound in `extract_argument`
   --> /home/william/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.16.5/src/impl_/extract_argument.rs:14:8
    |
14  |     T: FromPyObject<'py>,
    |        ^^^^^^^^^^^^^^^^^ required by this bound in `extract_argument`


error: aborting due to previous error; 2 warnings emitted


For more information about this error, try `rustc --explain E0277`.

error: could not compile `pyrage` due to 2 previous errors; 2 warnings emitted

The error here is (thankfully) instructive: we’re passing a Vec<Box<dyn Recipient>> as a parameter, but pyO3 doesn’t know how to marshal that from a Python object. Hence the need for FromPyObject on T: Box<dyn Recipient>[8].

So, intuitively, we do something like this:

impl<'source> FromPyObject<'source> for Box<dyn Recipient> {
    fn extract(ob: &'source PyAny) -> PyResult<Self> {
        if let Ok(recipient) = ob.extract::<x25519::Recipient>() {
            Ok(Box::new(recipient.0) as Box<dyn Recipient>)
        } else if let Ok(recipient) = ob.extract::<ssh::Recipient>() {
            Ok(Box::new(recipient.0) as Box<dyn Recipient>)
        } else {
            Err(PyTypeError::new_err(
                "invalid type (expected a recipient type)",
            ))
        }
    }
}

…which also doesn’t work: both FromPyObject and Recipient are third-party traits, so we’re violating Rust’s trait coherence rules:

error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
  --> src/lib.rs:18:1
   |
18 | impl<'source> FromPyObject<'source> for Box<dyn Recipient> {
   | ^^^^^^^^^^^^^^---------------------^^^^^------------------
   | |             |                         |
   | |             |                         `dyn age::Recipient` is not defined in the current crate
   | |             `std::alloc::Global` is not defined in the current crate
   | impl doesn't use only types from inside the current crate
   |
   = note: define and implement a trait or new type instead


error: aborting due to previous error


For more information about this error, try `rustc --explain E0117`.
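
The same coherence rule can be reproduced without pyO3 or age at all. In the sketch below, the standard library's `fmt::Display` and `Vec<String>` stand in for the third-party trait and the third-party type; `Names` and `render` are hypothetical names made up for illustration, not anything from pyrage:

```rust
use std::fmt;

// Both `fmt::Display` and `Vec<String>` are defined outside this crate, so
// uncommenting this impl is rejected with E0117, just like the error above:
//
// impl fmt::Display for Vec<String> {
//     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
//         write!(f, "{}", self.join(", "))
//     }
// }

// A local newtype is defined in this crate, so the same impl is accepted.
pub struct Names(pub Vec<String>);

impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}

// Hypothetical helper: build the newtype and format it through `Display`.
pub fn render(names: &[&str]) -> String {
    Names(names.iter().map(|n| n.to_string()).collect()).to_string()
}
```

The price of the wrapper is that nothing implemented for the inner type carries over automatically: every trait the wrapper should satisfy has to be implemented (or delegated) by hand.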

Sigh. So, what we really need:

All of that, just to take some functionality that we know we have and expose it in a way that Rust understands is safe!
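
As a dependency-free sketch of that shape (not pyrage's actual code): the `Recipient` trait below is a simplified stand-in for `age::Recipient`, `X25519Recipient` and `SshRecipient` stand in for the two newtype wrappers, and `std::str::FromStr` plays the role of the third-party `FromPyObject`. The prefix checks and the `wrap_for_all` helper are assumptions made purely for illustration:

```rust
use std::str::FromStr;

// Simplified stand-in for the third-party `age::Recipient` trait.
pub trait Recipient {
    fn wrap(&self, file_key: &str) -> String;
}

// Stand-ins for the local newtype wrappers.
pub struct X25519Recipient(pub String);
pub struct SshRecipient(pub String);

impl Recipient for X25519Recipient {
    fn wrap(&self, file_key: &str) -> String {
        format!("x25519({}):{}", self.0, file_key)
    }
}

impl Recipient for SshRecipient {
    fn wrap(&self, file_key: &str) -> String {
        format!("ssh({}):{}", self.0, file_key)
    }
}

// The local trait. Because it is defined in this crate, a trait from another
// crate may be implemented for `Box<dyn PyrageRecipient>`.
pub trait PyrageRecipient: Recipient {
    // Converts the boxed wrapper into the trait object the upstream API wants.
    fn as_recipient(self: Box<Self>) -> Box<dyn Recipient>;
}

impl PyrageRecipient for X25519Recipient {
    fn as_recipient(self: Box<Self>) -> Box<dyn Recipient> {
        self as Box<dyn Recipient>
    }
}

impl PyrageRecipient for SshRecipient {
    fn as_recipient(self: Box<Self>) -> Box<dyn Recipient> {
        self as Box<dyn Recipient>
    }
}

// `FromStr` is not defined in this crate, yet this impl is allowed because
// `dyn PyrageRecipient` is local. It plays the role of `FromPyObject` here.
impl FromStr for Box<dyn PyrageRecipient> {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.starts_with("age1") {
            Ok(Box::new(X25519Recipient(s.to_string())) as Box<dyn PyrageRecipient>)
        } else if s.starts_with("ssh-") {
            Ok(Box::new(SshRecipient(s.to_string())) as Box<dyn PyrageRecipient>)
        } else {
            Err("invalid type (expected a recipient type)".to_string())
        }
    }
}

// Hypothetical driver: parse each input, convert it to the upstream trait
// object, and use it through the upstream interface only.
pub fn wrap_for_all(inputs: &[&str], file_key: &str) -> Result<Vec<String>, String> {
    let mut wrapped = Vec::new();
    for input in inputs {
        let local: Box<dyn PyrageRecipient> = input.parse()?;
        let upstream: Box<dyn Recipient> = local.as_recipient();
        wrapped.push(upstream.wrap(file_key));
    }
    Ok(wrapped)
}
```

The two `impl PyrageRecipient` blocks are identical apart from the type name, which is exactly the kind of repetition a macro can absorb.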

Fortunately, we can abbreviate a good deal of it with macros:

macro_rules! recipient_traits {
    ($($t:ty),+) => {
        $(
            impl Recipient for $t {
                fn wrap_file_key(&self, file_key: &FileKey) -> Result<Vec<Stanza>, EncryptError> {
                    self.0.wrap_file_key(file_key)
                }
            }

            impl PyrageRecipient for $t {
                fn as_recipient(self: Box<Self>) -> Box<dyn Recipient> {
                    self as Box<dyn Recipient>
                }
            }
        )*
    }
}

recipient_traits!(ssh::Recipient, x25519::Recipient);
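
For readers less familiar with `macro_rules!` repetition, here is a minimal standalone sketch of the same technique: one macro invocation stamps out a delegating impl for each listed newtype. The `ByteLen` trait, the `KeyText` and `KeyBytes` types, and the `total_len` helper are all hypothetical and exist only for this example:

```rust
// A small local trait that every wrapper should implement.
pub trait ByteLen {
    fn byte_len(&self) -> usize;
}

// Two unrelated newtypes whose inner types both have a `len()` method.
pub struct KeyText(pub String);
pub struct KeyBytes(pub Vec<u8>);

// `$($t:ty),+` matches one or more comma-separated types; the body inside
// `$( ... )+` is emitted once per matched type, delegating to `self.0`.
macro_rules! byte_len_via_inner {
    ($($t:ty),+) => {
        $(
            impl ByteLen for $t {
                fn byte_len(&self) -> usize {
                    self.0.len()
                }
            }
        )+
    };
}

byte_len_via_inner!(KeyText, KeyBytes);

// Both wrappers can now be used through the same trait object type.
pub fn total_len(items: &[&dyn ByteLen]) -> usize {
    items.iter().map(|item| item.byte_len()).sum()
}
```

The macro only removes the textual duplication; the compiler still sees one ordinary impl per type, so error messages and trait resolution behave as if each impl had been written out by hand.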

…and repeat all of that for Identity, giving us these top-level encrypt and decrypt APIs:

#[pyfunction]
fn encrypt<'p>(
    py: Python<'p>,
    plaintext: &[u8],
    recipients: Vec<Box<dyn PyrageRecipient>>,
) -> PyResult<&'p PyBytes> {
    let recipients = recipients.into_iter().map(|pr| pr.as_recipient()).collect();

    let encryptor = Encryptor::with_recipients(recipients);
    let mut encrypted = vec![];
    let mut writer = encryptor
        .wrap_output(&mut encrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .write_all(plaintext)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    writer
        .finish()
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    Ok(PyBytes::new(py, &encrypted))
}

#[pyfunction]
fn decrypt<'p>(
    py: Python<'p>,
    ciphertext: &[u8],
    identities: Vec<Box<dyn PyrageIdentity>>,
) -> PyResult<&'p PyBytes> {
    let identities = identities.iter().map(|pi| pi.as_ref().as_identity());

    let decryptor =
        match age::Decryptor::new(ciphertext).map_err(|e| PyValueError::new_err(e.to_string()))? {
            age::Decryptor::Recipients(d) => d,
            age::Decryptor::Passphrase(_) => {
                return Err(PyValueError::new_err(
                    "invalid ciphertext (encrypted with passphrase, not identities)",
                ))
            }
        };

    let mut decrypted = vec![];
    let mut reader = decryptor
        .decrypt(identities)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;
    reader
        .read_to_end(&mut decrypted)
        .map_err(|e| PyValueError::new_err(e.to_string()))?;

    Ok(PyBytes::new(py, &decrypted))
}
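
To make the “encrypt to all of these people” and “try to decrypt with each of these” semantics concrete, here is a toy model of the control flow only. It contains no cryptography at all: the `Stanza`, `Recipient`, and `Identity` types below are hypothetical simplifications that match on a numeric ID, and they are not the age or pyrage types of the same names:

```rust
// One copy of the file key, addressed to a single recipient. In this toy
// model the key is stored in the clear and "addressed" by a numeric ID.
pub struct Stanza {
    pub recipient_id: u32,
    pub file_key: [u8; 4],
}

pub struct Recipient {
    pub id: u32,
}

pub struct Identity {
    pub id: u32,
}

impl Recipient {
    // Produce this recipient's copy of the file key.
    pub fn wrap_file_key(&self, file_key: [u8; 4]) -> Stanza {
        Stanza { recipient_id: self.id, file_key }
    }
}

impl Identity {
    // Succeeds only for stanzas addressed to this identity.
    pub fn unwrap_stanza(&self, stanza: &Stanza) -> Option<[u8; 4]> {
        if stanza.recipient_id == self.id {
            Some(stanza.file_key)
        } else {
            None
        }
    }
}

// "Encrypt to all of these people": one stanza per recipient.
pub fn wrap_for_recipients(file_key: [u8; 4], recipients: &[Recipient]) -> Vec<Stanza> {
    recipients.iter().map(|r| r.wrap_file_key(file_key)).collect()
}

// "Try to decrypt with each of these": the first identity that can unwrap
// any stanza wins; if none can, there is no file key to recover.
pub fn recover_file_key(header: &[Stanza], identities: &[Identity]) -> Option<[u8; 4]> {
    identities
        .iter()
        .find_map(|identity| header.iter().find_map(|stanza| identity.unwrap_stanza(stanza)))
}
```

Under this model, passing extra identities that match nothing is harmless, which is why the Python-side `decrypt` can simply accept a list.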

Usage examples

What good would an announcement-style blog post be without some (small) examples of actually using pyrage?

Here’s how two users can create x25519 identities and encrypt to each other:

from pyrage import encrypt, decrypt, x25519

alice = x25519.Identity.generate()
bob = x25519.Identity.generate()

# alice encrypts to bob
bobs_eyes_only = encrypt(
    b"give me a ping, vasily. one ping only.", [bob.to_public()]
)

# bob encrypts to alice
alices_eyes_only = encrypt(
    b"it's a long way to tipperary!", [alice.to_public()]
)

# alice decrypts
decrypt(alices_eyes_only, [alice])

# bob decrypts
decrypt(bobs_eyes_only, [bob])

Here’s how a user can encrypt to multiple recipients, including recipients of different types (x25519, ssh-rsa, and ssh-ed25519):

from pyrage import encrypt, ssh, x25519

# load a recipient from an OpenSSH-style RSA public key
recp1 = ssh.Recipient.from_str("ssh-rsa ...")

# load a recipient from an OpenSSH-style Ed25519 public key
recp2 = ssh.Recipient.from_str("ssh-ed25519 ...")

# load a recipient from an age v1 x25519 public key
recp3 = x25519.Recipient.from_str("age1...")

# encrypt to all three recipients
encrypted = encrypt(
    b"the british have stopped making mistakes.", [recp1, recp2, recp3]
)

Finally, here’s two users doing encryption and decryption with a shared password:

from pyrage import passphrase

# encrypt the cleartext with password "r4m1us"
cleartext = b"engage the silent drive"
encrypted = passphrase.encrypt(cleartext, "r4m1us")
decrypted = passphrase.decrypt(encrypted, "r4m1us")

assert cleartext == decrypted

Wrapup

At the moment, the latest version of pyrage published on PyPI is a release candidate. I plan on doing a full 1.0.0 “stable” release after a few small changes, to wit:

Besides that, the API is stable and the package is ready to use.

Overall, this was a pretty easy and pleasant set of wrappers to write. The only real hiccup was with the Python-side polymorphism, corresponding to the Recipient and Identity traits in rage.

In turn, the only reason that was hard was because of Rust’s third-party trait restrictions, which composed with the lack of newtype trait projection[9] to make conversion into the supertrait require a bunch of macro ugliness. It’s not trivial, but the Rust compiler could improve the experience here in a number of ways: allowing inner trait implementations to “puncture” the newtype via an explicit derive or other syntax, allowing third-party trait on third-party type implementations in a limited set of cases that don’t violate coherency[10], and providing more automatic boilerplate for the “newtrait” pattern[11].


  1. Specifically, it defines two extra recipient types: ssh-rsa for RSA and ssh-ed25519 for Ed25519. 

  2. Including, nicely, the CLI: rage can be used the same as the Go reference implementation’s age CLI. 

  3. My Ruby skills continue to atrophy pitifully. 

  4. For two reasons: it reduces the likelihood of an accidental timing oracle in the Python code, and it makes future updates and maintenance easier. 

  5. SSH recipients and identities can’t be created from scratch, only loaded from existing material (consistent with what rage itself supports). 

  6. Among others, like plugins. But we just aren’t going to support those in pyrage.

  7. In particular: pyO3 needs to be able to apply its #[pyclass] proc macro to the type, which it can only do for first-party types. This in turn excludes types like age::ssh::Recipient, since they’re third party types in the context of the pyrage crate. 

  8. This is a good example of Rust having nice error messages, even when the failure cause is complex: in this case, pyO3 knows how to create a Vec<T> (it’s just a list), but only if every member of that list object is T: FromPyObject.

  9. Also known as “generalized newtype deriving,” presumably in reference to Haskell’s GeneralizedNewtypeDeriving extension.

  10. In particular, I’m pretty sure you could solve this in at least two ways: (1) allow “first-come-first-serve” trait implementations, meaning that the current “top” crate is given priority, or (2) allow for third-party traits on third-party types only when the “top” crate is a “leaf,” i.e. an executable build. The first solution isn’t ideal (it violates the referential transparency of dependencies), but I think the second is okay. 

  11. i.e., trait Foo: ThirdPartyTrait {} with no meaningful body. I just made this name up, there’s probably another phrase for this. 

