Aug 2, 2020 Tags: devblog, programming, python, rust
This post is a quick walkthrough of how I wrote a Python library, procmaps, in nothing but Rust. It uses PyO3 for the bindings and maturin to manage the build (as well as produce manylinux1-compatible wheels).
The code is, of course, available on GitHub, and can be installed directly with a modern Python (3.5+) via pip[1] without a local Rust install:
$ pip3 install procmaps
procmaps is an extremely small Python library, backed by a similarly small Rust library[2]. All it does is parse “maps” files, best known for their presence under procfs on Linux[3], into a list of Map objects. Each Map, in turn, contains the basic attributes of the mapped memory region.
By their Python attributes:
import os
import procmaps
# also: from_path, from_str
# N.B.: named map_ instead of map to avoid shadowing the map function
map_ = procmaps.from_pid(os.getpid())[0]
map_.begin_address # the begin address for the mapped region
map_.end_address # the end address for the mapped region
map_.is_readable # is the mapped region readable?
map_.is_writable # is the mapped region writable?
map_.is_executable # is the mapped region executable?
map_.is_shared # is the mapped region shared with other processes?
map_.is_private # is the mapped region private (i.e., copy-on-write)?
map_.offset # the offset into the region's source that the region originates from
map_.device # a tuple of (major, minor) for the device that the region's source is on
map_.inode # the inode of the source for the region
map_.pathname # the "pathname" field for the region, or None if an anonymous map
Critically: apart from the imports and the os.getpid() call, all of the code above calls directly into compiled Rust.
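To make "parsing a maps file" concrete, here is a rough pure-Python sketch of what handling a single line involves. It is not how procmaps or its Rust backend does it; the names MapLine and parse_maps_line are made up for illustration, and the field layout is the one described in the proc(5) manpage.

```python
from typing import NamedTuple, Optional, Tuple


class MapLine(NamedTuple):
    """Hypothetical stand-in for a parsed maps entry (not the procmaps.Map class)."""
    begin_address: int
    end_address: int
    perms: str
    offset: int
    device: Tuple[int, int]
    inode: int
    pathname: Optional[str]


def parse_maps_line(line: str) -> MapLine:
    # Five whitespace-separated fields, then an optional pathname
    # (which may itself contain spaces, hence the maxsplit).
    fields = line.strip().split(None, 5)
    address_range, perms, offset, device, inode = fields[:5]
    # Addresses, offset, and device numbers are hexadecimal; the inode is decimal.
    begin, end = (int(part, 16) for part in address_range.split("-"))
    major, minor = (int(part, 16) for part in device.split(":"))
    pathname = fields[5] if len(fields) > 5 else None
    return MapLine(begin, end, perms, int(offset, 16), (major, minor), int(inode), pathname)


entry = parse_maps_line("00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon")
anonymous = parse_maps_line("35b1a21000-35b1a22000 rw-p 00000000 00:00 0")
```

This is exactly the kind of ad-hoc parser that tends to get rewritten for every new project, which is part of the motivation below.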
The motivations behind procmaps are twofold.
First: I do program analysis and instrumentation research at my day job. Time and time again, I need to obtain information about the memory layout of a program that I’m instrumenting (or would like to instrument). This almost always means opening /proc/<pid>/maps, writing an ad-hoc parser, getting the field(s) I want, and then getting on with my life.
Doing this over and over again has made me realize that it’s an ideal task for a small, self-contained Rust library.
Second: I started learning Rust about a year ago, and have been looking for new challenges in it. Interoperating with another language (especially one with radically different memory semantics, like Python) is an obvious choice.
The procmaps module is a plain old Rust crate. Really.
The only differences are in the Cargo.toml:
[lib]
crate-type = ["cdylib"]

[package.metadata.maturin]
classifier = [
    "Programming Language :: Rust",
    "Operating System :: POSIX :: Linux",
]
(Other settings under package.metadata.maturin are available for e.g. managing Python-side dependencies, but procmaps doesn’t need them. More details are available here.)
In terms of code, the crate is structured like a normal Rust library. PyO3 only requires a few pieces of sugar to promote everything into Python-land:
Python modules are created by decorating a Rust function with #[pymodule]. This function then uses the functions of the PyModule argument that it takes to load the module’s functions and classes.
For example, here is the Python-visible procmaps module in its entirety:
#[pymodule]
fn procmaps(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Map>()?;
    m.add_wrapped(wrap_pyfunction!(from_pid))?;
    m.add_wrapped(wrap_pyfunction!(from_path))?;
    m.add_wrapped(wrap_pyfunction!(from_str))?;

    Ok(())
}
Module-level functions are trivial to create: they’re just normal Rust functions, marked with #[pyfunction]. They’re loaded into modules via add_wrapped + wrap_pyfunction!, as seen above. Alternatively, they can be created within a module definition (i.e., nested within the #[pymodule] function) via the #[pyfn] decorator.
Python-visible functions return a PyResult<T>, where T implements IntoPy<PyObject>. PyO3 helpfully provides an implementation of this trait for many core types; a full table is here. This includes Option<T>, making it painless to turn Rust-level functions that return Options into Python-level functions that can return None.
procmaps doesn’t make use of them, but PyO3 also supports variadic arguments and keyword arguments. Details on those are available here.
Here’s a trivial Python-exposed function that does integer division, returning None if division by zero is requested:
#[pyfunction]
fn idiv(dividend: i64, divisor: i64) -> PyResult<Option<i64>> {
    if divisor == 0 {
        Ok(None)
    } else {
        Ok(Some(dividend / divisor))
    }
}
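One subtlety worth spelling out: Rust's integer division truncates toward zero, while Python's // operator floors, so the Rust-backed idiv is not a drop-in for // on negative operands. The pure-Python stand-in below mimics the Rust behavior as a sketch; it ignores the 64-bit range limit that i64 imposes on the real function.

```python
from typing import Optional


def idiv(dividend: int, divisor: int) -> Optional[int]:
    """Pure-Python stand-in for the Rust idiv above (illustrative only)."""
    if divisor == 0:
        return None
    # Divide the magnitudes, then restore the sign: this truncates toward
    # zero like Rust's `/` on integers, unlike Python's flooring `//`.
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient


truncated = idiv(-7, 2)   # truncates toward zero, as Rust does
floored = -7 // 2         # Python's native floor division
missing = idiv(1, 0)      # maps to Python's None
```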
Classes are loaded into modules via the add_class function, as seen in the module definition. Just like modules, they’re managed almost entirely behind a single decorator, this time on a Rust struct. Here is the entirety of the procmaps.Map class definition:
#[pyclass]
struct Map {
    inner: rsprocmaps::Map,
}
procmaps doesn’t need them, but trivial getters and setters can be added to the members of a class with #[pyo3(get, set)]. For example, the following creates a Point class:
#[pyclass]
struct Point {
    #[pyo3(get, set)]
    x: i64,
    #[pyo3(get, set)]
    y: i64,
}
…for which the following would be possible in Python:
# get_unit_point not shown above
from pointlib import get_unit_point
p = get_unit_point()
print(p.x, p.y)
p.x = 100
p.y = -p.x
print(p.x, p.y)
Using #[pyclass] on Foo auto-implements IntoPy<PyObject> for Foo, making it easy to return your custom classes from any function (as above) or member method (as below).
Just as Python-visible classes are defined via #[pyclass] on Rust structs, Python-visible member methods are declared via the #[pymethods] attribute on Rust impls for those structures.
Member methods return PyResult<T>, just like functions do:
#[pymethods]
impl Point {
    fn invert(&self) -> PyResult<Point> {
        Ok(Point { x: self.y, y: self.x })
    }
}
…allows for the following:
# get_unit_point not shown above
from pointlib import get_unit_point
p = get_unit_point()
p_inv = p.invert()
By default, PyO3 forbids the creation of Rust-defined classes within Python code. To allow their creation, just add a function with the #[new] attribute to the #[pymethods] impl block. This creates a __new__ Python method rather than __init__; PyO3 doesn’t support the latter[5].
For example, here’s a constructor for the contrived Point class above:
#[pymethods]
impl Point {
    #[new]
    fn new(x: i64, y: i64) -> Self {
        Point { x, y }
    }
}
…which allows for:
from pointlib import Point
p = Point(100, 0)
p_inv = p.invert()
# invert() returns a new Point with x and y swapped; p itself is unchanged
assert p_inv.y == 100
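For readers who rarely touch __new__ directly, here is a pure-Python stand-in for the Point class that does all of its construction in __new__ and defines no __init__ at all. It is only meant to illustrate the shape of what #[new] produces; the real class is backed by the Rust struct, and this sketch makes no claim about PyO3's internals.

```python
class Point:
    """Pure-Python stand-in for the Rust-backed Point (illustrative only)."""

    __slots__ = ("x", "y")

    def __new__(cls, x, y):
        # Allocate and fully initialize in one step, so there is never a
        # "created but uninitialized" Point for __init__ to fill in later.
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def invert(self):
        # Returns a new Point with the coordinates swapped.
        return Point(self.y, self.x)


p = Point(100, 0)
p_inv = p.invert()
```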
As mentioned above, (most) Python-visible functions and methods return PyResult<T>. The Err half of PyResult is PyErr, and these values get propagated as Python exceptions. The pyo3::exceptions module contains structures that parallel the standard Python exceptions, each of which provides a py_err(String) function to produce an appropriate PyErr.
Creating a brand new Python-level exception takes a single line with the create_exception! macro. Here’s how procmaps creates a procmaps.ParseError exception that inherits from the standard Python Exception class:
use pyo3::exceptions::Exception;
// N.B.: The first argument is the module name,
// i.e. the function declared with #[pymodule].
create_exception!(procmaps, ParseError, Exception);
Similarly, marshalling Rust Error types into PyErrs is as simple as impl std::convert::From<ErrorType> for PyErr.
Here’s how procmaps turns some of its errors into standard Python IOErrors and others into the custom procmaps.ParseError exception:
// N.B.: The newtype here is only necessary because Error comes from an
// external crate (rsprocmaps).
struct ProcmapsError(Error);

impl std::convert::From<ProcmapsError> for PyErr {
    fn from(err: ProcmapsError) -> PyErr {
        match err.0 {
            Error::Io(e) => IOError::py_err(e.to_string()),
            Error::ParseError(e) => ParseError::py_err(e.to_string()),
            Error::WidthError(e) => ParseError::py_err(e.to_string()),
        }
    }
}
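Seen from the Python side, the mapping above amounts to roughly the following. This is a pure-Python sketch, not code from procmaps: the to_python_exception helper and its string "kind" argument are invented here to stand in for the Rust match, and ParseError is redefined locally rather than imported.

```python
class ParseError(Exception):
    """Local stand-in for the exception that create_exception! defines."""


def to_python_exception(kind: str, message: str) -> Exception:
    # Mirrors the three match arms: I/O failures become IOError, while both
    # parse-related variants collapse into the custom ParseError.
    if kind == "Io":
        return IOError(message)
    if kind in ("ParseError", "WidthError"):
        return ParseError(message)
    raise ValueError("unknown error kind: " + kind)


io_exc = to_python_exception("Io", "No such file or directory")
width_exc = to_python_exception("WidthError", "address too wide")
```

Since IOError is an alias of OSError in Python 3, callers can catch either name for the I/O case.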
With everything above, cargo build just works: it produces a Python-loadable shared object.
Unfortunately, it does it using the cdylib naming convention, meaning that cargo build for procmaps produces libprocmaps.so, rather than one of the naming conventions that Python knows how to look for when searching $PYTHONPATH[6].
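One way to see which filenames the running interpreter will accept for an extension module is to ask importlib for its list of extension suffixes. The candidate_filenames helper below is a made-up name for illustration, and the exact suffixes vary by platform and Python version, so no particular output is shown.

```python
import importlib.machinery


def candidate_filenames(module_name: str) -> list:
    """List the filenames this interpreter would try for a compiled module."""
    # EXTENSION_SUFFIXES holds the suffixes recognized for extension modules
    # on this platform, in the order the import system tries them.
    return [module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


for filename in candidate_filenames("procmaps"):
    print(filename)
```

Note that none of the candidates carry the lib prefix that cargo build gives a cdylib, which is why the raw build output can't be imported as-is.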
This is where maturin comes in: once installed, a single maturin build in the crate root puts an appropriately named pip-compatible wheel in target/wheels.
It gets even better: maturin develop will install the compiled module directly into the current virtual environment, making local development as simple as:
$ python3 -m venv env
$ source env/bin/activate
(env) $ pip3 install maturin
(env) $ maturin develop
(env) $ python3
>>> import procmaps
procmaps has a handy Makefile that wraps all of that; running the compiled module locally is a single make develop away.
Distribution is slightly more involved: maturin develop builds wheels that are compatible with the local machine, but further restrictions on symbol versions and linkages are required to ensure that a binary wheel runs on a large variety of Linux versions and distributions[7].
Compliance with these constraints is normally enforced in one of two ways:
Distribution with maturin takes the latter approach: the maturin developers have derived a Rust build container from the PyPA’s standard manylinux container, making fully compatible builds (again, from the crate root) as simple as:
# optional: do `build --release` for release-optimized builds
$ docker run --rm -v $(pwd):/io konstin2/maturin build
This command, like a normal maturin build, drops the compiled wheel(s) into target/wheels. Because it runs inside of the standard manylinux container, it can and does automatically build wheels for a wide variety of Python versions (Python 3.5 through 3.8, as of writing).
From here, distribution to PyPI is as simple as twine upload target/wheels/* or maturin publish. procmaps currently uses the former, as releases are handled via GitHub Actions using the PyPA’s excellent gh-action-pypi-publish action.
Voilà: a Python module, written completely in Rust, that can be installed on the vast majority of Linux distributions with absolutely no dependencies on Rust itself. Even the non-maturin metadata in Cargo.toml is propagated correctly!
I only ran into one small hiccup while working on procmaps: I tried to add a Map.__contains__ method to allow for inclusion checks with the in protocol, e.g.:
fn __contains__(&self, addr: u64) -> PyResult<bool> {
    Ok(addr >= self.inner.address_range.begin && addr < self.inner.address_range.end)
}
…but this didn’t work, for whatever reason, despite working when called manually:
>>> 4194304 in map_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument of type 'Map' is not iterable

>>> map_.__contains__(4194304)
True
There’s probably a reasonable explanation for this in the Python data model that I haven’t figured out.
Edit: a Redditor pointed me to the correct approach. I’ve cut a new release of procmaps that shows the __contains__ protocol in action.
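The same symptom can be reproduced in pure Python, which hints at the data-model rule involved: the in operator looks up __contains__ on the type, not through ordinary attribute access. The sketch below is only an analogy, on the assumption that the Rust-side method was reachable as an attribute without being wired into the type-level protocol; Region and ContainerRegion are hypothetical names.

```python
class Region:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end


region = Region(0x1000, 0x2000)
# Attach __contains__ to the instance only: callable by name, invisible to `in`.
region.__contains__ = lambda addr: region.begin <= addr < region.end

explicit_result = region.__contains__(0x1800)  # ordinary attribute lookup works

try:
    0x1800 in region  # implicit lookup goes through the type and finds nothing
    implicit_works = True
except TypeError:
    implicit_works = False


class ContainerRegion(Region):
    # Defined on the class, so the `in` operator can find it.
    def __contains__(self, addr):
        return self.begin <= addr < self.end


container = ContainerRegion(0x1000, 0x2000)
```

With the method defined on the class, 0x1800 in container behaves as expected, and the end address itself is excluded, matching the half-open check in the Rust snippet.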
By and large, the process of writing a Python module in Rust was extremely pleasant: I didn’t have to write a single line of Python (or even Python-specific configuration) until I wanted to add unit tests. Both PyO3 and maturin are incredibly polished, and the PyPA’s efforts to provide manylinux build environments made compatible builds a breeze.
[1] …on x86_64 only, for the time being. There’s nothing fundamentally blocking other architectures; it’s just a matter of hooking them up via a CI other than GitHub Actions.
[2] My original goal with the Rust library was to teach myself Pest on a simple format. It turns out that there is already a high quality equivalent package available on Crates.
[3] Linux didn’t originate procfs but, as far as I can tell, no other Unices provide /proc/<pid>/maps. FreeBSD appears to provide a /proc/<pid>/map file of similar purpose.
[4] Except in the “pathname” field; see the proc(5) manpage for details.
[5] Presumably because Rust has no concept of a “created but uninitialized” object; the two are always conjoined.
[6] Documentation for these is a little scarce, but strace -m procmaps indicates that the acceptable formats are procmaps.cpython-XX-target-triple.so, procmaps.so, and procmapsmodule.so.
[7] These are known as the “manylinux” constraints, and are documented in PEPs 513 (“manylinux1”), 571 (“manylinux2010”), 599 (“manylinux2014”), 600, and possibly others.