Apr 30, 2022 Tags: llvm, rust Series: llvm-internals
In the interest of keeping my minimum once-a-month blog post streak going, this is just a brief update post on the status of mollusc. I’ll divide it up into two aspects: internal refactoring aimed at making development (especially external contribution!) simpler, and feature changes that bring the project closer to meaningfully parsing and modeling LLVM IR.
The last update was in November, and a good deal of refactoring has happened since then:
The “unrolled” APIs have been simplified and renamed (removing the Unrolled prefix). In particular, APIs like UnrolledBlock::records() (to get a slice of the block’s records) have been replaced with like-named fields (Block::records) that expose a borrowing view with fluent APIs.
For example, the following:
    let _comdats = block
        .records(ModuleCode::Comdat as u64)
        .map(|rec| Comdat::try_map(rec, ctx))
        .collect::<Result<Vec<_>, _>>()
        .map_err(RecordMapError::from)?;
is now:
    ctx.comdats = block
        .records
        .by_code(ModuleCode::Comdat)
        .map(|rec| Comdat::try_from(rec))
        .collect::<Result<Vec<_>, _>>()?;
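To make the "borrowing view with fluent APIs" idea concrete, here is a minimal, self-contained sketch of what such a view could look like. Everything in it is hypothetical: the Record, Records, Block, and DemoCode types, the discriminant values, and the "exactly one match" behavior of one() are assumptions for illustration, not mollusc's actual definitions.

```rust
/// Hypothetical record: a numeric code plus its operand fields.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub code: u64,
    pub fields: Vec<u64>,
}

/// Hypothetical code enum; the discriminants are illustrative only.
#[derive(Clone, Copy, Debug)]
pub enum DemoCode {
    Name = 1,
    Comdat = 2,
}

impl From<DemoCode> for u64 {
    fn from(code: DemoCode) -> u64 {
        code as u64
    }
}

/// Hypothetical view over a block's records.
pub struct Records(Vec<Record>);

impl Records {
    /// Lazily yields every record with the given code, borrowing from the view.
    /// Accepting `impl Into<u64>` lets callers pass a code enum without `as u64`.
    pub fn by_code(&self, code: impl Into<u64>) -> impl Iterator<Item = &Record> {
        let code = code.into();
        self.0.iter().filter(move |rec| rec.code == code)
    }

    /// Returns the matching record only if exactly one exists
    /// (an assumption about what "one" should mean).
    pub fn one(&self, code: impl Into<u64>) -> Option<&Record> {
        let mut matches = self.by_code(code);
        match (matches.next(), matches.next()) {
            (Some(rec), None) => Some(rec),
            _ => None,
        }
    }
}

/// Hypothetical block exposing its records as a public field.
pub struct Block {
    pub records: Records,
}

/// Builds a small block with two `Comdat` records and one `Name` record.
pub fn demo_block() -> Block {
    Block {
        records: Records(vec![
            Record { code: 2, fields: vec![10] },
            Record { code: 1, fields: vec![7] },
            Record { code: 2, fields: vec![20] },
        ]),
    }
}
```

With a view like this, a call such as demo_block().records.by_code(DemoCode::Comdat) chains directly into map and collect, which is the shape of the "after" snippet above.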
The core “mapping” trait (Mappable) was split into two distinct traits (CtxMappable and PartialCtxMappable), and then removed entirely. It turns out that my mental model of “mapping” wasn’t a very good one: it’s really just a TryFrom operation, so reusing that core trait makes for better code.
Removing those traits in turn eliminated the need for an IrBlock: Mappable trait, meaning that code like this:
    impl IrBlock for Strtab {
        type Error = StrtabError;

        const BLOCK_ID: IrBlockId = IrBlockId::Strtab;

        fn try_map_inner(block: &UnrolledBlock, _ctx: &mut PartialMapCtx) -> Result<Self, Self::Error> {
            let strtab = block
                .records()
                .one(StrtabCode::Blob as u64)
                .ok_or(StrtabError::MissingBlob)
                .and_then(|r| r.try_blob(0).map_err(StrtabError::from))?;

            Ok(Self(strtab))
        }
    }
is now this:
    impl TryFrom<&'_ Block> for Strtab {
        type Error = StrtabError;

        fn try_from(block: &'_ Block) -> Result<Self, Self::Error> {
            let strtab = block
                .records
                .one(StrtabCode::Blob as u64)
                .ok_or(StrtabError::MissingBlob)
                .and_then(|r| r.try_blob(0).map_err(StrtabError::from))?;

            Ok(Self(strtab))
        }
    }
This has the additional advantage of erasing unused PartialMapCtx parameters from contexts where they’re not necessary.
(Technically this hasn’t landed yet, but it will with #25.)
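As a rough illustration of why TryFrom fits here, the sketch below shows one possible way to keep context out of conversions that don't need it while still passing it to those that do, by making the input a (record, context) tuple. This is an assumption about how such a design could look, not a description of mollusc's code; Record, Ctx, Version, Named, and DemoError are all hypothetical names.

```rust
use std::convert::TryFrom;

/// Hypothetical error type shared by both conversions.
#[derive(Debug, PartialEq)]
pub enum DemoError {
    MissingField,
    UnknownName(u64),
}

/// Hypothetical record holding raw operand fields.
pub struct Record {
    pub fields: Vec<u64>,
}

/// Hypothetical context: a table mapping indices to names.
pub struct Ctx {
    pub names: Vec<String>,
}

/// A model that needs no context: converts straight from a record.
#[derive(Debug, PartialEq)]
pub struct Version(pub u64);

impl TryFrom<&'_ Record> for Version {
    type Error = DemoError;

    fn try_from(rec: &'_ Record) -> Result<Self, Self::Error> {
        rec.fields
            .first()
            .copied()
            .map(Version)
            .ok_or(DemoError::MissingField)
    }
}

/// A model that does need context: the input is a (record, context) pair.
#[derive(Debug, PartialEq)]
pub struct Named(pub String);

impl TryFrom<(&'_ Record, &'_ Ctx)> for Named {
    type Error = DemoError;

    fn try_from((rec, ctx): (&'_ Record, &'_ Ctx)) -> Result<Self, Self::Error> {
        // The first field is treated as an index into the context's name table.
        let idx = *rec.fields.first().ok_or(DemoError::MissingField)?;
        ctx.names
            .get(idx as usize)
            .cloned()
            .map(Named)
            .ok_or(DemoError::UnknownName(idx))
    }
}
```

In this arrangement the signature of each impl states exactly what the conversion depends on, so there is no unused context parameter to carry around.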
Work on very preliminary support for functions has begun. Reassembling LLVM IR functions from bitcode is complicated, and involves referencing multiple pieces of state: MODULE_CODE_FUNCTION records for function names and signatures, FUNCTION_BLOCK blocks for function bodies themselves (including basic blocks and constituent instructions), as well as various value symbol tables and other pieces of sidecar state. Support for MODULE_CODE_FUNCTION is (mostly) complete as of #14 and #22; support for the bodies themselves is continuing with #25.
#23: Alias records (MODULE_CODE_ALIAS) are now supported. These correspond to LLVM’s notion of aliases, i.e. alternate names for functions, globals, or other nameable values.
More “support” APIs have been filled in: the Align and Type models have been given more functionality that roughly mirrors what’s available in LLVM, such as Type::scalar_type for getting a potentially compound type’s “interior” type.
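For a sense of what a scalar_type helper does, here is a small sketch modeled on the behavior of LLVM's Type::getScalarType(), which returns the element type for a vector and the type itself otherwise. The Type enum below is a deliberately tiny, hypothetical stand-in; mollusc's real Type model has many more variants and may differ in its details.

```rust
/// Hypothetical, heavily reduced type model for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Integer { bit_width: u32 },
    Float,
    FixedVector { num_elements: u64, element: Box<Type> },
}

impl Type {
    /// Returns the element type of a vector, or the type itself when it is
    /// already scalar (the same idea as LLVM's `Type::getScalarType()`).
    pub fn scalar_type(&self) -> &Type {
        match self {
            Type::FixedVector { element, .. } => &**element,
            other => other,
        }
    }
}
```

Returning a borrowed &Type avoids cloning the inner type, at the cost of tying the result's lifetime to the original value.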
Like I said, this is meant to be a very short post. You probably didn’t get a lot of value out of it, and that’s okay! More will be coming soon.
In other news: LLVM merged my changes in D108438, which make it much easier to dump the AST within LLVM’s LLVMBitCodes.h header. That, in turn, will make it possible for me to write some awful magic code generation scripts that should largely automate the process of keeping mollusc’s core enums and constants in sync with LLVM.