Jun 12, 2016 Tags: gsoc, programming, ruby, ruby-macho
Quick background: These posts are part of my ongoing work for Homebrew during
this year’s Google Summer of Code. It revolves around ruby-macho, a pure-Ruby
Mach-O parsing library I began last year as an eventual replacement for tools
like otool and install_name_tool, which make Homebrew depend on Xcode for many
tasks not directly related to compilation. The ultimate goal of this work is to
eliminate Homebrew’s dependency on Xcode for bottle (binary) installs, making
the average Homebrew install much lighter and more independent of Xcode (and
OS X as a whole).
As of these past two weeks:
Support for PPC binary parsing is (almost) here!
I wrote ruby-macho with only x86-family (and little-endian) Mach-O binaries
in mind, but Homebrew still has plenty of Universal builds in its formulae
(and probably even a few people still on PPC systems). As such, this has
been a long time coming.
Since Mach-O is a dual-endian format, the changes required for bringing in
PPC support were mostly a matter of replacing the hard-coded string packing
and unpacking system with conditions backed by a new MachOFile attribute:
endianness, populated in MachOFile#get_and_check_magic. For example,
lines like
cmd = @raw_data.slice(offset, 4).unpack("L").first
became
fmt = (endianness == :little) ? "L<" : "L>"
cmd = @raw_data.slice(offset, 4).unpack(fmt).first
Similar logic is applied to classes that derive from MachOStructure,
with a new MachOStructure.specialize_format picking up the slack.
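As a rough illustration of what a format-specializing helper could look like, here is a minimal sketch. It is not taken from ruby-macho: the `FormatSketch` module, the `=` placeholder convention, and the `LOAD_COMMAND_FORMAT` constant are all assumptions made for the example. The idea is that each structure declares its format once and the byte order is filled in per file.

```ruby
# Hypothetical sketch, not ruby-macho's actual implementation.
# Assumption: endian-sensitive fields are marked with a "=" placeholder.
module FormatSketch
  module_function

  # Replace every "=" placeholder with the pack/unpack modifier that
  # matches the byte order of the file being parsed.
  def specialize_format(format, endianness)
    modifier = endianness == :big ? ">" : "<"
    format.tr("=", modifier)
  end
end

# Assumed format for a structure made of two 32-bit unsigned fields.
LOAD_COMMAND_FORMAT = "L=2"

# Build eight bytes the way a big-endian file would store them, then
# read them back with the specialized format.
raw = [0x19, 72].pack("L>2")
cmd, cmdsize = raw.unpack(FormatSketch.specialize_format(LOAD_COMMAND_FORMAT, :big))
```

Reading the same bytes with the `:little` specialization would yield byte-swapped values, which is exactly the mistake the hard-coded formats made on PPC binaries.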
The old system relied on the fact that L is the specifier for a
native-endian unsigned 32-bit integer (dword, uint32, etc…), which happens
to be little-endian on x86 hosts, while the new one uses the < and >
little- and big-endian modifiers introduced in Ruby 1.9.
This breaks compatibility with 1.8.7, Homebrew’s earliest supported
Ruby version, but that hopefully won’t be the case for
much longer.
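To show how the pieces fit together, here is a small sketch of deriving the endianness from the magic number and then using it to pick an unpack format. The magic values are the standard Mach-O ones, but the `EndianSketch` module and its method names are made up for this example and are not ruby-macho's API.

```ruby
# Hypothetical sketch; module and method names are illustrative only.
module EndianSketch
  # Standard Mach-O magic numbers. Reading the first four bytes as a
  # big-endian integer yields the MAGIC form for big-endian files and
  # the byte-swapped CIGAM form for little-endian files.
  MH_MAGIC    = 0xfeedface
  MH_CIGAM    = 0xcefaedfe
  MH_MAGIC_64 = 0xfeedfacf
  MH_CIGAM_64 = 0xcffaedfe

  module_function

  def detect_endianness(raw_data)
    magic = raw_data.byteslice(0, 4).to_s.unpack("N").first
    case magic
    when MH_MAGIC, MH_MAGIC_64 then :big
    when MH_CIGAM, MH_CIGAM_64 then :little
    else raise ArgumentError, "not a thin Mach-O (magic: #{magic.inspect})"
    end
  end

  # Read one unsigned 32-bit field using the detected byte order.
  def read_uint32(raw_data, offset, endianness)
    fmt = endianness == :little ? "L<" : "L>"
    raw_data.slice(offset, 4).unpack(fmt).first
  end
end

# A fake eight-byte header as a big-endian (e.g. PPC) file would store it:
# the magic, followed by a cputype field of 18.
header = [0xfeedface, 18].pack("L>2")
endianness = EndianSketch.detect_endianness(header) # => :big
cputype = EndianSketch.read_uint32(header, 4, endianness) # => 18
```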
Now that the boilerplate for parsing big-endian Mach-Os is laid down, adding
support for a more diverse range of architectures (ARM, SPARC, m68/88k)
should be just a matter of adding the appropriate CPU_TYPE and
CPU_SUBTYPE constants!
CPU types and subtypes are now handled much better!
Previously, CPU types and subtypes were grouped into two big arrays
(creatively named CPU_TYPES and CPU_SUBTYPES). This wasn’t ideal, as
several subtypes have overlapping values (for example, CPU_SUBTYPE_ARM_V7
and CPU_SUBTYPE_POWERPC_750 are both 9).
As of 4a1116b, this has been corrected by replacing the CPU_SUBTYPES array
with a nested hash table, keyed first by CPU type, that maps each subtype
to an easy-to-remember symbol (instead of a string representation of the
constant name). The API was also adjusted to reflect this (namely
MachOFile#cputype, MachOFile#cpusubtype, and FatFile#extract).
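For a concrete picture of why nesting by CPU type resolves the overlap, here is a minimal sketch. The numeric values are the standard ones from the Mach-O headers, but the `CpuSketch` module, the symbol names, and the lookup helper are assumptions for illustration rather than ruby-macho's actual tables.

```ruby
# Hypothetical sketch; symbol names and the helper are illustrative only.
module CpuSketch
  CPU_TYPE_ARM            = 12
  CPU_TYPE_POWERPC        = 18
  CPU_SUBTYPE_ARM_V7      = 9
  CPU_SUBTYPE_POWERPC_750 = 9 # same value as CPU_SUBTYPE_ARM_V7

  CPU_TYPES = {
    CPU_TYPE_ARM     => :arm,
    CPU_TYPE_POWERPC => :ppc,
  }.freeze

  # Subtypes are looked up under their CPU type, so two subtypes that
  # share a numeric value no longer collide.
  CPU_SUBTYPES = {
    CPU_TYPE_ARM     => { CPU_SUBTYPE_ARM_V7 => :armv7 }.freeze,
    CPU_TYPE_POWERPC => { CPU_SUBTYPE_POWERPC_750 => :ppc750 }.freeze,
  }.freeze

  module_function

  # Returns a symbol for the (type, subtype) pair, or nil if unknown.
  def cpusubtype_name(cputype, cpusubtype)
    CPU_SUBTYPES.fetch(cputype, {})[cpusubtype]
  end
end

CpuSketch.cpusubtype_name(CpuSketch::CPU_TYPE_ARM, 9)     # => :armv7
CpuSketch.cpusubtype_name(CpuSketch::CPU_TYPE_POWERPC, 9) # => :ppc750
```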
The test suite is now clean(er)!
Previously, the binaries in the test suite were all tossed into one big directory (test/bin/), with no structure whatsoever (besides a ‘fat’ somewhere in their filename if they were universal). This wasn’t going to cut it for expanding the test suite beyond very basic thin and fat Mach-O tests, so some restructuring was necessary.
Single-architecture binaries are now stored under test/bin/arch, where
arch is an architecture name listed in arch(3). Multi-architecture
binaries are stored under test/bin/fat-arch1-arch2-…-archN,
where each of arch1…archN is an architecture name (also from arch(3)).
To compensate for this added complexity in directory structure, a
Helpers.fixture method was added to generate paths to binaries of a given
list of architectures. For example:
fixture(:i386, "hello.bin")
# => "test/bin/i386/hello.bin"
fixture([:i386, :x86_64], "hello.bin")
# => "test/bin/fat-i386-x86_64/hello.bin"
This obviated the need for the previous collection of TEST_* constants
pointing to binaries.
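A helper with this behavior can be quite small. The following is a sketch of one way to write it, not the code from the test suite: the `FixtureSketch` module name and the `TEST_BIN_DIR` constant are assumptions, and only the resulting paths are taken from the examples above.

```ruby
# Hypothetical sketch; not the test suite's actual Helpers module.
module FixtureSketch
  TEST_BIN_DIR = "test/bin" # assumed location of the fixture binaries

  module_function

  # A single architecture maps to test/bin/<arch>/<name>; an array of
  # architectures maps to test/bin/fat-<arch1>-<arch2>-.../<name>.
  def fixture(archs, name)
    arch_dir = archs.is_a?(Array) ? "fat-#{archs.join("-")}" : archs.to_s
    File.join(TEST_BIN_DIR, arch_dir, name)
  end
end

thin_path = FixtureSketch.fixture(:i386, "hello.bin")
fat_path = FixtureSketch.fixture([:i386, :x86_64], "hello.bin")
```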
Many thanks to Martin Afanasjew (UniqMartin) for continuing to mentor me (and for reminding me to make this post).
The changes described above can be seen in PRs #24, #25, and #26, in that order.
Thanks for reading!
- William