Jun 12, 2016 Tags: gsoc, programming, ruby, ruby-macho
Quick background: These posts are part of my ongoing work for Homebrew during Google Summer of Code. It centers on ruby-macho, a pure-Ruby Mach-O parsing library I began last year as an eventual replacement for tools like install_name_tool, which make Homebrew depend on Xcode for many tasks not directly related to compilation. The ultimate goal of this work is to eliminate Homebrew’s dependency on Xcode for bottle (binary) installs, making the average Homebrew install much lighter and more independent of Xcode (and OS X as a whole).
As of these past two weeks:
Support for PPC binary parsing is (almost) here!
I originally wrote ruby-macho with only x86-family (and little-endian) Mach-O binaries in mind, but Homebrew still has plenty of Universal builds in its formulae (and probably even a few people still on PPC systems). As such, this has been a long time coming.
Since Mach-O is a dual-endian format, bringing in PPC support was mostly a matter of replacing the hard-coded string packing and unpacking with conditions backed by a new endianness attribute on MachOFile, populated in MachOFile#get_and_check_magic. For example,
```ruby
cmd = @raw_data.slice(offset, 4).unpack("L").first
```

becomes

```ruby
fmt = (endianness == :little) ? "L<" : "L>"
cmd = @raw_data.slice(offset, 4).unpack(fmt).first
```
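To make the idea concrete, here is a minimal standalone sketch of how a file's endianness might be detected from a thin Mach-O's magic number and then used to pick the unpack format. The method names detect_endianness and read_uint32 are hypothetical, not ruby-macho's API, and fat (Universal) headers, which start with 0xcafebabe, are deliberately left out.

```ruby
# Mach-O magic numbers (values from <mach-o/loader.h>).
MH_MAGIC    = 0xfeedface # 32-bit
MH_CIGAM    = 0xcefaedfe # MH_MAGIC, byte-swapped
MH_MAGIC_64 = 0xfeedfacf # 64-bit
MH_CIGAM_64 = 0xcffaedfe # MH_MAGIC_64, byte-swapped

# Hypothetical helper: read the first four bytes as big-endian. If they
# match the magic directly, the file is big-endian; if they match the
# byte-swapped magic, it is little-endian.
def detect_endianness(raw_data)
  magic = raw_data.byteslice(0, 4).unpack("L>").first
  case magic
  when MH_MAGIC, MH_MAGIC_64 then :big
  when MH_CIGAM, MH_CIGAM_64 then :little
  else raise ArgumentError, format("unrecognized magic: 0x%08x", magic)
  end
end

# Hypothetical helper mirroring the snippet above: pick "L<" or "L>"
# instead of relying on the host's native byte order.
def read_uint32(raw_data, offset, endianness)
  fmt = (endianness == :little) ? "L<" : "L>"
  raw_data.byteslice(offset, 4).unpack(fmt).first
end

# Fake headers: magic followed by cputype (18 = PowerPC, 7 = x86).
ppc = [MH_MAGIC, 18].pack("L>L>")
x86 = [MH_MAGIC, 7].pack("L<L<")

p detect_endianness(ppc)       # => :big
p read_uint32(ppc, 4, :big)    # => 18
p detect_endianness(x86)       # => :little
p read_uint32(x86, 4, :little) # => 7
```

Because the check reads the magic with a fixed byte order, the result is the same regardless of whether the host is an x86 or a PPC machine.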
Similar logic is applied to the structure classes, with a new MachOStructure.specialize_format method picking up the slack.
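One way such a specializing step could work is sketched below. It assumes a structure's format string uses = as a placeholder for the byte-order modifier; both that placeholder convention and the LoadCommandSketch class are illustrative assumptions, not necessarily how MachOStructure.specialize_format is implemented.

```ruby
# Hypothetical structure class; "=" marks where a byte-order modifier goes.
class LoadCommandSketch
  FORMAT = "L=L=".freeze # cmd and cmdsize, both uint32

  attr_reader :cmd, :cmdsize

  # Swap every placeholder for "<" (little-endian) or ">" (big-endian).
  def self.specialize_format(endianness)
    FORMAT.tr("=", (endianness == :little) ? "<" : ">")
  end

  # Unpack raw bytes using the format specialized for the file's endianness.
  def self.new_from_bin(endianness, bin)
    new(*bin.unpack(specialize_format(endianness)))
  end

  def initialize(cmd, cmdsize)
    @cmd = cmd
    @cmdsize = cmdsize
  end
end

p LoadCommandSketch.specialize_format(:little) # => "L<L<"
p LoadCommandSketch.specialize_format(:big)    # => "L>L>"

lc = LoadCommandSketch.new_from_bin(:big, [0x19, 72].pack("L>L>"))
p [lc.cmd, lc.cmdsize] # => [25, 72]
```

Keeping a single endianness-neutral format per class means each structure is described once, and only the last step before unpacking depends on the file being parsed.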
The old system relied on the fact that L is the specifier for a native-endian unsigned 32-bit integer (dword, uint32, etc.), which is little-endian on x86 hosts, while the new one uses the < and > (little- and big-endian) modifiers introduced in Ruby 1.9. This breaks compatibility with 1.8.7, Homebrew’s earliest supported Ruby version, but hopefully that won’t be the case for much longer.
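The difference between the native L specifier and the explicit L< and L> forms is easy to see on the first four bytes of a little-endian 64-bit Mach-O (cf fa ed fe). This is a small illustrative snippet, not code from ruby-macho:

```ruby
# First four bytes of an x86_64 Mach-O file on disk.
bytes = [0xcf, 0xfa, 0xed, 0xfe].pack("C*")

little = bytes.unpack("L<").first # always 0xfeedfacf
big    = bytes.unpack("L>").first # always 0xcffaedfe
native = bytes.unpack("L").first  # depends on the host CPU's byte order

printf("L< => 0x%08x\n", little)
printf("L> => 0x%08x\n", big)
puts(native == little ? "L matches L< on this host" : "L matches L> on this host")
```

On an x86 machine the last line reports that L matches L<, which is why the old code happened to work for x86 binaries and nothing else.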
Now that the boilerplate for parsing big-endian Mach-Os is in place, adding support for a more diverse range of architectures (ARM, SPARC, m68k/m88k) should be just a matter of adding the appropriate CPU type and subtype constants.
CPU types and subtypes are now handled much better!
Previously, CPU types and subtypes were grouped into two big arrays, one for CPU types and one (CPU_SUBTYPES) for subtypes. This wasn’t ideal, as several subtypes have overlapping values (for example, CPU_SUBTYPE_POWERPC_750 shares the value 9 with subtypes from other CPU families). This has been corrected by replacing the CPU_SUBTYPES array with a nested hash table, keyed first by CPU type and then by subtype, that resolves to easy-to-remember symbols (instead of string representations of the constant names). The API was also adjusted to reflect this.
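A minimal sketch of why nesting resolves the collision: the same subtype value 9 maps to different symbols depending on the CPU type it sits under. The table name SUBTYPE_SYMBOLS, the subtype_symbol helper, and the chosen symbols are hypothetical; only the numeric values come from Apple's <mach/machine.h>.

```ruby
# Values from <mach/machine.h>.
CPU_TYPE_ARM     = 12
CPU_TYPE_POWERPC = 18

# Capability bits live in the high byte of a cpusubtype.
CPU_SUBTYPE_MASK = 0xff000000

# Hypothetical table: CPU type -> { subtype -> symbol }. A subtype value
# only means something relative to its CPU type, so it nests under it.
SUBTYPE_SYMBOLS = {
  CPU_TYPE_ARM => {
    0 => :arm,   # CPU_SUBTYPE_ARM_ALL
    9 => :armv7, # CPU_SUBTYPE_ARM_V7
  },
  CPU_TYPE_POWERPC => {
    0 => :ppc,    # CPU_SUBTYPE_POWERPC_ALL
    9 => :ppc750, # CPU_SUBTYPE_POWERPC_750
  },
}.freeze

# Look up the type first, then the subtype with capability bits stripped.
def subtype_symbol(cputype, cpusubtype)
  subtypes = SUBTYPE_SYMBOLS.fetch(cputype, {})
  subtypes.fetch(cpusubtype & ~CPU_SUBTYPE_MASK, :unknown)
end

p subtype_symbol(CPU_TYPE_POWERPC, 9) # => :ppc750
p subtype_symbol(CPU_TYPE_ARM, 9)     # => :armv7
```

With a flat subtype list, the second lookup would have no way of telling the two 9s apart.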
The test suite is now clean(er)!
Previously, the binaries in the test suite were all tossed into one big directory (test/bin/), with no structure whatsoever (besides a ‘fat’ somewhere in their filename if they were universal). This wasn’t going to cut it for expanding the test suite beyond very basic thin and fat Mach-O tests, so some restructuring was necessary.
Single-architecture binaries are now stored under test/bin/arch, where arch is an architecture name. Fat binaries are stored under test/bin/fat-arch1-arch2-…-archN, where each of arch1…archN is likewise an architecture name.
To compensate for this added complexity in directory structure, a
Helpers.fixture method was added to generate paths to binaries of a given
list of architectures. For example:
```ruby
fixture(:i386, "hello.bin")            # => "test/bin/i386/hello.bin"
fixture([:i386, :x86_64], "hello.bin") # => "test/bin/fat-i386-x86_64/hello.bin"
```
This obviated the need for the previous collection of hard-coded references to individual test binaries.
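A helper with the behavior shown in the examples above could be as small as the following. This is a hypothetical re-implementation written to match those two examples, not the actual Helpers.fixture from ruby-macho's test suite.

```ruby
# Hypothetical sketch: build a fixture path from one or more architectures.
def fixture(archs, name)
  archs = Array(archs) # accept a single symbol or an array of symbols
  dir = (archs.size == 1) ? archs.first.to_s : "fat-#{archs.join("-")}"
  File.join("test", "bin", dir, name)
end

p fixture(:i386, "hello.bin")            # => "test/bin/i386/hello.bin"
p fixture([:i386, :x86_64], "hello.bin") # => "test/bin/fat-i386-x86_64/hello.bin"
```

Since the directory name is derived from the architecture list, adding a new fat combination to the suite only requires creating the matching directory.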
Many thanks to Martin Afanasjew (UniqMartin) for continuing to mentor me (and for reminding me to make this post).
The changes described above can be seen in PRs #24, #25, and #26, in that order.
Thanks for reading!