Aug 20, 2016 Tags: gsoc, programming, ruby, ruby-macho
Updated 2016-08-23 to pick up recent changes and clarify/expand sections.
This post is a wrap-up of my Google Summer of Code 2016 work for Homebrew, particularly the improvements made to ruby-macho and their ultimate impact upon Homebrew users.
In chronological order, here are my previous posts from this year’s GSoC. They aren’t essential to this post, but each provides more specific technical details and difficulties that may be interesting:
Although OS X has shipped exclusively on little-endian (x86) architectures for nearly a decade, many fat OS X binaries still contain big-endian (PPC) slices. Similarly, Homebrew is still in active use on older (i.e., earlier than 10.7) versions of OS X that include PPC slices in system libraries and executables.
Ruby-macho was initially written with only x86-family binaries in mind.
In particular, the parser relied heavily on a number of hard-coded
(little-endian) assumptions about the format of binary data extracted from the
file buffer. To make ruby-macho endian-independent, these assumptions were
replaced with a generation system that emits the correct formatting variable
based on an endianness variable determined early in the parsing process. The
product is a significantly cleaner (and more generic) parser that gives us
more flexibility with respect to both Homebrew’s Mach-O needs and future library
features not necessarily required by brew.
Support for big-endian parsing was added with ruby-macho/24.
Improved CPU type/subtype handling was added with ruby-macho/25.
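As a rough illustration of the idea (not ruby-macho's actual code), the sketch below decides endianness from a thin Mach-O magic number and then generates endian-qualified format strings for Ruby's `String#unpack`. The helper names `endianness_of` and `field_format` are hypothetical and exist only for this example; the magic and CPU type values come from the Mach-O headers.

```ruby
# Mach-O magic numbers, as seen when the first four bytes are read big-endian.
MH_MAGIC    = 0xfeedface # 32-bit file stored big-endian (e.g. PPC)
MH_CIGAM    = 0xcefaedfe # 32-bit file stored little-endian (e.g. x86)
MH_MAGIC_64 = 0xfeedfacf # 64-bit file stored big-endian
MH_CIGAM_64 = 0xcffaedfe # 64-bit file stored little-endian

# Hypothetical helper: determine the file's endianness once, up front.
def endianness_of(raw)
  magic = raw.unpack("N").first # "N" always reads a big-endian uint32
  case magic
  when MH_MAGIC, MH_MAGIC_64 then :big
  when MH_CIGAM, MH_CIGAM_64 then :little
  else raise ArgumentError, "not a thin Mach-O (magic 0x#{magic.to_s(16)})"
  end
end

# Hypothetical generator: qualify each base directive ("L" = uint32,
# "Q" = uint64) with Ruby's ">" (big) or "<" (little) modifier instead of
# hard-coding a little-endian format string.
def field_format(directives, endianness)
  suffix = endianness == :big ? ">" : "<"
  directives.each_char.map { |d| "#{d}#{suffix}" }.join
end

# Two fake header prefixes: magic followed by cputype (18 = PowerPC, 7 = x86).
ppc_header = [MH_MAGIC, 18].pack("N2")
x86_header = [MH_MAGIC, 7].pack("V2")

[ppc_header, x86_header].each do |raw|
  endian = endianness_of(raw)
  magic, cputype = raw.unpack(field_format("LL", endian))
  puts "#{endian}: magic=0x#{magic.to_s(16)} cputype=#{cputype}"
end
```

Both headers decode to the same logical magic value once the format string matches the byte order, which is the property an endian-independent parser needs.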
Ruby-macho has supported inspecting and changing dynamic linkages (i.e.,
LC_ID_DYLIB and all other flavors of dylib_cmd) since last year, but support
for RPath modification was conspicuously incomplete.
Since RPath modification is one of install_name_tool’s major functionalities,
ruby-macho needed to implement all forms of RPath addition, deletion, and
mutation in order to be a viable alternative.
Initial RPath inspection and modification were added with ruby-macho/35 and ruby-macho/39.
Initial RPath deletion was added with ruby-macho/40.
Initial RPath addition was added with ruby-macho/44.
Full (and generic) addition and deletion was added with ruby-macho/45.
RPath duplication protection was added with ruby-macho/49.
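To make the three operations and the duplication check concrete, here is a toy model of a binary's runtime search paths. It is only a sketch: `RPathList` and its error classes are hypothetical names invented for this example, it works on a plain array rather than on load commands, and it does not reproduce ruby-macho's API.

```ruby
# Toy stand-in for the LC_RPATH entries of one binary (not ruby-macho's API).
class RPathList
  class DuplicateRPath < StandardError; end
  class UnknownRPath < StandardError; end

  def initialize(paths = [])
    @paths = paths.dup
  end

  # Return a copy so callers cannot bypass the checks below.
  def to_a
    @paths.dup
  end

  # Addition: refuse to add a path that is already present.
  def add(path)
    raise DuplicateRPath, path if @paths.include?(path)
    @paths << path
    self
  end

  # Deletion: refuse to delete a path that is not present.
  def delete(path)
    raise UnknownRPath, path unless @paths.include?(path)
    @paths.delete(path)
    self
  end

  # Mutation: replace in place, keeping order and guarding against duplicates.
  def change(old_path, new_path)
    raise UnknownRPath, old_path unless @paths.include?(old_path)
    raise DuplicateRPath, new_path if @paths.include?(new_path)
    @paths[@paths.index(old_path)] = new_path
    self
  end
end

rpaths = RPathList.new(["@loader_path/../lib"])
rpaths.add("/usr/local/lib")
rpaths.change("/usr/local/lib", "@executable_path/../Frameworks")
puts rpaths.to_a.inspect
```

Raising on a duplicate rather than silently ignoring it is a design choice made for this sketch; it keeps a caller from believing that a second, identical entry was written.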
One of the earliest restrictions of ruby-macho was that (valid) LoadCommand
instances could only be created by the parser from a binary string
of Mach-O data. In turn, this restricted the mechanism and interface through
which load commands were modified. Similarly, it prevented the generic addition
of new load commands to a file.
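For a sense of what building a load command from scratch involves, the sketch below serializes an LC_RPATH command into raw bytes without going through a parser. The function `serialize_rpath_command` and its keyword arguments are hypothetical and are not ruby-macho's interface; the layout (cmd, cmdsize, string offset, then a NUL-padded path) follows the `rpath_command` structure in `<mach-o/loader.h>`, and the default 8-byte alignment assumes a 64-bit binary.

```ruby
LC_RPATH = 0x8000001c # 0x1c | LC_REQ_DYLD, per <mach-o/loader.h>

# Hypothetical helper: build the bytes of one LC_RPATH load command.
def serialize_rpath_command(path, endianness: :little, alignment: 8)
  header_size = 12                            # cmd, cmdsize, path offset (uint32 each)
  unpadded = header_size + path.bytesize + 1  # +1 for the terminating NUL
  # Round up to the next multiple of the alignment (assumed 8 for 64-bit).
  cmdsize = (unpadded + alignment - 1) / alignment * alignment

  # "N" packs big-endian uint32s, "V" packs little-endian uint32s.
  header = [LC_RPATH, cmdsize, header_size].pack(endianness == :big ? "N3" : "V3")
  # The path starts right after the header and is NUL-padded up to cmdsize.
  body = path.b.ljust(cmdsize - header_size, "\x00")
  header + body
end

raw = serialize_rpath_command("/usr/local/lib")
cmd, cmdsize, offset = raw.unpack("V3")
puts format("cmd=0x%08x cmdsize=%d path=%s", cmd, cmdsize, raw[offset..-1].unpack("Z*").first)
```

Writing such a command into a file additionally requires updating the Mach-O header's `ncmds` and `sizeofcmds` fields and having enough padding after the existing load commands; the sketch leaves that out.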
LoadCommand.create and serialization via LoadCommand#serialize were added with ruby-macho/38.
Ruby-macho had been vendored into Homebrew earlier, but the feature additions
and API changes required a revendoring of the updated library (and the glue
that connects ruby-macho to brew’s relocation code). Additionally, this
allowed us to smoke test the parser on CI builds, which resulted in the
discovery of bugs and unusual edge behavior that would have otherwise
inconvenienced the user.
Vendored ruby-macho updated to 0.2.4 (and enabled on the CI) with brew/378.
Homebrew’s Mach-O enumeration logic fixed with brew/400.
Vendored ruby-macho updated to 0.2.5 with brew/656.
Ruby-macho enabled for all users with brew/767.
Complete documentation coverage was achieved:
Unit test coverage was improved and binary fixtures were categorized:
ruby-macho’s API was made significantly more uniform and idiomatic:
Feature compatibility with libmacho was improved:
The internal codebase was cleaned up:
34 PRs opened (30 on Homebrew/ruby-macho, 4 on Homebrew/brew)
2 PRs reviewed (both on Homebrew/ruby-macho)
6 issues opened (5 on Homebrew/ruby-macho, 1 on pypa/virtualenv)
4 issues reviewed (all 4 on Homebrew/ruby-macho)
Commit ranges for each repository:
Thanks for reading!
- William
Benchmarking was performed throughout development. No individual changes caused significant increases or decreases in ruby-macho’s performance.