Programming, philosophy, pedaling.

State of ruby-macho, GSoC 2016 Wrap-Up Edition

Aug 20, 2016     Tags: gsoc, programming, ruby, ruby-macho    


Updated 2016-08-23 to pick up recent changes and clarify/expand sections.

This post is a wrap-up of my Google Summer of Code 2016 work for Homebrew, particularly the improvements made to ruby-macho and their ultimate impact upon Homebrew users.

In chronological order, here are my previous posts from this year’s GSoC. They aren’t essential to this post, but each provides more specific technical details and difficulties that may be interesting:

Core Changes

Support for Parsing Big-Endian Binaries


Although OS X has shipped exclusively on little-endian (x86) architectures for nearly a decade, many fat OS X binaries still contain big-endian (PPC) slices. Moreover, Homebrew is still in active use on older (i.e., earlier than 10.7) versions of OS X that include PPC slices in system libraries and executables.

Ruby-macho was initially written with only x86-family binaries in mind. In particular, the parser relied heavily on a number of hard-coded (little-endian) assumptions about the format of binary data extracted from the file buffer. To make ruby-macho endian-independent, these assumptions were replaced with a system that generates the correct format specifier from an endianness value determined early in the parsing process. The result is a significantly cleaner (and more generic) parser that gives us more flexibility, both for Homebrew’s Mach-O needs and for future library features not necessarily required by brew.
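As a rough illustration of the idea (a sketch, not ruby-macho's actual code), the endianness can be inferred from how the magic number reads, and a generic format can then be specialized before unpacking. The helper names `endianness_of`, `specialize_format`, and `parse_header`, and the `=` placeholder convention, are assumptions made for this example; only the 32-bit magic is handled.

```ruby
# Magic values from <mach-o/loader.h> (32-bit only, for brevity).
MH_MAGIC = 0xfeedface
MH_CIGAM = 0xcefaedfe # MH_MAGIC with its bytes swapped

# Generic format for the seven uint32 fields of a 32-bit Mach-O header.
# "=" is a placeholder (an assumption of this sketch) for the byte order.
HEADER_FORMAT = "L=7"

# Hypothetical helper: infer the byte order from the first four bytes.
def endianness_of(raw)
  magic = raw[0, 4].unpack("N").first # always read as big-endian
  case magic
  when MH_MAGIC then :big    # bytes appear in natural order
  when MH_CIGAM then :little # bytes appear swapped
  else raise ArgumentError, "not a 32-bit Mach-O magic"
  end
end

# Hypothetical helper: turn "L=7" into "L>7" (big) or "L<7" (little).
def specialize_format(format, endianness)
  modifier = endianness == :big ? ">" : "<"
  format.tr("=", modifier)
end

# Hypothetical helper: parse the header fields with the specialized format.
def parse_header(raw)
  endianness = endianness_of(raw)
  magic, cputype, _cpusubtype, filetype, ncmds, =
    raw.unpack(specialize_format(HEADER_FORMAT, endianness))
  { endianness: endianness, magic: magic, cputype: cputype,
    filetype: filetype, ncmds: ncmds }
end
```

With this shape, the rest of the parser never needs to know which byte order it is reading; it only passes the endianness value along.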


RPath Inspection and Modification


Ruby-macho has supported inspecting and changing dynamic linkages (i.e., LC_ID_DYLIB and all other flavors of dylib_cmd) since last year, but support for RPath modification was conspicuously incomplete.

Since RPath modification is one of install_name_tool’s major functionalities, ruby-macho needed to implement all forms of RPath addition, deletion, and mutation in order to be a viable alternative.
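The three operations can be modeled on a plain list of paths. The sketch below is an in-memory illustration only: the `RPathList` class and its method names are hypothetical, and the error conditions follow install_name_tool's documented behavior (adding an existing rpath, or deleting or changing a missing one, is an error) rather than anything specific to ruby-macho.

```ruby
# Hypothetical in-memory model of a binary's rpaths (not ruby-macho's API).
class RPathList
  attr_reader :paths

  def initialize(paths = [])
    @paths = paths.dup
  end

  # Addition: refuse duplicates, as install_name_tool -add_rpath does.
  def add(path)
    raise ArgumentError, "rpath already present: #{path}" if @paths.include?(path)
    @paths << path
    self
  end

  # Deletion: the rpath must already exist.
  def delete(path)
    raise ArgumentError, "no such rpath: #{path}" unless @paths.include?(path)
    @paths.delete(path)
    self
  end

  # Mutation: replace old_path in place, preserving its position.
  def change(old_path, new_path)
    raise ArgumentError, "no such rpath: #{old_path}" unless @paths.include?(old_path)
    raise ArgumentError, "rpath already present: #{new_path}" if @paths.include?(new_path)
    @paths[@paths.index(old_path)] = new_path
    self
  end
end
```

In a real binary each of these operations also has to rewrite load commands and header sizes, which is where most of the actual work lies.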


Load Command Creation and Serialization


One of the earliest restrictions of ruby-macho was that (valid) instances of LoadCommand objects could only be created by the parser from a binary string of Mach-O data. In turn, this restricted the mechanism and interface through which load commands were modified. Similarly, it prevented the generic addition of new load commands to a file.
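To make the serialization side concrete, here is a minimal sketch of building an LC_RPATH load command from scratch rather than from parsed data. The `serialize_rpath` function and its parameters are assumptions for illustration, not ruby-macho's interface; the layout (cmd, cmdsize, string offset, then a NUL-padded path) and the 4- or 8-byte size alignment follow `<mach-o/loader.h>`.

```ruby
LC_RPATH = 0x8000001c  # 0x1c | LC_REQ_DYLD, from <mach-o/loader.h>
RPATH_HEADER_SIZE = 12 # cmd + cmdsize + path offset, 4 bytes each

# Hypothetical helper: serialize an LC_RPATH command for the given path.
# alignment is 8 for 64-bit Mach-O files and 4 for 32-bit ones.
def serialize_rpath(path, endianness: :little, alignment: 8)
  raw_size = RPATH_HEADER_SIZE + path.bytesize + 1 # +1 for the NUL terminator
  # Round the total size up to the next multiple of the alignment.
  cmdsize = (raw_size + alignment - 1) / alignment * alignment
  modifier = endianness == :big ? ">" : "<"
  # The path string begins immediately after the 12-byte header.
  header = [LC_RPATH, cmdsize, RPATH_HEADER_SIZE].pack("L#{modifier}3")
  header + path.b.ljust(cmdsize - RPATH_HEADER_SIZE, "\0")
end
```

Once commands can be built this way, adding one to a file is a matter of appending the bytes and updating the header's command count and size, provided there is enough padding after the existing load commands.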


Homebrew Integration and Deployment


Ruby-macho had been vendored into Homebrew earlier, but the feature additions and API changes required revendoring the updated library (and updating the glue that connects ruby-macho to brew’s relocation code). The revendoring also allowed us to smoke test the parser on CI builds, which uncovered bugs and unusual edge behavior that would otherwise have inconvenienced users.


Lesser Changes

Summary of Work


Project Goals

Thanks for reading!

- William


  1. Benchmarking was performed throughout development. No individual changes caused significant increases or decreases in ruby-macho’s performance.