Aug 16, 2020 Tags: devblog, programming, rust, x86
This is a quick writeup of a steganographic tool that I threw together: steg86. It uses a quirk of the x86 instruction encoding format to hide messages in pre-existing binaries with no size or performance overhead.
You can play with it yourself by fetching it with
1 2 $ cargo install steg86 $ steg86 --help
Steganography is the practice of concealing data within other data. A variety of different common formats open themselves up to steganographic channels:
On its own, steganography is not a replacement for cryptography: it’s strictly obfuscatory, and does not provide any deniablity of the embedded message if revealed.
That being said, there exist a variety of use cases for steganographic schemes:
There are two obvious avenues for doing steganography on compiled programs. steg86 does neither of these, but for the sake of completeness:
Compilers perform instruction selection to lower their internal representations into machine code. Instruction selection can be informed by any number of weights, including register pressure, individual instruction performance (clock cycles, use of shared units), and feature availability (e.g., selecting an older version of SSE to be compatible with more CPUs).
In this approach, the steganographer would control the compiler and its instruction selection weights, encoding information in particular selections of instructions (or other parts of code generation, like stack object order).
This affords a great deal of flexibility and control, but also makes it more difficult for the compiler to perform optimal selections. It also requires the steganographer to maintain their (forked) compiler. Not a bad idea, but requires a decent amount of effort.
Another approach is to ignore the compiled instructions themselves, and focus on the binary container: ELF, Mach-O, or PE, among others.
These formats afford many opportunities for embedding information: the order (both file and virtual address) of segments and sections, the order and layout of relocation tables and the composition of their internal rules, and so on.
Instrumenting the binary format rather than the executable code means not having to muck with instruction selection, but is also less bountiful in terms of information density. It’s also not portable, e.g. tricks for hiding information in ELF relocation entries would have to be tweaked for PEs.
Instead of controlling instruction selection or tweaking the details of the containing binary format, steg86 takes a third route: it uses the presence of semantic duals in the x86 and AMD64 instruction formats to selectively rewrite a program after compilation.
Like every ISA, x86 (and AMD64) have multiple ways to encode the semantics of a particular (higher level, conceptual) operation. For example, clearing a register:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 ; xor'ing a register with itself clears all bits xor eax, eax ; and'ing a register with 0 also clears all bits and eax, 0 ; we could also set 0 mov eax, 0 ; ...or subtract sub eax, eax ; ...or load a "computed" address that's always 0 lea eax, 
…among many others. Unfortunately, this doesn’t work exactly as we’d like:
xor eax, eaxversus 6 for
lea eax, . Consequently, rewriting one in terms of another might entail a length change and subsequent relocation and displacement fixups.
Fortunately, this is not what steg86 does.
Instead, steg86 takes advantage of one of the quirks of the x86 instruction encoding: the ModR/M byte:
The ModR/M byte is normally used to encode one or more explicit register and/or memory operands to an opcode1: it’s how x86 can supply the rich range of source-to-sink modes that it has (register-to-memory, memory-to-register, register-to-register, immediate-to-register, &c).
Because register-to-memory and memory-to-register are both valid source-sink combinations for many opcodes, many have encoding forms like the following:
1 2 3 4 5 ; xor register with register/memory32 and store register/memory32 xor r/m32, r32 ; xor register/memory32 with register and store register xor r32, r/m32
Consequently, there are actually two ways to encode
xor eax, eax:
1 2 3 4 5 ; r/m32, r 31 C0 ; r, r/m32 33 C0
These encodings are semantic duals: they have different opcodes, but belong to the same
instruction family, have the same length, and have the exact same behavior and performance.
We could replace every occurrence of
31 C0 with
33 C0 and our programs would happily chug along,
none the wiser.
As it turns out, there are actually quite a few of these duals on some of the most common x86 instructions2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 ADD r/m8, r8 <=> ADD r8, r/m8 ADD r/m16, r16 <=> ADD r16, r/m16 ADD r/m32, r32 <=> ADD r32, r/m32 ADD r/m64, r64 <=> ADD r64, r/m64 ADC r/m8, r8 <=> ADC r8, r/m8 ADC r/m16, r16 <=> ADC r16, r/m16 ADC r/m32, r32 <=> ADC r32, r/m32 ADC r/m64, r64 <=> ADC r64, r/m64 AND r/m8, r8 <=> AND r8, r/m8 AND r/m16, r16 <=> AND r16, r/m16 AND r/m32, r32 <=> AND r32, r/m32 AND r/m64, r64 <=> AND r64, r/m64 OR r/m8, r8 <=> OR r8, r/m8 OR r/m16, r16 <=> OR r16, r/m16 OR r/m32, r32 <=> OR r32, r/m32 OR r/m64, r64 <=> OR r64, r/m64 XOR r/m8, r8 <=> XOR r8, r/m8 XOR r/m16, r16 <=> XOR r16, r/m16 XOR r/m32, r32 <=> XOR r32, r/m32 XOR r/m64, r64 <=> XOR r64, r/m64 SUB r/m8, r8 <=> SUB r8, r/m8 SUB r/m16, r16 <=> SUB r16, r/m16 SUB r/m32, r32 <=> SUB r32, r/m32 SUB r/m64, r64 <=> SUB r64, r/m64 SBB r/m8, r8 <=> SBB r8, r/m8 SBB r/m16, r16 <=> SBB r16, r/m16 SBB r/m32, r32 <=> SBB r32, r/m32 SBB r/m64, r64 <=> SBB r64, r/m64 MOV r/m8, r8 <=> MOV r8, r/m8 MOV r/m16, r16 <=> MOV r16, r/m16 MOV r/m32, r32 <=> MOV r32, r/m32 MOV r/m64, r64 <=> MOV r64, r/m64 CMP r/m8, r8 <=> CMP r8, r/m8 CMP r/m16, r16 <=> CMP r16, r/m16 CMP r/m32, r32 <=> CMP r32, r/m32 CMP r/m64, r64 <=> CMP r64, r/m64
Each dual represents one bit of information: we can arbitrarily assign one form
true and its counterpart to be
false. Given enough of them in a target
program, we can hide a message by translating between the forms. The result:
a binary that’s exactly the same size as the original and with the same performance
characteristics, but with a hidden message.
That’s all steg86 does: it locates every semantic dual in its target program, maps
its input message to a bitstring, and translates each dual depending on its corresponding
message bit. Because it only uses duals, it’s completely agnostic to its containing format:
steg86 CLI currently supports ELF, Mach-O, and PE/PE32+ binaries.
Here’s what that actually looks like, with (some of the) differing bytes highlighted in red:
And, of course, try it for yourself:
1 2 3 4 5 6 7 8 9 10 11 12 $ steg86 profile /bin/bash Summary for /bin/bash: 175828 total instructions 27957 potential semantic pairs 27925 bits of information capacity (3490 bytes, approx. 3KB) $ steg86 embed /bin/bash test.steg <<< "hello world!" $ file test.steg test.steg: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a6cb40078351e05121d46daa768e271846d5cc54, for GNU/Linux 3.2.0, stripped $ steg86 extract test.steg hello world!
See the README for prior work and future refinements to steg86’s informational capacity.
Edit: An important note: because compilers and assemblers tend to select just one of the forms above, steg86 cannot provide deniability, even with strong encryption. Its transformations will always look abnormal compared to a compiler-produced binary.