Hiding messages in x86 binaries using semantic duals

This is a quick writeup of a steganographic tool that I threw together: steg86. It uses a quirk of the x86 instruction encoding format to hide messages in pre-existing binaries with no size or performance overhead.

1
2
$ cargo install steg86
$ steg86 --help

A quick steganography refresher

Steganography is the practice of concealing data within other data. A variety of different common formats open themselves up to steganographic channels:

On its own, steganography is not a replacement for cryptography: it’s strictly obfuscatory, and does not provide any deniablity of the embedded message if revealed.

Doing steganography on computer programs

There are two obvious avenues for doing steganography on compiled programs. steg86 does neither of these, but for the sake of completeness:

Instrumenting the program via the compiler

Compilers perform instruction selection to lower their internal representations into machine code. Instruction selection can be informed by any number of weights, including register pressure, individual instruction performance (clock cycles, use of shared units), and feature availability (e.g., selecting an older version of SSE to be compatible with more CPUs).

In this approach, the steganographer would control the compiler and its instruction selection weights, encoding information in particular selections of instructions (or other parts of code generation, like stack object order).

This affords a great deal of flexibility and control, but also makes it more difficult for the compiler to perform optimal selections. It also requires the steganographer to maintain their (forked) compiler. Not a bad idea, but requires a decent amount of effort.

Instrumenting the container format

Another approach is to ignore the compiled instructions themselves, and focus on the binary container: ELF, Mach-O, or PE, among others.

These formats afford many opportunities for embedding information: the order (both file and virtual address) of segments and sections, the order and layout of relocation tables and the composition of their internal rules, and so on.

Instrumenting the binary format rather than the executable code means not having to muck with instruction selection, but is also less bountiful in terms of information density. It’s also not portable, e.g. tricks for hiding information in ELF relocation entries would have to be tweaked for PEs.

How steg86 works

Instead of controlling instruction selection or tweaking the details of the containing binary format, steg86 takes a third route: it uses the presence of semantic duals in the x86 and AMD64 instruction formats to selectively rewrite a program after compilation.

Semantic duals

Like every ISA, x86 (and AMD64) have multiple ways to encode the semantics of a particular (higher level, conceptual) operation. For example, clearing a register:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
; xor'ing a register with itself clears all bits
xor eax, eax

; and'ing a register with 0 also clears all bits
and eax, 0

; we could also set 0
mov eax, 0

; ...or subtract
sub eax, eax

; ...or load a "computed" address that's always 0
lea eax, [0]

…among many others. Unfortunately, this doesn’t work exactly as we’d like:

Instead, steg86 takes advantage of one of the quirks of the x86 instruction encoding: the ModR/M byte:

The ModR/M byte is normally used to encode one or more explicit register and/or memory operands to an opcode¹: it’s how x86 can supply the rich range of source-to-sink modes that it has (register-to-memory, memory-to-register, register-to-register, immediate-to-register, &c).

Because register-to-memory and memory-to-register are both valid source-sink combinations for many opcodes, many have encoding forms like the following:

1
2
3
4
5
; xor register with register/memory32 and store register/memory32
xor r/m32, r32

; xor register/memory32 with register and store register
xor r32, r/m32

1
2
3
4
5
; r/m32, r
31 C0

; r, r/m32
33 C0

These encodings are semantic duals: they have different opcodes, but belong to the same instruction family, have the same length, and have the exact same behavior and performance. We could replace every occurrence of 31 C0 with 33 C0 and our programs would happily chug along, none the wiser.

As it turns out, there are actually quite a few of these duals on some of the most common x86 instructions²:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
ADD r/m8, r8   <=> ADD r8, r/m8
ADD r/m16, r16 <=> ADD r16, r/m16
ADD r/m32, r32 <=> ADD r32, r/m32
ADD r/m64, r64 <=> ADD r64, r/m64

ADC r/m8, r8   <=> ADC r8, r/m8
ADC r/m16, r16 <=> ADC r16, r/m16
ADC r/m32, r32 <=> ADC r32, r/m32
ADC r/m64, r64 <=> ADC r64, r/m64

AND r/m8, r8   <=> AND r8, r/m8
AND r/m16, r16 <=> AND r16, r/m16
AND r/m32, r32 <=> AND r32, r/m32
AND r/m64, r64 <=> AND r64, r/m64

OR r/m8, r8    <=> OR r8, r/m8
OR r/m16, r16  <=> OR r16, r/m16
OR r/m32, r32  <=> OR r32, r/m32
OR r/m64, r64  <=> OR r64, r/m64

XOR r/m8, r8   <=> XOR r8, r/m8
XOR r/m16, r16 <=> XOR r16, r/m16
XOR r/m32, r32 <=> XOR r32, r/m32
XOR r/m64, r64 <=> XOR r64, r/m64

SUB r/m8, r8   <=> SUB r8, r/m8
SUB r/m16, r16 <=> SUB r16, r/m16
SUB r/m32, r32 <=> SUB r32, r/m32
SUB r/m64, r64 <=> SUB r64, r/m64

SBB r/m8, r8   <=> SBB r8, r/m8
SBB r/m16, r16 <=> SBB r16, r/m16
SBB r/m32, r32 <=> SBB r32, r/m32
SBB r/m64, r64 <=> SBB r64, r/m64

MOV r/m8, r8   <=> MOV r8, r/m8
MOV r/m16, r16 <=> MOV r16, r/m16
MOV r/m32, r32 <=> MOV r32, r/m32
MOV r/m64, r64 <=> MOV r64, r/m64

CMP r/m8, r8   <=> CMP r8, r/m8
CMP r/m16, r16 <=> CMP r16, r/m16
CMP r/m32, r32 <=> CMP r32, r/m32
CMP r/m64, r64 <=> CMP r64, r/m64

Each dual represents one bit of information: we can arbitrarily assign one form to be true and its counterpart to be false. Given enough of them in a target program, we can hide a message by translating between the forms. The result: a binary that’s exactly the same size as the original and with the same performance characteristics, but with a hidden message.

That’s all steg86 does: it locates every semantic dual in its target program, maps its input message to a bitstring, and translates each dual depending on its corresponding message bit. Because it only uses duals, it’s completely agnostic to its containing format: the steg86 CLI currently supports ELF, Mach-O, and PE/PE32+ binaries.

Here’s what that actually looks like, with (some of the) differing bytes highlighted in red:

1
2
3
4
5
6
7
8
9
10
11
12
$ steg86 profile /bin/bash
Summary for /bin/bash:
  175828 total instructions
  27957 potential semantic pairs
  27925 bits of information capacity (3490 bytes, approx. 3KB)

$ steg86 embed /bin/bash test.steg <<< "hello world!"
$ file test.steg
test.steg: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a6cb40078351e05121d46daa768e271846d5cc54, for GNU/Linux 3.2.0, stripped

$ steg86 extract test.steg
hello world!

See the README for prior work and future refinements to steg86’s informational capacity.

Edit: An important note: because compilers and assemblers tend to select just one of the forms above, steg86 cannot provide deniability, even with strong encryption. Its transformations will always look abnormal compared to a compiler-produced binary.

Like every part of x86, it’s actually significantly complicated than that. But for our purposes, it’s essentially an 8-bit operand selector. ↩
On first pass. There are probably others that I’ve missed. ↩

ENOSUCHBLOG

Programming, philosophy, pedaling.