Jun 13, 2020 Tags: programming, x86
Today I’m going to write up one small (and yet still remarkably complicated) fragment of x86_64’s instruction semantics: memory addressing.
Specifically, I’m going to write up the different ways in which x86_64 allows the user to address memory via just one instruction: mov.
I won’t attempt to cover other instructions that can touch memory (which is pretty much all of them, thanks CISC), ones that write massive chunks of memory (looking at you, fxsave), or any adjacent subjects (code models, position independent code, binary relocations). I also won’t even try to cover historical addressing modes or modes that work when an x86_64 processor isn’t in 64-bit mode (i.e., any modes other than long mode with 64-bit code).
Despite (or perhaps thanks to?) the legacy hell that is x86_64’s instruction encoding, there are some constraints on how memory is addressed.
First, the good news:
Now, the bad news:
0x67) in our encoding.
I call this mode “Scale-Index-Base-Displacement” because I have no idea what else to call it.
As far as I can tell, neither Intel nor AMD actually considers this to be a singular mode; instead, they refer to it as a general collection of related modes with a wide variety of different encodings.
But we’re not talking about encodings today: we’re talking about semantics, and semantically each of these related modes falls back to some combination of four parameters:
rax, rbx, &c) [1].
Various combinations of the four (including all four) are valid. Here are the valid combinations, in roughly increasing order of complexity:
Displacement
Base
Base + Index
Base + Displacement
Base + Index + Displacement
Base + (Index * Scale)
(Index * Scale) + Displacement
Base + (Index * Scale) + Displacement
Let’s go through them one by one.
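Before diving in, here is a minimal sketch of the arithmetic that all eight combinations share, written in C. The function name and the convention of passing 0 (or a scale of 1) for an absent parameter are my own assumptions for illustration; this models the semantics only, not the encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models base + (index * scale) + displacement.
 * A parameter that a given form omits is passed as 0 (scale as 1). */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* The displacement is signed; widen it to 64 bits before adding.
     * Unsigned arithmetic wraps modulo 2^64. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

Under those assumptions, a Base + Displacement access with 0x1000 in the base register and a displacement of -8 would be modeled as effective_address(0x1000, 0, 1, -8).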
Displacement
This is arguably the simplest addressing mechanism in the x86 family: the displacement field is treated as an absolute memory address.
Unfortunately, it’s also almost completely useless on x86_64. Remember that note about displacements almost always being 32 bits? That means you can’t represent an absolute address, since an absolute x86_64 address is 64 bits (really 48, but whatever) and just won’t fit in the displacement.
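To make the size problem concrete, here is a small C sketch that checks whether an absolute address could be carried by a 32-bit displacement. It assumes the displacement is sign-extended to 64 bits (as it is in long mode); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can `addr` be expressed as a sign-extended
 * 32-bit displacement? Only the lowest 2 GiB and the highest 2 GiB
 * of the 64-bit address space qualify. */
static bool fits_in_disp32(uint64_t addr)
{
    return addr <= 0x7fffffffULL || addr >= 0xffffffff80000000ULL;
}
```

An address like 0xff passes this check, but anything at or above 0x80000000 in the lower half of the address space does not.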
There’s one exception to this: x86_64 allows for a 64-bit displacement with the a* registers.
In Intel syntax:
; store the qword at 0x00000000000000ff into rax
mov rax, [0xff]
; store the dword at 0x00000000000000ff into eax
mov eax, [0xff]
; store the word at 0x00000000000000ff into ax
mov ax, [0xff]
; store the byte at 0x00000000000000ff into al
mov al, [0xff]
gas (the GNU assembler) refers to these as movabs in both 32-bit and 64-bit modes.
This mode exists, first of all, for code model reasons that aren’t relevant to this post; Eli Bendersky has a fantastic blog post on those.
More concretely: most programs have at least a few static addresses that are determined at compile-time, like global variables.
For example, this trivial program:
extern long var;
void f(long x) { var = x; }
…yields:
f:
mov rax, rdi
movabs QWORD PTR [var], rax
ret
(View it on Godbolt.)
Note: The above example was originally misleading; many thanks to haberman on HN for pointing out the error and offering a correct example.
Base
Addressing via the base register adds one layer of indirection over absolute addressing: instead of an absolute address encoded into the instruction’s displacement field, an address is loaded from the specified general-purpose register (any GPR! Hooray!).
This indirection allows us to do absolute addressing with an arbitrary destination register via the following pattern:
; store the immediate (not displacement) into rbx
mov rbx, 0xacabacabacabacab
; store the qword at the address stored in rbx into rcx
mov rcx, [rbx]
…but we have relatively few reasons to do that, given the richer addressing modes we’re about to see.
Base addressing is still useful on its own, because sometimes we have a calculated address already lying around from another operation, and we just want to use it.
Compiler output is full of examples of this, such as a load through a pointer that is already sitting in rax:
mov rax, qword ptr [rax]
Base + Index
This is just like addressing via the base register, except that we also add in the value of the index register.
For example:
; store the qword in rcx into the memory address computed
; as the sum of the values in rax and rbx
mov [rax + rbx], rcx
I had a hard time contriving an example for this, which of course means that my coworkers immediately found one:
int foo(char * buf, int index) {
    return buf[index];
}
…which yields:
push rbp
mov rbp, rsp
mov qword ptr [rbp - 8], rdi
mov dword ptr [rbp - 12], esi
mov rax, qword ptr [rbp - 8] ; rax is buf
movsxd rcx, dword ptr [rbp - 12] ; rcx is index
movsx eax, byte ptr [rax + rcx] ; store buf[index] into eax
pop rbp
ret
(View it on Godbolt.)
This is obvious in retrospect: Base + Index is perfect for modeling array accesses where neither the array’s starting address nor the offset into the array is fixed at compile-time.
Base + Displacement
More indirection! In case you haven’t guessed it, calculating the effective address with both the base register and the displacement field corresponds to two operations: loading the value stored in the base register, and adding the displacement field to it.
Then, we take that sum and use it as our effective address. By way of example:
; add 0xcafe to the value stored in rax
; then, store the qword at the computed address into rbx
mov rbx, [rax + 0xcafe]
As we’ve seen with Base + Index, some addressing modes naturally reflect C-like array semantics.
Base + Displacement can be thought of in a similar manner, but for structure semantics: the base register holds the address of the beginning of the structure, and the displacement field holds the fixed offset into that structure.
For example, the following:
struct foo {
    long a;
    long b;
};

long bar(struct foo *foobar) {
    return foobar->b;
}
assembles as:
push rbp
mov rbp, rsp
mov qword ptr [rbp - 8], rdi
mov rax, qword ptr [rbp - 8] ; rax is foobar
mov rax, qword ptr [rax + 8] ; rax + 8 is foobar->b; store back into rax
pop rbp
ret
(View it on Godbolt.)
This also makes sense if you think about the stack construction and layout at the beginning of every function as a custom structure: accesses like [rbp - N] are basically stack->objN.
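The structure analogy can be checked directly in C. This is just a sketch: address_of_b is a hypothetical helper that performs the same "pointer plus constant offset" arithmetic that a [base + displacement] operand performs, with offsetof standing in for the displacement.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
    long a;
    long b;
};

/* Address of foobar->b, computed as base + fixed displacement.
 * offsetof(struct foo, b) plays the role of the displacement field. */
static uintptr_t address_of_b(const struct foo *foobar)
{
    return (uintptr_t)foobar + offsetof(struct foo, b);
}
```

On a typical x86_64 Linux target that offset is 8, matching the [rax + 8] seen in the compiler output.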
Base + Index + Displacement
If the last mode makes sense to you, then this one is the logical next step: it’s semantically identical, except that we also add the value of the index register.
Just as above, but with one more register:
; add 0xcafe to the values stored in rax and rcx
; then, store the qword at the computed address into rbx
mov rbx, [rax + rcx + 0xcafe]
Just as Base + Index naturally models an array access and Base + Displacement naturally models structure access, Base + Index + Displacement naturally models structure access within an array!
I had a hard time getting clang to emit one of these on Godbolt, but eventually got one with -O1:
struct foo {
    long a;
    long b;
};

long square(struct foo foos[], long i) {
    struct foo x = foos[i];
    return x.b;
}
assembles to the very terse:
shl rsi, 4
mov rax, qword ptr [rdi + rsi + 8] ; rdi is foos, rsi is i, 8 is the field offset
ret
(View it on Godbolt.)
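It may help to see where the shl comes from. The sketch below (with a hypothetical function name) redoes the address computation by hand in C: the index has to be multiplied by the element size first, and since that size is 16 here rather than one of the scale factors the addressing mode can apply itself (1, 2, 4, or 8), it plausibly ends up as a separate shift.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
    long a;
    long b;
};

/* Address of foos[i].b as base + index + displacement. */
static uintptr_t address_of_foos_i_b(const struct foo *foos, size_t i)
{
    uintptr_t base = (uintptr_t)foos;
    /* Pre-scaled index: i * 16 on x86_64 Linux, i.e. `shl rsi, 4`. */
    uintptr_t index = i * sizeof(struct foo);
    /* Fixed field offset: the 8 in [rdi + rsi + 8]. */
    uintptr_t disp = offsetof(struct foo, b);
    return base + index + disp;
}
```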
Base + (Index * Scale)
Our first multiplication!
The scale field is like displacement in that it’s a constant factor that’s encoded into our instruction. Unlike displacement, however, scale is extremely constrained: it’s only two bits wide, meaning that it can only be 1 of 4 possible values: 1, 2, 4, or 8.
As the name implies, the scale field is used to scale (i.e., multiply) another field. In particular, it always scales the index register — scale cannot be used without index.
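As a rough sketch of how two bits turn into those four values: the field is interpreted as a power of two. The helper name below is my own; it only models the decoding, not where the bits live in the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 2-bit scale field into its multiplier:
 * 0b00 -> 1, 0b01 -> 2, 0b10 -> 4, 0b11 -> 8. */
static unsigned scale_factor(uint8_t scale_bits)
{
    assert(scale_bits <= 3); /* only two bits are available */
    return 1u << scale_bits;
}
```

This is also why element sizes like 3 or 16 can’t be folded into the addressing mode and need a separate multiply or shift.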
Among many other things, Base + (Index * Scale) naturally models accesses into an array of pointers (distinct from an array of laid-out structures, like above!):
struct foo {
    long a;
    long b;
};

long bar(struct foo *foos[], long i) {
    struct foo *x = foos[i];
    return x->b;
}
assembles to:
mov rax, qword ptr [rdi + 8*rsi] ; rdi is foos, rsi is i, 8 is the scale (pointer-sized!)
mov rax, qword ptr [rax + 8]
ret
(View it on Godbolt.)
(Index * Scale) + Displacement
Let’s keep going. This is almost identical to the last mode, except that we’ve swapped the base register out for the displacement field. No particular complexity there.
(Index * Scale) + Displacement naturally models a specialized case of array access: when the array is statically addressable (e.g., a global) and the element size is computable via the scale.
For example:
int tbl[10];

int foo(int i) {
    return tbl[i];
}
assembles to:
movsxd rax, edi
mov eax, dword ptr [4*rax + tbl] ; rax is i, 4 is the scale (sizeof(int) == 4)
ret
(View it on Godbolt.)
Base + (Index * Scale) + Displacement
Now we’re cooking with gas. This is the final and most complex x86_64 addressing form, but there’s absolutely nothing conceptually special about it: it’s just one more arithmetic operation on top of the three-parameter addressing modes.
Base + (Index * Scale) + Displacement naturally models a two-dimensional array access:
long tbl[10][10];

long foo(long i, long j) {
    return tbl[i][j];
}
assembles to:
lea rax, [rdi + 4*rdi]
shl rax, 4
mov rax, qword ptr [rax + 8*rsi + tbl]
ret
(View it on Godbolt.)
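The lea and shl are doing the row arithmetic, which is easier to follow in C. This sketch mirrors the three instructions one by one; it assumes long is 8 bytes (as on x86_64 Linux), so each row of ten longs is 80 bytes. The function name is hypothetical.

```c
#include <stdint.h>

long tbl[10][10];

/* Address of tbl[i][j], following the compiler's instruction sequence.
 * Assumes sizeof(long) == 8, so one row is 80 bytes. */
static uintptr_t address_of_tbl_ij(uint64_t i, uint64_t j)
{
    uint64_t row = i + 4 * i;            /* lea rax, [rdi + 4*rdi]: 5 * i  */
    row <<= 4;                           /* shl rax, 4:             80 * i */
    return row + 8 * j + (uintptr_t)tbl; /* [rax + 8*rsi + tbl]            */
}
```

Here rax plays the base (the row’s byte offset), rsi the index scaled by 8, and tbl the displacement.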
The addressing mode documented above is almost identical to its historical x86_32 equivalent — its biggest changes are allowing 64-bit GPRs and (sometimes) 64-bit displacements.
Where x86_64 really diverges is in its addition of a brand new addressing mode, best known as “RIP-relative” addressing.
Why is it called “RIP-relative”? Because it encodes a displacement relative to the RIP register’s value (specifically the RIP of the next instruction, not the current one). This is usually represented with the familiar [Base + Displacement] syntax, except that the base register is now rip instead of a GPR:
mov rax, [rip + 16]
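The “next instruction” detail is easy to get wrong, so here is a small C sketch of the calculation. The helper and its parameter names are assumptions for illustration; the instruction’s length has to be known because RIP already points past the instruction when the displacement is applied.

```c
#include <stdint.h>

/* Hypothetical helper: the address a RIP-relative operand refers to.
 *   insn_addr: address of the instruction's first byte
 *   insn_len:  encoded length of that instruction, in bytes
 *   disp:      the signed 32-bit displacement */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    /* RIP of the next instruction, not the current one. */
    uint64_t next_rip = insn_addr + insn_len;
    /* Sign-extend the displacement, then add (wraps modulo 2^64). */
    return next_rip + (uint64_t)(int64_t)disp;
}
```

For instance, if the mov above encodes to 7 bytes and sits at the made-up address 0x401000, the sketch gives 0x401007 + 16 = 0x401017 as the address being read.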
RIP-relative addressing exists for reasons that I originally said that I wouldn’t go into in this blog post: position-independent code and code models.
We’ll make a brief exception: using RIP-relative addressing makes position-independent code smaller and simpler, and is a natural fit for the “small” (and default) code model, where all code and data need to be addressable within a 32-bit offset.
For example, the following when compiled with -O1 and -fpic:
long tbl[10];

long foo(long i) {
    return tbl[i];
}
requires just two movs on x86_64:
foo:
mov rax, qword ptr [rip + tbl@GOTPCREL]
mov rax, qword ptr [rax + 8*rdi]
ret
…but three and some additional boilerplate on x86_32:
foo:
call .L0$pb
.L0$pb:
pop eax
.Ltmp0:
add eax, offset _GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb)
mov ecx, dword ptr [esp + 4]
mov eax, dword ptr [eax + tbl@GOT]
mov eax, dword ptr [eax + 4*ecx]
ret
x86_64 almost killed segmentation. Almost. Segment registers are no longer necessary thanks to the flat address space, but they still show up in a few places:
Linux (really glibc) uses fs in userspace to access the TLS segments configured by the kernel. You can find these segments specified in the per-CPU GDT configuration.
gs appears free for use in userspace, assuming something else in glibc (or whatever libc you use) doesn’t use it.
Linux uses gs in kernelspace to store the base address for the per-CPU variable region. We can see this in the macro definition of PER_CPU_VAR:
#define PER_CPU_VAR(var) %__percpu_seg:var
which, on x86_64, expands to:
%gs:var
So, unfortunately, we still need to care about these. The good news is that caring about them isn’t too bad: they essentially boil down to adding the value in the segment register [2] to the rest of the address calculation.
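As a one-line model in C (hypothetical names again), a segment-prefixed access just offsets whatever the rest of the operand computes. Here seg_base stands in for the segment’s base address, which in 64-bit mode comes from FS.base or GS.base rather than from the selector value itself.

```c
#include <stdint.h>

/* Hypothetical helper: final address of an fs:/gs:-prefixed access.
 *   seg_base:       the base associated with the segment (FS.base or GS.base)
 *   effective_addr: the result of the usual base/index/scale/displacement math */
static uint64_t segmented_address(uint64_t seg_base, uint64_t effective_addr)
{
    return seg_base + effective_addr;
}
```

Under this model, an operand like fs:[0] reads from the segment’s base address itself.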
By way of example with a thread-local variable:
int __thread x = 0;

int foo(void) {
    int *y = &x;
    return *y;
}
assembles to:
push rbp
mov rbp, rsp
mov rax, qword ptr fs:[0] ; grab the base address of the thread-local storage area
lea rax, [rax + x@TPOFF] ; calculate the effective address of x within the TLS
mov qword ptr [rbp - 8], rax ; store the address of x into y
mov rax, qword ptr [rbp - 8]
mov eax, dword ptr [rax]
pop rbp
ret
(View it on Godbolt.)
[1] Our very first gotcha: this is true when using 64-bit registers for addressing, but not when using 32-bit registers. When addressing with 32-bit registers we can use any 32-bit GPR as an index except esp, thanks to an encoding quirk (the bit pattern that would indicate esp (0b100) is instead used to indicate…something).
[2] Not actually, as pointed out by haberman: in 32-bit modes the segment register’s value corresponds to a GDT offset, while in 64-bit modes the value is unused and is replaced with the FS.base and GS.base MSRs. The SWAPGS page on the OSDev Wiki has the details.