Mar 9, 2015 Tags: programming
During yet another recent IRC session:
<me>: hey <redacted>, what should i write my next post on?
<redacted>: got no idea. Maybe about pointers cause that always makes me feel like a dumb person
<me>: like an intro tutorial on them? i can do that
<redacted>: Yeah. :(
<redacted>: That'll be really helpful
<redacted>: :)
Luckily for all parties, a basic introduction to pointers (as they appear in C) should be relatively short.
On the most basic level, a pointer represents an address in memory.
A computer’s memory, as you may recall, is measured in bytes and is addressed from 0 to N - 1, where N is the total number of bytes available. Any number within this range is therefore a valid address: at the very least, it names a memory cell that exists on the system. In reality things are nowhere near this simple*, but this model works well enough.
Pointers represent addresses to locations in memory, and not the contents of those locations themselves. The value of a pointer is simply its numeric address - we must access that numeric address to obtain the value of the actual data. This is better known as dereferencing, and its properties are central to understanding how and why languages like C represent pointers syntactically.
C represents pointers with the * character, commonly referred to as an asterisk, star, or splat. In C, the * actually has a dual purpose - it can either declare a pointer variable or dereference one, depending on the context.
Pointer declarations are a simple matter, and look very similar to normal variable declarations:
/* see the difference? */
int my_int;
int *my_pointer;
Of course, because C does not initialize local variables by default, the value of my_pointer is indeterminate: in practice it is random garbage, whatever data was previously on the stack at that address (the address of the pointer itself).
To initialize a pointer with the address of another variable, we use the & operator, better known as address-of:
/* get the address of my_int and save it to my_pointer */
my_pointer = &my_int;
We can then access the value of my_int through my_pointer via the magic of dereferencing:
/* careful... */
printf("%d\n", *my_pointer);
You may not have noticed it, but the line above has a nasty bug in it. The syntax “*my_pointer” is correct and my_pointer itself was properly initialized, but we never initialized my_int despite the fact that we access it via my_pointer. As a result, dereferencing my_pointer succeeds but yields garbage.
All together now:
#include <stdio.h>

int main(void) {
    /* properly initialized this time */
    int my_int = 100;
    int *my_pointer = &my_int;

    /* notice the different format specifier for pointers;
       %p expects a void *, hence the casts */
    printf("Value of my_int: %d\n", my_int);
    printf("Address of my_int: %p\n", (void *)&my_int);
    printf("Value of my_pointer: %p\n", (void *)my_pointer);
    printf("Value referenced by my_pointer: %d\n", *my_pointer);
    printf("Address of my_pointer: %p\n", (void *)&my_pointer);

    return 0;
}
And the results:
Value of my_int: 100
Address of my_int: 0x0724
Value of my_pointer: 0x0724
Value referenced by my_pointer: 100
Address of my_pointer: 0x0728
If a visualization helps:
            value       addr
        +------------+
        |            |
        |    ...     | 0x072C
        |            |
        +------------+
        | my_pointer |
    ----|   0x0724   | 0x0728
    |   |            |
    |   +------------+
    |   |   my_int   |
    |-->|    100     | 0x0724
        |            |
        +------------+
Hopefully these results make sense to you: my_int is at 0x0724 in memory, and has the value 100 stored at that address. Meanwhile my_pointer is at 0x0728 and has the value 0x0724 stored at that address.
Because the value stored by my_pointer is actually the address of my_int, dereferencing my_pointer yields 100, or the value at the address of my_int.
Once we know the basic syntax of pointers in C, we can do all kinds of things.
For one, we can modify the data referenced by a pointer with a familiar syntax:
/* change the value at the address of my_int to 10 */
*my_pointer = 10;
We can also use pointers as an alternate syntax for arrays:
char str[1024] = "this is an example string";
/* *(str + N) is equivalent to str[N] */
str[0] = 'T'; /* => "This is an example string" */
*(str + 5) = 'I'; /* => "This Is an example string" */
We can even use them with memory management functions like malloc:
int *my_heap_int = malloc(sizeof(int));
*my_heap_int = 100;
/* don't forget to free any heap-allocated memory */
free(my_heap_int);
See man 3 malloc for more information on using malloc safely.
There are plenty of other applications: pointers to functions, multidimensional arrays, and pass-by-reference in functions are all examples of valid (and common) uses of pointers in C. Of course, if any of that sounds scary (and some of it should), forget about it. This is just a conceptual introduction to pointers, after all.
This post only scratches the surface of the complexities of pointers, and it probably doesn’t do a very good job at that. There are all kinds of catches, rules, and idiosyncrasies, any of which can whimsically crash your program or cause a horrible case of nasal demons.
There are plenty of better and more in-depth resources out there for the topic of C pointers, including just about every (well-regarded) textbook on C programming. If you really want to understand the operation and proper application of pointers in C, it is in your best interest to take lessons from such a resource and not from me.
That being said, this post should have left you, at the very least, with a faint concept of memory addressing and the core distinction between the address of a memory block and the value within. Beyond that, I have no expectations.
Happy Hacking!
- William
Postnotes:
* Not all numeric values are valid addresses. On a 32-bit system with a 64-bit integral type, any values above (2**32) - 1 are impossible addresses without some kind of external intervention, regardless of physical memory size. Similarly, not all valid addresses can be accessed by all programs due to the use of virtual memory, ASLR, and various other hardware and OS-level protections. Finally, there exists the NULL pointer, which is commonly used to represent the lack of an actual address. Dereferencing a NULL pointer in C is a prime example of undefined behavior, but usually results in a segmentation fault or other fatal error.