ENOSUCHBLOG

Programming, philosophy, pedaling.


Mini-post: Introduction to Pointers

Mar 9, 2015     Tags: programming    


During yet another recent IRC session:

<me>: hey <redacted>, what should i write my next post on?
<redacted>: got no idea. Maybe about pointers cause that always makes me feel like a dumb person
<me>: like an intro tutorial on them? i can do that
<redacted>: Yeah. :(
<redacted>: That'll be really helpful
<redacted>: :)

Luckily for all parties, a basic introduction to pointers (as they appear in C) should be relatively short.

What is a pointer?

On the most basic level, a pointer represents an address in memory.

A computer’s memory, as you may recall, is measured in bytes and is addressed from 0 to N - 1, where N is the total number of bytes available. Any number within this range is, therefore, the address of a memory cell that exists on the system. In reality things are nowhere near this simple*, but this model works well enough.

A pointer holds the address of a location in memory, not the contents of that location. The value of a pointer is simply a numeric address; to obtain the actual data, we must access memory at that address. This is better known as dereferencing, and its properties are central to understanding how and why languages like C represent pointers syntactically.

How do we represent pointers?

C represents pointers with the * character, commonly referred to as an asterisk, star, or splat. In C, the * actually has a dual purpose - it can either declare a pointer variable or dereference one, depending on the context.

Pointer declarations are a simple matter, and look very similar to normal variable declarations:

/* see the difference? */
int my_int;
int *my_pointer;

Of course, because C does not initialize local (automatic) variables by default, the value of my_pointer is indeterminate: in practice, it is whatever garbage was previously on the stack at the address where the pointer itself is stored. Dereferencing it in this state is undefined behavior.

To initialize a pointer with the address of another variable, we use the & operator, better known as address-of:

/* get the address of my_int and save it to my_pointer */
my_pointer = &my_int;

We can then access the value of my_int through my_pointer via the magic of dereferencing:

/* careful... */
printf("%d\n", *my_pointer);

You may not have noticed it, but the line above has a nasty bug in it. The syntax “*my_pointer” is correct and my_pointer itself was properly initialized, but we never initialized my_int, even though we read it through my_pointer. As a result, dereferencing my_pointer accesses valid memory but yields an indeterminate value (garbage, in practice).

All together now:

#include <stdio.h>

int main(void) {
	/* properly initialized this time */
	int my_int = 100;
	int *my_pointer = &my_int;

	/* notice the different format specifier for pointers;
	 * %p expects a void *, hence the casts */
	printf("Value of my_int: %d\n", my_int);
	printf("Address of my_int: %p\n", (void *)&my_int);
	printf("Value of my_pointer: %p\n", (void *)my_pointer);
	printf("Value referenced by my_pointer: %d\n", *my_pointer);
	printf("Address of my_pointer: %p\n", (void *)&my_pointer);

	return 0;
}

And the results (the addresses are illustrative; yours will differ):

Value of my_int: 100
Address of my_int: 0x0724
Value of my_pointer: 0x0724
Value referenced by my_pointer: 100
Address of my_pointer: 0x0728

If a visualization helps:

          value      addr
     +------------+
     |            |
     |    ...     | 0x072C
     |            |
     +------------+
     | my_pointer |
 ----|   0x0724   | 0x0728
 |   |            |
 |   +------------+
 |   |   my_int   |
 |-->|    100     | 0x0724
     |            |
     +------------+

Hopefully these results make sense to you: my_int is at 0x0724 in memory, and has the value 100 stored at that address. Meanwhile my_pointer is at 0x0728 and has the value 0x0724 stored at that address. Because the value stored by my_pointer is actually the address of my_int, dereferencing my_pointer yields 100, or the value at the address of my_int.

What can we do with them?

Once we know the basic syntax of pointers in C, we can do all kinds of things.

For one, we can modify the data referenced by a pointer with a familiar syntax:

/* change the value at the address of my_int to 10 */
*my_pointer = 10;

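We can also use pointer arithmetic as an alternate syntax for array indexing (in most expressions, an array's name converts to a pointer to its first element):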

char str[1024] = "this is an example string";

/* *(str + N) is equivalent to str[N] */
str[0] = 'T'; /* => "This is an example string" */
*(str + 5) = 'I'; /* => "This Is an example string" */

We can even use them with memory management functions like malloc:

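/* malloc and free are declared in <stdlib.h>; malloc returns NULL on failure */
int *my_heap_int = malloc(sizeof *my_heap_int);

if (my_heap_int != NULL) {
	*my_heap_int = 100;

	/* don't forget to free any heap-allocated memory */
	free(my_heap_int);
}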

See man 3 malloc for more information on using malloc safely.

There are plenty of other applications: pointers to functions, multidimensional arrays, and emulating pass-by-reference in functions (C itself always passes arguments by value) are all examples of valid (and common) uses of pointers in C. Of course, if any of that sounds scary (and some of it should), forget about it. This is just a conceptual introduction to pointers, after all.

Summary

This post only scratches the surface of the complexities of pointers, and it probably doesn’t do a very good job at that. There are all kinds of catches, rules, and idiosyncrasies, any of which can whimsically crash your program or cause a horrible case of nasal demons.

There are plenty of better and more in-depth resources out there for the topic of C pointers, including just about every (well-regarded) textbook on C programming. If you really want to understand the operation and proper application of pointers in C, it is in your best interest to take lessons from such a resource and not from me.

That being said, this post should have left you, at the very least, with a faint concept of memory addressing and the core distinction between the address of a memory block and the value within. Beyond that, I have no expectations.

Happy Hacking!

- William


Postnotes:

* Not all numeric values are valid addresses. On a 32-bit system with a 64-bit integral type, any values above (2**32) - 1 are impossible addresses without some kind of external intervention, regardless of physical memory size. Similarly, not all valid addresses can be accessed by all programs due to the use of virtual memory, ASLR, and various other hardware and OS-level protections. Finally, there exists the NULL pointer, which is commonly used to represent the lack of an actual address. Dereferencing a NULL pointer in C is a prime example of undefined behavior, but usually results in a segmentation fault or other fatal error.