main resources links contact

Recitation 11: Memory Layout

Material from Recitation:

Disclaimer: I may drift into more detail than you need for 15-122.

In the code that we've worked with so far, one thing we really haven't needed to think about is where our data is actually stored. When we declare, for example, an int, we obviously needed to think about how that data is being stored. Similarly, when dealing with arrays, we haven't had to think much about what happens when we magically call alloc_array. Now, we'll begin to take a closer look at what is actually at work here.

First of all, every part of your program lives in memory (even the code itself, but we're not going to worry about that). You're probably familiar, on some level, what memory means in the context of a computer. This is also one of the fundamental differences between a 32 bit and 64 bit operating system--how much memory is available. If you've tried using alloc or alloc_array in coin, you may have seen something like this:

  acappiel@unix3:~$ coin
  Coin 0.3.1 'Nickel' (r108, Wed Aug 29 08:36:24 EDT 2012)
  Type `#help' for help or `#quit' to exit.
  --> int *x = alloc(int);
  x is 0xFD8DDFF0 (int*)

This value represents an address. A program's memory is addressed in 1 byte increments--this means that we can't allocate something smaller than one byte. In this particular case, we allocate 4 bytes for an int. These 4 bytes begin at 0xFD8DDFF0. We usually think of memory as a very long 1-dimensional array of data. We generally draw this vertically, with low addresses at the bottom and large addresses on top.

This paragraph will be a bit of a tangent, but hopefully interesting. On a 32 bit operating system, a pointer is a 4 byte value (or 32 bits!). On a 64 bit operating system, this is an 8 byte value. This is significant because with 32 bits, we have about 4 gigabytes of memory to address. While this often won't matter, it reasonable in 2012 to need that much memory. With 64 bits, we effectively square (not double) the amount of memory we have. This is 16 exbibytes (about $18 \times 10^{18}$ bytes), which is pretty much unfathomably large by current standards.

A program's memory space falls into two major categories: the stack and the heap. Both of these play very different roles during the execution of the program.

The stack begins at the top of memory (high addresses). The stack is a very well-organized structure and is based on function calls. We can think of function calls as frames on the stack. When we make a function call, we make a new stack frame that holds all of the local variables (that's not all, but that's what you need to know). If we make a call inside of this function, we get a new stack frame directly below this one. The stack begins at large addresses and grows down. So, in order to do this, we need to know exactly how big the stack frame needs to be at the beginning. This also allows the compiler to keep track of where each local variable is in memory at compile time--this why we don't have to worry about where our local variables are in memory. So, data like arrays, which can have whatever size we want them to at runtime, can't be stored on the stack. So, we can only store "small" types on the stack. For example, int, bool, int*, int[], struct list_node*, etc. Note that int[] holds a reference to an array, not the array itself. It's also worth noting that when we return from a function, the stack frame is effectively deallocated, so none of that memory is available anymore.

The heap is at the bottom of memory (low addresses). The heap is rather chaotic and disorganized, in contrast to the stack. We allocate memory in the heap every time we make a call to alloc or alloc_array. Consequently, the heap ends up becoming so disorganized because at compile time, there's not way to say how big allocations are or how long we will need them. Since there's no way to determine the heap organization in advance, this is why we explicitely need to keep pointers to data we allocate. Another big advantage is that heap memory is independent of the stack frames mentioned earlier. So, we can freely pass pointers into functions as arguments and return them from functions. The stack and the heap work together in our programs to allow us to worry very little about memory management as the author of C0 programs.

This is just the beginning of a very complex and interesting topic. Generally, 15-122 does not strongly emphasize memory management. However, this information may be helpful for understanding the difference between pointers and primitive types. Some of these ideas will be revisited during our transition from C0 to C.

If you have any comments or see any errors, please let me know.