main resources links contact

Recitation 09: Intro to Data Structures

Material from Recitation:

POINTERS

Today, we introduce some basics of pointers. Pointers will come up over and over again, so getting a good understanding from the very beginning will be important. On one hand, pointers are like your best friend in C--they allow you to do some really cool stuff. On the other hand, they'll be a frequent source of headaches for you.

In the most general sense, for any data type t, we can have a type *t, which means a pointer to something of type t. All data lives somewhere in memory. Every location in memory has an address (we'll discuss this in more detail in the near future). So far, the data we've been working with (except arrays) is simple enough that the compiler can take care the memory, so you don't explicitly need to allocate anything.

On the other hand, memory for pointers need to be explicitly allocated. We've had a small preview of this working with arrays. When dealing with arrays, we have to use this special function, alloc_array to give memory to the array. An array is very similar to the pointer types we'll introduce here, but the C0 implementation hides this a bit. Similarly, there is a function alloc that allocates memory for other types.

In lecture, we talked about pointers in the context of structs, but let's put that off for a moment. I mentioned earlier that we can have a pointer to any type. So, I can have a pointer to an integer. I can allocate a pointer to an int and set the value to 5 as follows:
int *x = alloc(int);
*x = 5;

The type declaration int *x is a pointer to an int. At this point, the variable x holds a memory address. This memory address is where the int data is stored. So, how do we work with the data at this address? I can't say x = 5, since x is the address, not the data. In fact, in C0, we can't actually look at what address is stored in x anyway. To access the data, I need to do what's known as dereferencing the pointer. This is what is done with the * in *x = 5. This may be a little confusing, given how we've declared the variable as int *x. A helpful way to look at this type declaration is: "when I dereference x, I get an int."

There's also this special value for a pointer called NULL. This value means that we're referring to invalid memory. In practice, this means that we have a pointer that we haven't allocated any memory for yet. In C0, attempting to derefernce a NULL pointer results in a segmentation fault and the program will terminate. This is of course undesirable. We will see shortly that we can also use the NULL pointer to our advantage.

Structs

So far, we've only dealt with primitive types (int, bool, ...) and arrays. We can also create our own types, known as structs. Structs are a way of packaging multiple values together under one variable name. This is probably best illustrated with an example. Suppose we want to store some information about a person. We could create a struct such as:
struct person_data {
  string name;
  int height; // in kg.
  int weight; // in m.
};
typedef struct person_data person;

The syntax for this looks pretty similar to what we've seen so far. We name the struct and put all of the fields in between braces. Each field is declared as a type with a name. Note that the ending brace has a semi-colon afterwards. Also, I can't assign default values to fields, e.x. int height = 0; would not be valid. We've seen the typedef notation before too. In this case, we simply use it for convenience, giving us more succinct type names.

Now, suppose I want to work with a struct like this. Here is an example:

  person *p = alloc(person);
  p->name = "Bob";
  p->height = 68;
  p->weight = 200;
  println(p->name);

First, we have to allocate memory for the struct. We use this struct->field notation to access the various elements of the struct. This is the same for when we assign values as when we access values. The -> performs two jobs: it navigates to the correct data within the struct and dereferences the pointer.

Now, we can work with a struct just like any other datatype. Suppose we wanted to write a function that computes a person's BMI. The function might look like this:

  int bmi (person p) {
    int square_height = p->height * p->height;
    return p->weight / square_height;
  }

It is also often convenient to create a helper function to initialize the new struct in the first place. For example:

  person *person_new (string name, int height, int weight) {
    person *p = alloc(person);
    p->name = name;
    p->height = height;
    p->weight = weight;
    return p;
  }

Linked Lists

As an alternative to the arrays we've been using so far, we introduce a new way of storing data called a linked list. For this purpose, we propose the following struct (where we're storing ints):
struct list_node {
  int data;
  struct list_node *next;
}
typedef struct list_node list;

What may jump out is that we have an instance of the struct inside of itself. Well, this isn't exactly true. We have a pointer to a struct inside the struct. This is much different than a struct inside a struct. The latter case would effectively infinitely place structs within one another, which of course isn't possible. However, we can easily place a pointer to a struct inside of a struct because it initially isn't initialized to anything. This is what's known as a recursive data type. If I wanted to create a linked list that is the equivalent of [1, 2], I could do the following:

  list *first = alloc(list);
  first->data = 1;
  list *second = alloc(list);
  second->data = 2;
  first->next = second;

I mentioned earlier that the NULL pointer can actually be useful sometimes. Here is one such case. Suppose I want to iterate through a linked list, doing some operation, until I hit the end. Well, where is the end? The last element is when l->next == NULL. This is very helpful when writing functions that operate on linked lists. We will often use something like:
while (l->next != NULL) {
  // Do something.
  l = l->next;
}

So, let's compare linked lists to arrays. Upon first glance, this definitely looks like a lot more effort than using an array, so are there any advantages? You've probably been dismayed working with arrays that we have to declare the size when we allocate the array. I'm sure you can think of many situations where this would be inconvenient. If we did need to resize the array, we would have to create a new one and copy every element over, which is, of course, an $O(n)$ operation. On the other hand, when we have a linked list, we start with nothing and simply insert new elements as we go. These are each constant time operations.

So, are linked lists the magic solution to all of our problems? Let's consider another possible operation. Suppose I have a list of length 10 both as an array and as a linked list. Let's say I want to access one of the elements. In an array, I know exactly where that element is. A quick calculation is done and I get the data back in $O(1)$. On the other hand, when dealing with a linked list, I generally only keep a pointer to the first node (or sometimes first and last). So, in order to get to a particular index, I need to traverse the list until I reach it, which is $O(n)$. So, this solved one problem, but introduced another one.

As we study many different data structures over the course of the semester, we will see that there is no one perfect data structure. Each one has its own pros and cons, making each useful in certain situations. We will see the linked list implementation in more detail when we discuss queues and stacks.

Exercises: try to write the following functions that correspond to common operations on linked lists. Reason over their runtimes and how these compare to arrays. Assume we keep pointers to the front and back of the list.

  1. void insert_at_back(list L, int data);
  2. void insert_at_front(list L, int data);
  3. void insert_at_index(list L, int data, int index);
  4. int get_data(list L, int index);
  5. int set_data(list L, int index);
  6. int length(list L);
  7. int find_index(list L, int data);

Queues

We began discussing queues as linked lists today. Since we did not finish (and there is already a lot on this page), the notes will appear on the page for the next recitation.

If you have any comments or see any errors, please let me know.