As an alternative to the arrays we've been using so far, we introduce a
new way of storing data called a linked list. For this purpose, we propose
the following struct (where we're storing ints):
struct list_node {
int data;
struct list_node *next;
}
typedef struct list_node list;
What may jump out is that we have an instance of the struct inside of
itself. Well, this isn't exactly true. We have a pointer to a struct inside
the struct. This is much different than a struct inside a struct. The latter
case would effectively infinitely place structs within one another, which of
course isn't possible. However, we can easily place a pointer to a struct
inside of a struct because it initially isn't initialized to anything. This
is what's known as a recursive data type. If I wanted to create a linked
list that is the equivalent of [1, 2], I could do the following:
list *first = alloc(list); first->data = 1; list *second = alloc(list); second->data = 2; first->next = second;
So, let's compare linked lists to arrays. Upon first glance, this definitely looks like a lot more effort than using an array, so are there any advantages? You've probably been dismayed working with arrays that we have to declare the size when we allocate the array. I'm sure you can think of many situations where this would be inconvenient. If we did need to resize the array, we would have to create a new one and copy every element over, which is, of course, an $O(n)$ operation. On the other hand, when we have a linked list, we start with nothing and simply insert new elements as we go. These are each constant time operations.
So, are linked lists the magic solution to all of our problems? Let's consider another possible operation. Suppose I have a list of length 10 both as an array and as a linked list. Let's say I want to access one of the elements. In an array, I know exactly where that element is. A quick calculation is done and I get the data back in $O(1)$. On the other hand, when dealing with a linked list, I generally only keep a pointer to the first node (or sometimes first and last). So, in order to get to a particular index, I need to traverse the list until I reach it, which is $O(n)$. So, this solved one problem, but introduced another one.
As we study many different data structures over the course of the semester, we will see that there is no one perfect data structure. Each one has its own pros and cons, making each useful in certain situations. We will see the linked list implementation in more detail when we discuss queues and stacks.
Exercises: try to write the following functions that correspond to common operations on linked lists. Reason over their runtimes and how these compare to arrays. Assume we keep pointers to the front and back of the list.
Today, we introduce some basics of pointers. Pointers will come up over and over again, so getting a good understanding from the very beginning will be important. On one hand, pointers are like your best friend in C--they allow you to do some really cool stuff. On the other hand, they'll be a frequent source of headaches for you.
In the most general sense, for any data type t, we can have a type *t, which means a pointer to something of type t. All data lives somewhere in memory. Every location in memory has an address (we'll discuss this in more detail in the near future). So far, the data we've been working with (except arrays) is simple enough that the compiler can take care the memory, so you don't explicitly need to allocate anything.
On the other hand, memory for pointers need to be explicitly allocated. We've had a small preview of this working with arrays. When dealing with arrays, we have to use this special function, alloc_array to give memory to the array. An array is very similar to the pointer types we'll introduce here, but the C0 implementation hides this a bit. Similarly, there is a function alloc that allocates memory for other types.
In lecture, we talked about pointers in the context of structs, but let's
put that off for a moment. I mentioned earlier that we can have a pointer to
any type. So, I can have a pointer to an integer. I can allocate a pointer
to an int and set the value to 5 as follows:
int *x = alloc(int);
*x = 5;
The type declaration int *x is a pointer to an int. At
this point, the variable x holds a memory address. This memory
address is where the int data is stored. So, how do we work with the data at
this address? I can't say x = 5, since x is the address,
not the data. In fact, in C0, we can't actually look at what address is
stored in x anyway. To access the data, I need to do what's known
as dereferencing the pointer. This is what is done with the * in
*x = 5. This may be a little confusing, given how we've declared
the variable as int *x. A helpful way to look at this type
declaration is: "when I dereference x, I get an int."
There's also this special value for a pointer called NULL. This value means that we're referring to invalid memory. In practice, this means that we have a pointer that we haven't allocated any memory for yet. In C0, attempting to derefernce a NULL pointer results in a segmentation fault and the program will terminate. This is of course undesirable. We will see shortly that we can also use the NULL pointer to our advantage.
We will explore queues as a specific application for linked lists. To describe queues, we can make an analogy to standing in line at the store. When you're ready to pay, you get in the back of the line. As much as we might like to, you can't push everyone else out of the way, so you have to stay last in line. If someone comes after you, then they have to go behind you. On the other hand, after the current shopper finishes paying, the first person in line goes next. Eventually, it's your turn. At this point, you are at the front of the line. Equivalently, you can think of a queue as pieces of data waiting in line for your program to evaluate them. When we want to add a new element, or enqueue it, we add it to the back. When we want to pick a new element to evaluate, we remove it, or dequeue it from the front. This is refered to as a first in first out (FIFO) data structure.
So, how do we implement this as a linked list? We can pretty much use a linked list in its most basic form with a few more additions. For one, we want a type we can call a queue:
struct queue_header { list *front; list *back; } typedef struct queue_header *queue;
We'll discuss why we chose this struct soon. Note that usually we don't put the '*' in the typedef. We do this to remind ourselves that we're working with pointers. However, in this case, we want to implement an abstract type that reveals as little about its implementation as possible.
So, why keep these two pointers? Intuitively, we can see that the intent is for these to point at the first and last thing in the linked list. We also choose to end our linked list with a dummy node, so back is always pointing to that dummy node. Recall that the two main operations we want to perform on linked lists are enqueue and dequeue. On a fairly high level, how do we want to go about implementing these? For dequeue, we can just remove the node pointed to by front and then update front so that it points to the old front->next. Similarly for enqueue, we fill the data field of the dummy node, pointed to by back, allocate a new dummy node, and update the pointers for the old back->next and back to the new dummy node. Be careful about the order you update pointers, since it can drastically affect what happens.
So, now we've seen that we can implement queues as linked lists, but is this a good choice? Let's think back to the two operations we've just described. In both cases, we can do the required work without traversing any amount of the linked list, meaning that both operations are $O(1)$. So, that sounds pretty encouraging. Furthermore, we can just keep adding nodes endlessly, so we don't have to worry about our queue growing arbitrarily large. For these reasons, it turns out that a linked list is a good choice of a data structure for a queue.
You should go back and take a look at the implementation to review why the functions work according to their specification.
Fundamentally, stacks are very similar to queues. We will also implement stacks as a linked list. This time, we will refer to the two ends of the linked list as the "top" and "bottom." Similar to queues, we define two primary operations on stacks. To remove elements from the stack, we pop them off of the top. However, when we want to add new elements, they also get added to the top. This is known as a push operation. Clearly, the flow of data in and out of the stack is changed significantly. Stacks are a last in, first out (LIFO) data structure. With similar motivations, we propose the following struct wrapper for stacks:
struct stack_header { list *top; list *bottom; } typedef struct queue_header *queue;
The push and pop functions are very similar to enqueue and dequeue, except that both operate on the top of the stack. So, you may argue that there's no reason to keep track of the bottom of the stack in the first place. If you're thinking this, you have a perfectly valid point--we could leave the bottom pointer out and still have a great stack implementation. However, we've included it for a few reasons:
Again, you should observe that all of the major functions that operate on stacks run in $O(1)$.
If you have any comments or see any errors, please let me know.