Understanding Recursion Using Linked Lists

Note: I'll be using Go for any implementation or code that you see, unless otherwise specified - with any printing / output lines omitted for brevity.

Extra note: As with most of my articles, these are to solidify my own mental models and I am by no means an expert. If you notice any errata, please let me know!

As you may have read in my Capstone reflection, this past week I've been studying multiple concepts surrounding data structures and algorithms. Two of those have been the data structure for linked lists, and generally the concept of recursion (which can be used for multiple types of algorithms, but commonly divide and conquer algorithms and dynamic programming).

Recursion has been something I've struggled with understanding ever since I first covered it in my high school programming class. So to solidify my mental model for it, I'll be walking through a solution to a problem in this article that highlights different aspects about recursion.

For those of you already familiar with both of these, you can skip to the Problem section to continue. Others, feel free to continue reading if you need to be brought up to speed on the basics of recursion or linked lists.

Linked Lists

Linked lists are, as the name may suggest, a collection of items. Where they differ from normal collections like an array is that they do not take up a contiguous space in memory. If you have an array of numbers like so, [1, 2, 3, 4], most languages will reserve a portion of memory to store the entire array.

This becomes much clearer when working with a strongly typed language, like Go. Since integers take up a certain amount of space, the compiler will know how much maximum memory it needs to reserve for an integer array of size 4, and then reserve that range of memory addresses for the array. To show this, here's some code:

func main() {
    var nums [4]int
    nums[1] = 5
}

Output:

Nums: [0 0 0 0]
nums[0]: 0xc00007e000
nums[3]: 0xc00007e018
Overwriting by index...
Nums: [0 5 0 0]
nums[0]: 0xc00007e000
nums[3]: 0xc00007e018

As you can see, the array is initially storing all zeros (the equivalent of null / nil / undefined depending in your language of choice for int types in Go). When we re-assign the value stored in nums[1], then look at the addresses again, we see that it's the same section of memory. We can tell because of the difference between the pointer for nums[3] and nums[0]. If you subtract the first hex-value from the last, you get 32 (in base 10). This is because the generic int type in Go will default to an 8-bit integer - and to store four 8-bit integers you need 32 bits. This allows for a memory-efficient solution for storing a collection of values; however, it becomes inefficient when you have to constantly make changes to the collection.

Say you wanted to change it to [1, 3, 4]. You could simply re-assign nums[1] to the next value. But then you would have to cascade that operation for the rest of the array, iterating over any remaining values you're keeping. This would then become an O(n) operation as the theoretical array length approaches infinity. For clarity, here's what you would need to implement:


func main() {
    var nums [4]int

    nums = [4]int{1, 2, 3, 4}
    delIdx := 1

    for i := delIdx; i < len(nums)-1; i++ {
        nums[i] = nums[i+1]
    }
}

This would end up mutating nums to be [1, 3, 4, 4]. But now we have an extra problem: the array is still of size 4, with whatever the last value originally was. Since we can't force the nums variable to just "re-size" itself, we have to figure something out. Since nums can be thought of as a window of addresses, we'd need to re-size that window.

We can utilize Go's slice index notation to easily do this: newNums := nums[:3]. This example also shows us that a slice is just an abstraction of Go's array implementation - if you compare the pointers &nums[0] and &newNums[0], you'll see they are the same. But if you try to access newNums[3], you'll get an invalid index error. This is because the data type for newNums under the slice abstraction is an array of size 3. The pointer for nums[3] is now valid for garbage collection and is freed to be used for something else.

So, how do linked lists help us solve this problem? Well, like I said, they don't take up a contiguous portion of memory. The data structure contains not only the value for that node, but also the value for the next node's pointer. In this instance, this is a singly-linked list. Each node is only aware of itself and the next node. Implementing this in Go would look something like this:

type LinkedList struct {
    Val int
    Next *LinkedList
}

So let's set up our values inside of a linked list instead.

nestedList := &LinkedList{
    Val: 1,
    Next: &LinkedList{
        Val: 2,
        Next: &LinkedList{
            Val: 3,
            Next: &LinkedList{
                Val: 4,
            },
        },
    },
}

i.e. 1 -> 2 -> 3 -> 4 -> nil

Now, in the specific use case of deleting an Nth-node from a linked list, the time complexity will still be O(n). I chose this example simply to demonstrate the concept (and it is the foundation of the problem discussed later). All we have to do is find the second node (since linked lists are not indexed like arrays are - in fact they're not indexed at all, which is why accessing a value is slower for a linked list and faster for an array), and overwrite it's pointer. Overwriting the pointer directly also allows us to not need to keep track of extra variables for a previous and next node. While the function below is called deleteNthNode, the n being passed in is the same value as the one I used for the array above to reflect the differences between an indexed array and a linked list which has no indexing properties. It really should be called deleteNthPlusOneNodeForDemonstrationPurposes.

func deleteNthNode(head *LinkedList, n int) *LinkedList {
    counter := 1
    temp := head
    for counter <= n {
        temp = temp.Next
        counter++
    }

    *temp = *temp.Next

    return head
}

gives us 1 -> 3 -> 4 -> nil

Alright. I ended up going a bit deeper than I initially intended, but hey, if you knew absolutely nothing about linked lists at least you can follow along easier now!

Recursion

Recursion is, in some ways, more straight-forward to understand, and in others, more complex. At its core, a recursive function is a function that invokes itself - pretty simple, yeah?

The reason I'm pairing recursion and linked lists together is that linked lists are helpful in demonstrating a recursive function's awareness of its arguments and return values at certain levels.

Let's take a look at a simple recursive function - one that I actually use quite often to pretty-print a linked list.

func prettyPrintList(head *LinkedList) {
    if head != nil {
        fmt.Print(head.Val, " -> ")
        prettyPrintList(head.Next)
    } else {
        fmt.Println("nil")
    }
}

This is a great example that exemplifies why the two pair well together - you can't just keep spitting out values of the nodes since the current node only knows of the subsequent one. To get more, you'd need a for loop to keep re-assigning a dummy head, then have the same nil guard clauses, and so on.

With this recursive function, it's given a linked list. Since the function at the nth-level only has access to the nth-node, we can keep calling the function as long as we haven't reached nil, passing in the next node further down the chain.

However, the quirk with recursive functions is that if you're actually using the return value, they don't just go down to the level they need to reach. Depending on how you use the function, it also bubbles back up with the function's return value. If you're familiar with the DOM, you can think of it in the same way as event propagation.

Once the function is invoked at, say the 3rd level, it then goes one level deeper, the 4th level. If there's something at the 4th level that matches the base case, or the situation in which we don't want to call the function again, it will then return the desired value from the 4th level. Now, the 3rd level has the 4th level return value. Once the 3rd level returns, it propagates the value up to the 2nd level. And this repeats until we eventually surface back to the top-level and have our final return.

Problem

The problem we'll be solving is from LeetCode, if you'd like to try your hand at solving it before you continue.

We know we'll be given a linked list of integers that are sorted in ascending order, and we're tasked with removing nodes whose value appear more than once. Not just creating a list of unique values, but fully removing any node with that value.

We also know that when traversing each node recursively, we'll only have access to the current node and the next node. This is an indicator that there's a common pattern for recursive functions - we need to be able to orient ourselves within a larger problem. To do this, many recursive functions have extra arguments that act as flags when certain conditions are met at a previous or subsequent level.

In the context of this problem, let's think about each possible case of a node and how it relates to that value's potential duplicate status:

at a non-dupe node pointing to a non-dupe node
at a dupe node pointing to a dupe node
at a non-dupe node pointing to a dupe node
at a dupe node pointing to a non-dupe node

At face value, this seems like a problem you could easily solve with a recursive function by only needing think about it going towards the end of the list, without propagating any type of state back to the level above. If we're at the first case above, skip. If we're at the next case, overwrite the current node's pointer. Next case, skip once again. Then the final case, overwrite once more.

However, think about what would happen if you reached the end (nil) with a string of duplicates. Given the list 2 -> 3 -> 3 -> nil, let's walk through those cases node by node:

2 -> 3

2 and 3 are not equal, so we wouldn't change anything about the node here, and invoke the function again, this time passing the '3' node as an argument.

3 -> 3
a    b

The values here are equal, so we need to remove the duplicates. Luckily, since linked lists contain pointers, we can mutate the original list even without direct access to the head node.

In this case, we can overwrite the 'a' pointer to be the 'b' pointer, essentially deleting it. We now invoke the function again, passing in the 'b' node.

3 -> nil
b

Once we reach nil with a duplicate, we have to try to overwrite the pointer, but we can't - Go will panic since we cannot assign nil to a pointer. So, we would have some type of logic to catch this before it panics, and just return nil rather than overwrite to nil. Yet, if we return nil, we've now bubbled back up a level and not only lost context as to whether or not that value was a duplicate, but also we weren't able to mutate the value up the recursion chain since we didn't modify the pointer directly.

To solve this, what we can do instead is that in addition to relying on the duplicate flag we send down the recursion change, let's also create a flag that lets us know if we've bubbled back up from a duplicate value adjacent to nil. This way, we can keep the catch for reaching nil, and still safely overwrite pointers. And if we've reached a level with that flag being true, we rewire the nodes and set it to false, letting us know we've no longer got a duplicate at the end.

So, let's write down our approach in a step by step fashion, and implement along the way.

We need two arguments: a node and a changed flag to let us know if we've mutated a node on the way down the recursion chain.

func deDupe(node *LinkedList, changed bool) {}

We also need a boolean flag indicating whether or not we've reached nil with a dupe.

func deDupe(node *LinkedList, changed bool) bool {}

Our base case is going to be when the next node is nil. Let's also set up a value for if we changed a node at this specific recursion level.

...
var didChange bool

if node.Next == nil {
...
} else {
...
}
...

Once we've reached nil, we need to know if we've come from a duplicate node. This is where we can utilize our changed parameter and simply return that up the chain.

...
if node.Next == nil {
    return changed
} else {
...
}
...

If our next node isn't nil, that means we can directly mutate pointers. So within our else branch, let's set up any additional logic we'll need. We know we want to compare duplicates, so we add a condition comparing our current and subsequent node values. Now, there is a very important distinction between simply re-assigning node and re-assigning the pointer (*node) for node. If we re-assign node, we aren't directly mutating the original list. This is why we need our changed boolean on the way down - if we came from a duplicate value, we should also overwrite this one since we're currently at a duplicate value, but we don't set changed to be true since we'd be at the last of the duplicate values. If we didn't come from a duplicate value, we'll just re-assign node to node.Next since node is what gets passed through the function.

...
} else {
    if node.Val != node.Next.Val {
        if changed {
            *node = *node.Next
        } else {
            node = node.Next
        }
    } else {
        *node = *node.Next
        didChange = true
    }
}
...

So, we're just about done with our recursive function. Now we just need to invoke it. Since we're mutating values on the way down, we really only care about the boolean bubbling up.

...
lastIsDupe := deDupe(node, didChange)
...

This will let us know if the final value in the list is a duplicate, thus notifying us if we should redirect the current level's node.Next node. We also need to include a check for if the next node is nil, since after calling the recursive function our node that's been passed in may have been mutated before we've properly reset the lastIsDupe flag.

...
if lastIsDupe && node.Next != nil {
    node.Next = nil
    lastIsDupe = false
}

return lastIsDupe
...

And, finally, we return our lastIsDupe flag. If this is true as we bubble up our recursion levels, we will hit the previous node and assign the next node to nil, thus removing the final node, which was a duplicate. Then resetting the flag to false will let the levels above know that we no longer have duplicate at the end.

So, let's put everything together now, along with the necessary code for the original function that includes the top-level invocation (note that the function signature has changed since LeetCode calls their linked lists struct a ListNode):

func deleteDuplicates(head *ListNode) *ListNode {
    // by default these cases contain no dupes
    if head == nil || head.Next == nil {
        return head
    }

    lastIsDupe := deDupe(head, false)

    // the reasoning behind this condition
    // is the same as in the recursive fn
    if lastIsDupe && head.Next != nil {
        head.Next = nil
    } else if lastIsDupe {
        return nil
    }

    return head
}

func deDupe(node *ListNode, changed bool) bool {
    var didChange bool

    if node.Next == nil {
        return changed
    } else {
        if node.Val != node.Next.Val {
            if changed {
                *node = *node.Next
            } else {
                node = node.Next
            }
        } else {
            *node = *node.Next
            didChange = true
        }
    }

    lastIsDupe := deDupe(node, didChange)

    if lastIsDupe && node.Next != nil {
        node.Next = nil
        lastIsDupe = false
    }

    return lastIsDupe
}

We've now reached the end and solved our problem! Keep in mind that this isn't necessarily the best way to solve the problem - just one way. While it is iteratively efficient in the sense of mutating node values in one direction (giving us a time complexity of O(n)), it does take up quite a bit of overhead for OS-level space complexity, since these recursive function calls will keep getting stacked on to the heap.

I hope solving this problem has enlightened some aspects of either linked lists or recursion / dynamic programming for you. While there were definitely some quirks I had to figure out finalizing my implementation, it really improved my understanding of recursion by mapping it to a linked list problem.

Thanks for reading, and as always, enjoy your week!

Graham Robertson

Understanding Recursion Using Linked Lists

Linked Lists

Recursion

Problem

2021: A Year in Review

How To Build a Discord Webhook Provider Using Serverless Cloud

Building Pilot: A Retrospective