Let's Learn Algorithms: Implementing Binary Search

In we went over how the binary search algorithm works while walking through an example with a sorted list of integers.

In this article we are going to take the next step in learning about the algorithm by implementing the algorithm.

Our code is going to use the same sorted list as we did in the last article, so it should be pretty easy to reference the last article if anything is confusing. Our first implementation is also going to be iterative, but if you are interested in seeing a recursive solution just email me! jon@calhoun.io

In a future article we will start to discuss some practice problems, many of which don’t use lists, but instead have you using a binary search in other ways. I highly recommend checking those posts out because they help illustrate just how versatile a binary search can be.

I mentioned this in the introduction to the Let’s Learn Algorithms series, but I will be writing my solutions in Go.

I will intentionally be trying to use code that is easy to understand, so if you are familiar with Go and find yourself thinking “he could have just done x” you are probably right, but I don’t want to do anything too fancy that a non-Go developer wouldn’t know about :)

Implementing binary search with a sorted list of integers

When we first discussed binary search we used the sorted list below, and we were searching for the index of the number 6 in that list.

Sorted list of numbers from the intro article

We are going to use those same values as we develop our code, so let’s go ahead and get the boiler plate code out of the way. The code below is simply constructing a list of numbers along with a number to look for, printing those out, and then calling the binarySearch() function that is stubbed out.

package main

import "fmt"

func main() {
  var lookingFor int = 6
  var sortedList []int = []int{1, 3, 4, 6, 7, 9, 10, 11, 13}
  fmt.Println("Looking for", lookingFor, "in the sorted list:", sortedList)

  index := binarySearch(sortedList, lookingFor)
  if index >= 0 {
    fmt.Println("Found the number", lookingFor, "at:", index)
  } else {
    fmt.Println("Didn't find the number", lookingFor, ":(")
  }
}

func binarySearch(sortedList []int, lookingFor int) int {
  return -1
}

Our binary search function is going to return an integer that represents the index of where that value is in the list. If the value wasn’t found in the list, it will instead return a -1 value.

Negative numbers are not valid indexes, so anyone who uses our binary search function will have to first check to verify that they received a positive number before trying to access the number at that index, but the beauty of returning a negative number is that it is a pretty clear way of indicating that we didn’t find an index without needing two return values.

Check the middle value

Now that we have our starting code, we are ready to start implementing our binary search algorithm. The first thing we discussed was finding the middle value in the list and asking ourselves “Is this a 6?”. So how do we find the middle value?

The simplest way is to take the length of the list and divide it by two. As long as we are using integers, we should get an integer back, so we don’t even need to worry about rounding. Integer math will automatically use the floor of the operation.

func binarySearch(sortedList []int, lookingFor int) int {
  var mid int = len(sortedList) / 2
  fmt.Println("Middle value is:", sortedList[mid])
  return -1
}

We have a way of finding the middle value, so our next step is to find out if this value is equal to the value we are looking for, greater than it, or less than it.

Let’s just go ahead and try to write something up that does this, and if we happen to get a match we can return the current index we are checking.

func binarySearch(sortedList []int, lookingFor int) int {
  var mid int = len(sortedList) / 2
  var midValue int = sortedList[mid]
  fmt.Println("Middle value is:", midValue)

  if midValue == lookingFor {
    return mid
  } else if midValue > lookingFor {
    // We want to use the left half of our list
  } else {
    // We want to use the right half of our list
  }
  return -1
}

There are a few issues with this code that we are going to address in due time, but for now it appears to do roughly what we wanted, so let’s focus on filling in the remaining gaps, which are actually pretty similar.

Representing sub-lists

When we don’t find the value we are looking for, we want to try again using either the left half or the right half of the list.

One option for achieving this is to take our list and copy it into a new, smaller list. In reality, this approach works fine, but it is less useful when we start using binary searches for searching things that aren’t sorted lists, so instead I am going to ask you to think about this another way.

Instead of trying to create a new, smaller list, is there some way that we can tell our code to only use a small part of our existing list?

This might not be as intuitive at first, but one way to achieve this is to keep track of the starting and ending point of your smaller list. For example, if I told you “I only care about the sub-list between indexes 2 and 6 ” you would know that I really care about the numbers [4, 6, 7, 9, 10].

We can do something similar in our code. Instead of trying to break our list into actual smaller lists, we are going to keep track of the minimum index for our smaller list, and the maximum index for our smaller list.

Unfortunately, this means that some of the code we already wrote won’t be correct anymore. If I tell you “I only care about the numbers between indexes 2 and 6” then we can’t calculate the middle index by taking the length of the entire list and dividing it in half. That would always get us the same index regardless of the size of our list.

Instead, what we really want to do is calculate the middle index of our sub-list. So what is in the middle of 2 and 6?

First let’s declare a few variables to make things clear. Assume that our lowest index, 2, is represented by a variable named lo, and our highest index, 6, is represented by a variable named hi.

To calculate the index in between these two, we first need to figure out how many numbers are between the two indexes. This is simply hi - lo.

Next we need to divide the total numbers between lo and hi in half to get the middle “index”. That is (hi - lo) / 2.

I used quotes around “index” in the last equation because it isn’t actually our true index. For example, if we go back to your lo = 2, hi = 6 example, (6 - 2) / 2 = 2, but 2 isn’t the middle index between 2 and 6. What is going wrong?

When we divide the length (hi - lo) by two, we are getting the middle index assuming the list started at index 0. Our list may not start at index 0, but we do know where it starts, so we simply need to add our starting index (lo) to the value we calculated above to get the true middle index.

This gives us the final equation of mid = lo + (hi - lo) / 2 to calculate the middle index of our sub-list.

Putting this all into code, we get the following updates to our binarySearch() function:

func binarySearch(sortedList []int, lookingFor int) int {
  var lo int = 0
  var hi int = len(sortedList) - 1

  var mid int = lo + (hi-lo)/2
  var midValue int = sortedList[mid]
  fmt.Println("Middle value is:", midValue)

  if midValue == lookingFor {
    return mid
  } else if midValue > lookingFor {
    // We want to use the left half of our list
  } else {
    // We want to use the right half of our list
  }
  return -1
}

When our list first comes in we are going to set lo and hi to be 0 and the length of the list minus one. We subtract one from the length of the list for the hi value because that gives us an inclusive value for hi. That is, our list includes the numbers at index lo and hi, so we would say that sortedList[hi] is part of our sublist.

Actually creating sub-lists

Now that we have a way to represent sub-lists in our code, we are now free to actually start creating them when our code doesn’t find the value that we are looking for.

To make this a little clearer, we are going to continue following along with our original example. Our initial values are:

sortedList = [1, 3, 4, 6, 7, 9, 10, 11, 13]
lookingFor = 6
lo = 0
hi = 8
mid = 0 + (8 - 0)/2 = 4
midValue = 7

Head over to the first else block; The one where value > lookingFor. The first time we our code gets here we will be comparing a midValue of 7 with a lookingFor value of 6, so this will be true. That means that we want to use the “left” half of our list, but how do we create that with our lo, hi, and mid values?

The easiest way to figure this out is to look at our example from the last article.

6 must be in the first half

What would our new lo and hi values be if we wanted to just look at the left list? Well, lo wouldn’t change at all, but hi would be come 3, or put another way, hi would become mid - 1.

hi becomes mid - 1 because we know that whatever value is at mid isn’t valid, otherwise we would have returned it.

Now let’s look at the opposite use case. What would happen if we instead wanted to use the right half of the list? The numbers [9, 10, 11, 13] would be represented by lo = 5, hi = 8, so it looks like in this case we just need to change lo to mid + 1. Once again, we add 1 because we know that the value at mid isn’t the correct value.

If we put that into code, our binarySearch() function should now look like the one below.

func binarySearch(sortedList []int, lookingFor int) int {
  var lo int = 0
  var hi int = len(sortedList) - 1

  var mid int = lo + (hi-lo)/2
  var midValue int = sortedList[mid]
  fmt.Println("Middle value is:", midValue)

  if midValue == lookingFor {
    return mid
  } else if midValue > lookingFor {
    // We want to use the left half of our list
    hi = mid - 1
  } else {
    // We want to use the right half of our list
    lo = mid + 1
  }
  return -1
}

Repeat until we can’t

Our code still has one major problem; After we update lo or hi we simply return -1! That isn’t right!

What we really want to do tell our code to repeat itself with the new lo and hi values. Doing that is pretty easy with a loop, but what is our exit criteria?

This part of the code is a little less intuitive, but our exit criteria is basically going to be whenever we get an invalid sub-list. That is, whenever lo is greater than hi, we have created a sub-list with a negative length. That isn’t right!

for lo <= hi {
  ...
}

There are a couple reasons why this works, but the biggest contributing factor is that as long as lo is less than or equal to hi, the pair represents a valid list with at least one item in it, and the only way for lo to become greater than hi is if we try to look at a half of the list that doesn’t exist.

This is much easier to explain with a few examples. Specifically, we are going to look at examples where the list size is 3+, 2, and 1 item.

Three or more items in our list

We will start with a list of just three items like below.

sortedList = [7, 10, 12]
lo = 0 // lowest index
hi = 2 // highest index
mid = lo + (hi - lo) / 2 = 1

When run our code with this example, we end up doing one of three things.

We return mid
We set hi to 0
We set lo to 2

In all three cases, we end up creating a valid sub-list or we find the value we are looking for. It is perfectly fine for lo to equal hi because this means we have a list with just one item in it.

Generally speaking, any list with at least three items in it will follow the pattern above. We will never create an invalid sub-list as long as the list we are working with has three or more items in it because mid will never be equal to lo or hi.

One item in our list

We are going to skip the list with two items for a minute and instead look at a list with just one item in it.

sortedList = [7]
lo = 0 // lowest index
hi = 0 // highest index
mid = lo + (hi - lo) / 2 = 0

When run our code with this example, we end up doing one of three things.

We return mid
We set hi to -1
We set lo to 1

In the first case we find the value we are looking for. In both of the other two cases we are trying to look at the “left half” of our list, or the “right half” of our list, but I said earlier that our list only has one item in it, so it doesn’t actually have a left or right half. It just has one item.

As a result, we end up setting lo to a value that is greater than hi, or we end up seeing hi to a value that is less than lo. Whenever this happens it means that we tried to look at a half of the list that doesn’t exist.

Two items in our list

Finally, we are going to look at a list with two items in it.

sortedList = [7, 10]
lo = 0 // lowest index
hi = 1 // highest index
mid = lo + (hi - lo) / 2 = 0

When run our code with this example, we end up doing one of three things.

We return mid
We set hi to -1
We set lo to 1

In the first case we find the value we are looking for, so again we are safe.

In the third case we create a valid sub-list with just one item in it, and we just look at that use case before so we know that our code can handle it.

In the second case we end up trying to look at the “left half” of our list, but it doesn’t exist because our middle value was the first item in our list. Just like before, we end up setting hi to a value less than lo when we try to look at a sub-list that doesn’t exist.

Back to our binary search

I demonstrated why this loop works by covering all of the possible use cases above. The final thing left to do is to determine what to return when our loop terminates.

It just so happens that we were already returning -1 at the end of our code, and when we try to look at a sub-list that doesn’t exist what this really means is that we didn’t find the value we were looking for. Otherwise we would have already returned it inside of our first block of our if statement.

Putting this all together we get the following code:

func binarySearch(sortedList []int, lookingFor int) int {
  var lo int = 0
  var hi int = len(sortedList) - 1

  for lo <= hi {
    var mid int = lo + (hi-lo)/2
    var midValue int = sortedList[mid]
    fmt.Println("Middle value is:", midValue)

    if midValue == lookingFor {
      return mid
    } else if midValue > lookingFor {
      // We want to use the left half of our list
      hi = mid - 1
    } else {
      // We want to use the right half of our list
      lo = mid + 1
    }
  }

  // If we get here we tried to look at an invalid sub-list
  // which means the number isn't in our list.
  return -1
}

You can run this code on the Go Playground here: https://play.golang.org/p/eyJ5Y_iWRB

Play around with the code a bit. Try updating the lookingFor value and following the code with a pen and paper, then verify that it does what you expected when you actually run the code.

Up Next…

It is important to remember that a binary search can only work on data that is sorted relevant to what you are searching. For instance, if you want to find a the smallest number greater than 6 in a list, you need a sorted list, but if you want to find a commit that broke your code, you only need the commits sorted in the order ..., working, working, broken, broken, .... That is, you need all working commits to come before the breaking commit, and then all commits after the bad commit must be broken.

I plan to eventually release more practice problems that illustrate different ways to apply a binary search and more nuances like this, but unfortunately I have been swamped and it may be a while before they are released. In the meantime, you can continue on with the series and check out the next article - Queues - What are they and how do I implement one in Go?. After that we look at stacks, and then finally get into trees, graph theory, and ways to search trees (BFS and DFS).

This article is part of the series, Let's Learn Algorithms.

Learn Web Development with Go!

Sign up for my mailing list and I'll send you a FREE sample from my course - Web Development with Go. The sample includes 19 screencasts and the first few chapters from the book.

You will also receive emails from me about Go coding techniques, upcoming courses (including FREE ones), and course discounts.

Email Address