A huge online website currently has exactly 300 million users. When a new user signs up, they have to pick a new username that hasn't been used before. To make this happen, the website needs to be able to check whether a given string is available as a username.

Assume that the computer running these comparisons can run a million comparisons per second. How much time would it take to determine if an **unused** username is available in the list of millions of usernames using the `is_available`

function shown in the code below?

Unfortunately, you find that most users are not willing to wait this amount of time to make a new account, and so you lose out on a lot of business.

However, someone proposes a new implementation for is_available. This implementation runs a binary search through a sorted list of existing usernames. If it is still the case that we can run a million comparisons per second and we have 300 million usernames, approximately how long will it take to confirm that an unused username is available?

However, under the current implementation, users have to submit a form in order to check the availability of their desired username. A product manager suggests running the code "client-side" which means that the user's browser is given everything it needs to determine if a username is available or not.

Under our current solution, this requires sending the user the list of usernames. Why might this be a bad idea?

It turns out there is a way to give the user all the information to determine if a a username is available without sending a huge list! We'll dive into these sorts of more advanced algorithms throughout this course.

One solution is called a bloom filter, and consists of two different parts:

- The first is a list of $n$ zeros and ones, which we initialize with all zeros.
- The second is a list of functions $f_1, \ldots, f_k$ (called
*hash*functions) that map a username to an integer between $0$ and $n-1.$ These integers represent indices in our list of ones and zeros.

To mark a username as taken, we calculate each of the functions $f_1(\text{user}), f_2(\text{user}) \ldots f_k(\text{user})$. This gives us $k$ indices between $0$ and $n-1$ for our list of ones and zeros, and we set the value at each of these indices to one.

To check whether an element is available, we still calculate the output of each function to get $k$ indices. If the value in our list is zero at any of these indices, then we know the element was never taken, since we would have marked all of these indices with ones when we added it.

This compresses the data tremendously, and allows for our constraint of client-side computation!

However, the compression is not lossless: it's possible that we inadvertently toggle all the bits that correspond to an available username with other usernames. We'll discuss these sorts of tradeoffs later on this course.