### Algorithms

So far we've seen how algorithms give us a way to think about solving problems efficiently. In this quiz, we'll work through an example of an applied problem a website with users has to solve. In thinking about possible solutions, we'll think about run-time considerations which we've seen before, as well as memory considerations.

# Algorithms in the World

A huge online website currently has exactly 300 million users. When a new user signs up, they have to pick a new username that hasn't been used before. To make this happen, the website needs to be able to check whether a given string is available as a username.

Assume that the computer running these comparisons can run a million comparisons per second. How much time would it take to determine if an unused username is available in the list of millions of usernames using the is_available function shown in the code below?

users = ['amy', 'bob', 'charlie']

for user in users:
return False
return True

print(is_available('amy', users))
#print(is_available('alice', users))
You need to be connected to run code

# Algorithms in the World

Unfortunately, you find that most users are not willing to wait this amount of time to make a new account, and so you lose out on a lot of business.

However, someone proposes a new implementation for is_available. This implementation runs a binary search through a sorted list of existing usernames. If it is still the case that we can run a million comparisons per second and we have 300 million usernames, approximately how long will it take to confirm that an unused username is available?

users = ['amy', 'bob', 'charlie']

if len(users) == 0:
return True
elif len(users) == 1:
else:
i = int(len(users)/2)
return False
else:

print(is_available('charlie', users))
Python 3
You need to be connected to run code

# Algorithms in the World

However, under the current implementation, users have to submit a form in order to check the availability of their desired username. A product manager suggests running the code "client-side" which means that the user's browser is given everything it needs to determine if a username is available or not.

Under our current solution, this requires sending the user the list of usernames. Why might this be a bad idea?

# Algorithms in the World

It turns out there is a way to give the user all the information to determine if a a username is available without sending a huge list! We'll dive into these sorts of more advanced algorithms throughout this course.

One solution is called a bloom filter, and consists of two different parts:

• The first is a list of $n$ zeros and ones, which we initialize with all zeros.
• The second is a list of functions $f_1, \ldots, f_k$ (called hash functions) that map a username to an integer between $0$ and $n-1.$ These integers represent indices in our list of ones and zeros.

To mark a username as taken, we calculate each of the functions $f_1(\text{user}), f_2(\text{user}) \ldots f_k(\text{user})$. This gives us $k$ indices between $0$ and $n-1$ for our list of ones and zeros, and we set the value at each of these indices to one.

To check whether an element is available, we still calculate the output of each function to get $k$ indices. If the value in our list is zero at any of these indices, then we know the element was never taken, since we would have marked all of these indices with ones when we added it.

This compresses the data tremendously, and allows for our constraint of client-side computation!

However, the compression is not lossless: it's possible that we inadvertently toggle all the bits that correspond to an available username with other usernames. We'll discuss these sorts of tradeoffs later on this course.

×