So far we've seen how algorithms give us a way to think about solving problems efficiently. In this quiz, we'll work through an example of an applied problem a website with users has to solve. In thinking about possible solutions, we'll think about run-time considerations which we've seen before, as well as memory considerations.
A huge online website currently has exactly 300 million users. When a new user signs up, they have to pick a new username that hasn't been used before. To make this happen, the website needs to be able to check whether a given string is available as a username.
Assume that the computer running these comparisons can run a million comparisons per second. How much time would it take to determine if an unused username is available in the list of millions of usernames using the
is_available function shown in the code below?
users = ['amy', 'bob', 'charlie'] def is_available(username, users): for user in users: if username == user: return False return True print(is_available('amy', users)) #print(is_available('alice', users))
Unfortunately, you find that most users are not willing to wait this amount of time to make a new account, and so you lose out on a lot of business.
However, someone proposes a new implementation for is_available. This implementation runs a binary search through a sorted list of existing usernames. If it is still the case that we can run a million comparisons per second and we have 300 million usernames, approximately how long will it take to confirm that an unused username is available?
users = ['amy', 'bob', 'charlie'] def is_available(username, users): if len(users) == 0: return True elif len(users) == 1: return username != users else: i = int(len(users)/2) if username == users[i]: return False elif username < users[i]: return is_available(username, users[:i]) else: return is_available(username, users[i:]) print(is_available('charlie', users))
However, under the current implementation, users have to submit a form in order to check the availability of their desired username. A product manager suggests running the code "client-side" which means that the user's browser is given everything it needs to determine if a username is available or not.
Under our current solution, this requires sending the user the list of usernames. Why might this be a bad idea?
It turns out there is a way to give the user all the information to determine if a a username is available without sending a huge list! We'll dive into these sorts of more advanced algorithms throughout this course.
One solution is called a bloom filter, and consists of two different parts:
To mark a username as taken, we calculate each of the functions . This gives us indices between and for our list of ones and zeros, and we set the value at each of these indices to one.
To check whether an element is available, we still calculate the output of each function to get indices. If the value in our list is zero at any of these indices, then we know the element was never taken, since we would have marked all of these indices with ones when we added it.
This compresses the data tremendously, and allows for our constraint of client-side computation!
However, the compression is not lossless: it's possible that we inadvertently toggle all the bits that correspond to an available username with other usernames. We'll discuss these sorts of tradeoffs later on this course.