Computer Science Algorithms

Algorithms in the World

So far we've seen how algorithms give us a way to think about solving problems efficiently. In this quiz, we'll work through an example of an applied problem a website with users has to solve. In thinking about possible solutions, we'll think about run-time considerations which we've seen before, as well as memory considerations.

         

Algorithms in the World

A huge online website currently has exactly 300 million users. When a new user signs up, they have to pick a new username that hasn't been used before. To make this happen, the website needs to be able to check whether a given string is available as a username.

Assume that the computer running these comparisons can run a million comparisons per second. How much time would it take to determine if an unused username is available in the list of millions of usernames using the is_available function shown in the code below?

users = ['amy', 'bob', 'charlie']

def is_available(username, users):
    for user in users:
        if username == user:
            return False
    return True

print(is_available('amy', users))
#print(is_available('alice', users))
You need to be connected to run code

         

Algorithms in the World

Unfortunately, you find that most users are not willing to wait this amount of time to make a new account, and so you lose out on a lot of business.

However, someone proposes a new implementation for is_available. This implementation runs a binary search through a sorted list of existing usernames. If it is still the case that we can run a million comparisons per second and we have 300 million usernames, about how long will it take to confirm that an unused username is available?

users = ['amy', 'bob', 'charlie']

def is_available(username, users):
    if len(users) == 0:
        return True
    elif len(users) == 1:
        return username != users[0]
    else:
        i = int(len(users)/2)
        if username == users[i]:
            return False
        elif username < users[i]:
            return is_available(username, users[:i])
        else:
            return is_available(username, users[i:])
        

print(is_available('charlie', users))
Python 3
You need to be connected to run code

         

Algorithms in the World

However, under the current implementation, users have to submit a form in order to check the availability of their desired username. A product manager suggests running the code "client-side" which means that the user's browser is given everything it needs to determine if a username is available or not.

Under our current solution, this requires sending the user the list of usernames. Why might this be a bad idea?

         

Algorithms in the World

It turns out there is a way to give the user all the information to determine if a a username is available without sending a huge list! We'll dive into these sorts of more advanced algorithms throughout this course.

One solution is called a bloom filter, and consists of two different parts:

  • The first is a list of nn 0s and 1s.
  • The second is a list of functions f1,,fkf_1, \ldots, f_k (called hash functions) that map a username to an integer between 0 and n1.n-1.

When we want to mark a username as taken, we calculate each of the functions f1(user),f_1(\text{user}),\ldots and for each numeric output, we set that bit equal to 1. To check whether an element is available, we can check each of these outputs, and if one or more is equal to 0, the username is available.

This compresses the data tremendously, and allows for our constraint of client-side computation!

However, the compression is not lossless: it's possible that we inadvertently toggle all the bits that correspond to an available username with other usernames. We'll discuss these sorts of tradeoffs later on this course.

         
×

Problem Loading...

Note Loading...

Set Loading...