
Complexity of Algorithms

                   

One of the first sorting algorithms we will see is a quadratic time algorithm, meaning its runtime is \(O(N^2)\). We'll get more into the details of this big O notation later in this quiz, but the \(N^2\) expression should give you an idea of the relative number of steps the algorithm needs to sort an array of \(N\) elements.

Unfortunately, \(O(N^2)\) performance is slow. Really slow. Of course, for small datasets we don't care as much about efficiency. However, consider more substantial datasets: analyzing search queries, predicting protein folding, or mining polling data. All of these problems involve large inputs, on the order of millions of elements. So while straightforward, the quadratic time algorithm is simply too slow compared to faster sorting algorithms that run closer to \(O\big(N\log(N)\big)\).

To understand how much faster the latter is, consider that modern processors run at gigahertz speeds. Suppose a dataset has \(N = 10^6\) inputs and an algorithm performs \(N^2\) operations. At 1 billion operations per second, processing those 1 million inputs would take \(\frac{(10^6)^2}{10^9}=1000\) seconds.

However, if the number of operations were \(N \log(N)\), it would take \(\frac{10^6\cdot \log(10^6)}{10^9}<1\) second. While coefficients and other factors are often involved, the difference is still staggering, reducing the runtime from nearly 17 minutes to a fraction of a second! Consequently, we must think about the performance of our algorithms as we create them.
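As a quick sanity check on this arithmetic, here is a small Python sketch; the one-billion-operations-per-second figure is the same rough assumption used above, not a measured benchmark:

import math

# Rough back-of-the-envelope estimate (assumed figures, not a benchmark):
# one million inputs processed at one billion simple operations per second.
N = 10**6
OPS_PER_SECOND = 10**9

quadratic_seconds = N**2 / OPS_PER_SECOND                  # about 1000 seconds
linearithmic_seconds = N * math.log2(N) / OPS_PER_SECOND   # about 0.02 seconds

print(f"N^2:     {quadratic_seconds:.2f} s")
print(f"N log N: {linearithmic_seconds:.4f} s")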

Problem: We are given a sorted list of 1000 names used for security at a show, and we are asked whether the list contains a person named Jenny. With a good algorithm, about how many names do you expect to have to check?

For the problem of checking whether an ordered list of size \(N\) contains an element, we saw above that the binary search algorithm should take around \(\log_2(N)\) steps, whereas the naive algorithm of checking every element of the list would take \(N\) steps. This is how we quantify algorithms: how fast are they? (Or how much memory do they take?)
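To make the \(\log_2(N)\) behavior concrete, here is a minimal Python sketch of binary search on a sorted list (the function name and arguments are illustrative, not part of the problem above):

def binary_search(sorted_list, target):
    # Repeatedly halve the search range, checking about log2(N) elements.
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # look at the middle element
        if sorted_list[mid] == target:
            return True
        elif sorted_list[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return False

# With 1000 names, the loop runs at most about log2(1000), roughly 10 times.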

Let \(f(N)\) and \(g(N)\) both be functions of \(N\). If there exist a constant \(k\in\mathbb{R}\) and a positive integer \(N_0\) such that \(|f(N)| \le k|g(N)|\) for all \(N>N_0\), then we say that \(f(N)\) is \(O\big(g(N)\big).\) Equivalently, we say that the order of growth of \(f(N)\) is \(g(N).\)
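For example, \(f(N) = 3N^2 + 5N\) is \(O(N^2)\): taking \(k = 4\) and \(N_0 = 5\), we have \(3N^2 + 5N \le 4N^2\) for all \(N > 5\).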

For example, consider the brute force search of a list for an element. We saw that in the worst case, this looks at all \(N\) elements of the list. Thus the runtime of this algorithm would be \(O(N).\)

With our definition of order of growth, we may now quantify algorithms by statements such as "searching an unordered list is \(O(N)\)" or, in other words, "the order of growth of unordered searching is linear."

A word of caution:

If algorithm A takes \(O\big(f(N)\big)\) time and algorithm B takes \(O\big(g(N)\big)\) time, where \(f(N)\) grows faster than \(g(N)\), then for sufficiently large inputs algorithm B takes less time to run. Thus for runtimes, a smaller order of growth is better. What is the order of growth of binary search?

While speed is a significant factor when considering the trade-offs of different algorithms, space is as well. For example, suppose for a given problem, algorithm A has runtime \(O(N^3)\) and algorithm B has runtime \(O(N^3\log{N})\). From speed alone, we may be tempted to use algorithm A. However, suppose algorithm A also uses \(O(N^2)\) memory while B uses \(O(1)\) memory. Depending on our constraints, we must consider all performance factors.

Consider the following algorithm to generate the \(N^\text{th}\) term of the Fibonacci sequence:

def fib1(n):
    # Store the Fibonacci numbers F(0), F(1), ..., F(n) in a list.
    f = [0, 1]
    for i in range(2, n + 1):
        f.append(f[i - 1] + f[i - 2])
    return f[n]

What is the order of growth of the memory for running fib1(N)?

Now consider this slightly different algorithm to generate the \(N^\text{th}\) term of the Fibonacci sequence:

def fib2(n):
    # Track only the two most recent Fibonacci numbers: a = F(i), b = F(i+1).
    a = 0
    b = 1
    for i in range(n):
        b += a       # b becomes F(i) + F(i+1) = F(i+2)
        a = b - a    # a becomes the old b, i.e. F(i+1)
    return a

What is the order of growth of the memory for running fib2(N)?

NOTE: Assume the code does not automatically create a list of length \(n\) at the statement range(n).

Comparing fib1(N) with fib2(N), we see that the runtimes are essentially the same: \(O(N)\).

But fib1(N) has memory \(O(N)\) while fib2(N) has memory \(O(1)\). It might seem immediately obvious that fib2 is better!

However, there may still be reasons to use an approach like in fib1. For example, we may later have the need to access different Fibonacci numbers multiple times. So, rather than computing them on the fly each time, if we had the values stored as we computed them (as is done in fib1), we would reduce overall computation time!
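A minimal sketch of that idea, assuming we cache values in a dictionary as they are computed (the names fib_cache and fib_memo are illustrative, not part of fib1):

fib_cache = {0: 0, 1: 1}  # Fibonacci numbers stored as they are computed

def fib_memo(n):
    # Like fib1, this keeps every computed value, so repeated requests
    # for the same Fibonacci number are answered without recomputation.
    if n not in fib_cache:
        fib_cache[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return fib_cache[n]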

There is no strict rule on preferring better performance in memory or better performance in speed. It all depends on which is more expensive. For example, cloud providers want to maximize speed as much as possible. This is why companies such as Amazon, Google, or Microsoft have massive server farms: memory is relatively cheap to them compared to the demand for fast speed and low latency. In contrast, video game developers in the '80s suffered from the problem that game cartridges simply could not hold much data. Thus speed (and even the mathematical accuracy of some calculations) was sidelined in favor of efficient memory usage.

As we learn more about algorithms, we will keep these kinds of trade-offs in mind.
