Amortized Analysis
Amortized analysis is a method of analyzing the costs associated with a data structure that averages the worst operations out over time. Often, a data structure has one particularly costly operation, but it doesn't get performed very often. That data structure shouldn't be labeled a costly structure just because that one operation, that is seldom performed, is costly.
So, amortized analysis is used to average out the costly operations in the worst case. The worst-case scenario for a data structure is the absolute worst ordering of operations from a cost perspective. Once that ordering is found, then the operations can be averaged.
There are three main types of amortized analysis: aggregate analysis, the accounting method, and the potential method.
Intuition
The intuition behind what amortized analysis is and why it is used is important. Essentially, it boils down to being "fair" to a data structure. One bad operation shouldn't ruin a data structure if the operation is relatively uncommon. More technically, we want to understand how data structures actually perform in practice, and amortized analysis helps us do this by giving us an accurate description of a data structure over time. Simply looking at the worst-case performance per operation can be too pessimistic, and amortized analysis gives us a clearer picture of what's going on.
Let's say you want to make a cake for the bake sale. Cake-making is pretty complex, but it's essentially two main steps:
- Mix batter (fast).
- Bake in an oven (slow, and you can only fit one cake in at a time).
Mixing the batter takes relatively little time when compared with baking. Afterwards, you reflect on the cake-making process. When deciding if it is slow, medium, or fast, you choose medium because you average the two operations—slow and fast—to get medium.
Now let's say you wanted to make 100 cakes. You have two options for how to bake 100 cakes. You can mix the batter for a single cake, bake it, and repeat. Or, you can mix the batter for all 100 cakes, then bake all of them, one after another. Are these methods slow, medium, or fast?
Amortized analysis tells us that these two methods should both be described as "medium", even though you might have to bake 100 cakes sequentially. Even though you might have to work through 100 slow operations in a row, they were preceded by 100 fast operations, so the average is still medium.
Worst-case means that it is not possible to dream up a worse sequence of events. It doesn't make any sense, for instance, to skip the batter mixing operation and simply bake 100 cakes. That would be a slow baking process, but it doesn't make any sense, so it's not worth analyzing. The cake-baking process is a medium process because mixing cake batter and baking the cake have a logical ordering that cannot be reversed.
Aggregate Analysis
In aggregate analysis, there are two steps. First, we must show that a sequence of \(n\) operations takes \(T(n)\) time in the worst case. Then, we show that each operation takes \(\frac{T(n)}n\) time, on average. Therefore, in aggregate analysis, each operation has the same cost. In the previous example of cake-making, both operations would be described as medium, instead of fast and slow.
A common example of aggregate analysis is a modified stack. Stacks are a linear data structure that have two constant-time operations. push(element) puts an element on the top of the stack, and pop() takes the top element off of the stack and returns it. These operations are both constant-time, so a total of \(n\) operations (in any order) will result in \(O(n)\) total time. 
Now, a new operation is added to the stack. multipop(k) will either pop the top \(k\) elements in the stack, or if it runs out of elements before that, it will pop all of the elements in the stack and stop. The pseudo-code for multipop(k) would look like this:
| 1 2 3 4 |  | 
Looking at the pseudo-code, it's easy to see that this is not a constant-time operation. multipop can run for at most \(n \) times, where \(n\) is the size of the stack. So, the worst-case runtime for multipop is \(O(n)\). So, in atypical analysis, that means that \(n\) multipop operations take \(O\big(n^2\big)\) time.
However, that's not actually the case. Think about multipop and what it's actually doing. multipop cannot function unless there's been a push to the stack because it would have nothing to pop off. In fact, any sequence of \(n\) operations of multipop, pop and push can take at most \(O(n)\) time. multipop, the only non-constant-time operation in this stack, can only take \(O(n)\) time if there have also been \(n\) constant-time push operations on the stack. In the very worst case, there are \(n\) constant-time operations and just 1 operation taking \(O(n)\) time. 
For any value of \(n\), any sequence of multipop, pop, and push takes \(O(n)\) time. So, using aggregate analysis, 
\[\frac{T(n)}{n} = \frac{O(n)}{n} = O(1).\]
So, this stack has an amortized cost of \(O(1)\) per operation.
The Accounting Method
The accounting method is aptly named because it borrows ideas and terms from accounting. Here, each operation is assigned a charge, called the amortized cost. Some operations can be charged more or less than they actually cost. If an operation's amortized cost exceeds its actual cost, we assign the difference, called a credit, to specific objects in the data structure. Credit can be used later to help pay for other operations whose amortized cost is less than their actual cost. Credit can never be negative in any sequence of operations.
The amortized cost of an operation is split between an operation's actual cost and credit that is either deposited or used up. Each operation can have a different amortized cost, unlike aggregate analysis. Choosing the amortized cost for each operation is important, but the costs must always be the same for a given operation no matter what sequence of operations, just like for any method of amortized analysis.
Looking back at the modified stack from the previous section, the costs of each operation were
| 1 2 3 |  | 
Multipop's cost will either be \(k\) if \(k\) is less than the number of elements in the stack, or it will be the size of the stack. Assigning amortized costs to those functions, we get
| 1 2 3 |  | 
Here it is worth noting that the amortized cost for multipop is constant, while its actual cost is variable.
The final step is to show that it is possible to pay for any sequence of operations using the amortized costs. It is helpful to do this step using money, so 1 dollar will equate to 1 cost.
If we think of the stack as an actual stack of plates, this becomes more clear. Pushing a plate onto the stack is the act of placing that plate of the top of the stack. Popping it is the act of taking the top plate off. So, when a plate is pushed onto the stack in this example, we pay $1 for the actual cost of the operation, and we are left with $1 of credit. This is because we take the amortized cost for push ($2), subtract the actual cost ($1), and are left with $1. We'll place that dollar on top of the plate we just pushed. So, at any point in time, every plate in the stack has 1$ of credit on it.
The $1 on top of the plate will act as the money needed to pop the plate off. We need $1 to pop the plate off because the amortized cost of popping ($0), minus the actual cost of popping ($1), is -$1. At any point in time, every plate has exactly $1 on top of it which can be used to pop it off the stack.
Multipop uses pop as a subroutine. Calling multipop on the stack costs no money, but the pop subroutine within multipop will use the $1 on top of each plate to remove it.
Because there is always $1 on top of every plate in the stack, credit is never negative. Essentially, this is the same idea that was explored in aggregate analysis. Performing pop or multipop doesn't make any sense until something has been pushed to the stack. There's nothing to pop off! So, the worst-case cost of \(n\) operations is \(O(n)\).
The Potential Method
The potential method is similar to the accounting method. However, instead of thinking about the analysis in terms of cost and credit, the potential method thinks of work already done as potential energy that can pay for later operations. This is similar to how rolling a rock up a hill creates potential energy that then can bring it back down the hill with no effort. Unlike the accounting method, however, potential energy is associated with the data structure as a whole, not with individual operations.
The potential method works as follows: It starts with an initial data structure, \(D_0\). Then \(n\) operations are performed, turning the initial data structure into \(D_1, D_2, D_3, ..., D_n\). \(c_i\) will be the cost associated with the \(i^\text{th}\) operation, and \(D_i\) is the data structure resulting from the \(i^\text{th}\) operation.
\(\Phi\) is the potential function which maps the data structure \(D\) to a number \(\Phi(D)\), the potential associated with that data structure. The amortized cost of the \(i\) operation is defined by
\[a_i = c_i + \Phi(D_i) - \Phi(D_{i-1}).\]
So, that means that over \(n\) operations, the total amortized cost will be
\[\sum_{i=1}^{n}a_i = \sum_{i=1}^{n}\big(c_i + \Phi(D_i) - \Phi(D_{i-1})\big).\]
Because this is a telescoping sum, it equals
\[\sum_{i=1}^{n}c_i + \Phi(D_n) - \Phi(D_0).\]
In this method, it is required that \(\Phi(D_i) \geq \Phi(D_0)\) for all \(i\) to prove that the total amortized cost of \(n\) operations is an upper bound on the actual total cost. A typical way to do this is to defined \(\Phi(D_0) = 0\) and show that \(\Phi(D_i) \geq 0\).
Over the course of the sequence of operations, the \(i^\text{th}\) operation will have a potential difference of \(\Phi(D_i) - \Phi(D_{i-1})\). If this value is positive, then the amortized cost \(a_i\) is an overcharge for this operation, and the potential energy of the data structure will increase. If it is negative, it is an undercharge, and the potential energy of the data structure will decrease.
Let's look back at the modified stack. The potential function chosen will simply be the number of items on the stack. Therefore, before the sequence of operations begins, \(\Phi(D_0) = 0\) because there are no items in the stack. For all future operations, it's clear that \(\Phi(D_i) \geq 0\) because there cannot be a negative number of items in a stack.
Calculating the potential difference for a push operation, we find that
\[\Phi(D_i) - \Phi(D_{i-1}) = (stack.size + 1) - stack.size = 1.\]
So, the amortized cost of the push operation is
\[a_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = 1 + 1 = 2.\]
What is the potential difference of pop and multipop? What is the amortized cost of pop and multipop?
Calculating the amortized cost for pop and multipop is similar, so only multipop is shown here. The potential difference for a multipop operation that has a parameter of \(k\) is
\[\Phi(D_i) - \Phi(D_{i-1}) = -\min(stack.size, k).\]
So, the amortized cost of the multipop operation is
\[a_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = \min(stack.size, k) + \big(-\min(stack.size, k)\big) = 0.\]
The same logic concludes that the amortized cost for pop is 0. \(_\square\)
All of these operations have an amortized cost of \(O(1)\), so any sequence of operations of length \(n\) will take \(O(n)\) time. Since it was proven that \(\Phi(D_i) \geq \Phi(D_0)\) for all \(i\), this is a true upper bound. The worst case of \(n\) operations is therefore \(O(n)\).
More Examples
Binary bit counter - aggregate analysis
Another example of aggregate analysis is implementing a \(k\)-bit binary counter. There is an array, \(A\), which holds \(k\) \(k\)-bits, so \(A.length = k\). The pseudocode for the increment function looks like this:
| 1 2 3 4 5 6 7 |  | 
The following table describes \(A\) after 'Increment' has been called a few times.
| Count | A[4] | A[3] | A[2] | A[1] | A[0] | 
| 0 | 0 | 0 | 0 | 0 | 0 | 
| 1 | 0 | 0 | 0 | 0 | 1 | 
| 2 | 0 | 0 | 0 | 1 | 0 | 
| 3 | 0 | 0 | 0 | 1 | 1 | 
| 4 | 0 | 0 | 1 | 0 | 0 | 
| 5 | 0 | 0 | 1 | 0 | 1 | 
| 6 | 0 | 0 | 1 | 1 | 0 | 
| 7 | 0 | 0 | 1 | 1 | 1 | 
| 8 | 0 | 1 | 0 | 0 | 0 | 
| 9 | 0 | 1 | 0 | 0 | 1 | 
At first glance, it seems that the while loop on line 3 of the pseudocode would take \(O(k)\) when the array has all 1's. So, \(n\) operations would take \(O(n \cdot k)\). But, that's not always the case. We can tighten this bound by showing that not all bits flip each time. A[0] does flip each time increment is called. However, A[1] only flips every other time. A[2] flips every \(4^\text{th}\) time, and so on. This means that \(n\) operations only causes A[1] to flip \(\frac{n}2\) times and A[2] to flips \(\frac{n}4\) times. 
We've seen that A[1] is flipped \(\frac{n}2\) times and that A[2] is flipped \(\frac{n}4\) times. For an A of length \(k\), how many bits are flipped for \(n\) increment operations? What does this mean for aggregate analysis?
For each A[i] where \(i\) is {0, 1, ..., k-1}, bit \(i\) is flipped \(\frac{n}{2^i}\). The summation is then\[\sum_{i=0}^{k-1}\frac{n}{2^i} \lt n \cdot \sum_{i=0}^{\infty}\frac{1}{2^i} = 2 \cdot n.\]
So, there are \(2 \cdot n\) flips and this takes \(O(n)\) time. Using aggregate analysis
\[\frac{T(n)}{n} = \frac{O(n)}{n} = O(1).\ _\square\]
Buying envelopes - accounting method
In the earlier example, the push operation for a particular element came endowed with enough credit to pay for its eventual pop. You can think about this process like the process of mailing a letter. Typically, when you mail a letter, you pay for the envelope and postage. Then, it is free to mail the letter. This is an amortization.
| Action | Actual Cost | Amortized Cost | 
| Buying Letter & Postage | $1 | $2 | 
| Mailing Letter | $1 | $0 | 
It is essentially the same thing. In amortized analysis, the first act, buying the letter and its postage, pays for the eventual sending of that letter. This relationship is the exact same as the push operation paying for the pop operation. It's much simpler to think about this envelope process in an amortized sense; it's one less thing on your balance sheet.