A greedy algorithm is a simple, intuitive algorithm that is used in optimization problems. The algorithm makes the optimal choice at each step as it attempts to find the overall optimal way to solve the entire problem. Greedy algorithms are quite successful in some problems, such as Huffman encoding which is used to compress data, or Dijkstra's algorithm, which is used to find the shortest path through a graph.
However, in many problems, a greedy strategy does not produce an optimal solution. For example, in the animation below, the greedy algorithm seeks to find the path with the largest sum. It does this by selecting the largest available number at each step. The greedy algorithm fails to find the largest sum, however, because it makes decisions based only on the information it has at any one step, without regard to the overall problem.
Structure of a Greedy Algorithm
Greedy algorithms take all of the data in a particular problem, and then set a rule for which elements to add to the solution at each step of the algorithm. In the animation above, the set of data is all of the numbers in the graph, and the rule was to select the largest number available at each level of the graph. The solution that the algorithm builds is the sum of all of those choices.
If both of the properties below are true, a greedy algorithm can be used to solve the problem.
- Greedy choice property: A global (overall) optimal solution can be reached by choosing the optimal choice at each step.
- Optimal substructure: A problem has an optimal substructure if an optimal solution to the entire problem contains the optimal solutions to the sub-problems.
In other words, greedy algorithms work on problems for which it is true that, at every step, there is a choice that is optimal for the problem up to that step, and after the last step, the algorithm produces the optimal solution of the complete problem.
To make a greedy algorithm, identify an optimal substructure or subproblem in the problem. Then, determine what the solution will include (for example, the largest sum, the shortest path, etc.). Create some sort of iterative way to go through all of the subproblems and build a solution.
Sometimes greedy algorithms fail to find the globally optimal solution because they do not consider all the data. The choice made by a greedy algorithm may depend on choices it has made so far, but it is not aware of future choices it could make.
In the graph below, a greedy algorithm is trying to find the longest path through the graph (the number inside each node contributes to a total length). To do this, it selects the largest number at each step of the algorithm. With a quick visual inspection of the graph, it is clear that this algorithm will not arrive at the correct solution. What is the correct solution? Why is a greedy algorithm ill-suited for this problem?
The correct solution for the longest path through the graph is \(7, 3, 1, 99\). This is clear to us because we can see that no other combination of nodes will come close to a sum of \(99\), so whatever path we choose, we know it should have \(99\) in the path. There is only one option that includes \(99\): \(7, 3, 1, 99\).
The greedy algorithm fails to solve this problem because it makes decisions purely based on what the best answer at the time is: at each step it did choose the largest number. However, since there could be some huge number that the algorithm hasn't seen yet, it could end up selecting a path that does not include the huge number. The solutions to the subproblems for finding the largest sum or longest path do not necessarily appear in the solution to the total problem. The optimal substructure and greedy choice properties don't hold in this type of problem. \(_\square\)
Here, we will look at one form of the knapsack problem. The knapsack problem involves deciding which subset of items you should take from a set of items if you want to optimize some value: perhaps the worth of the items, the size of the items, or the ratio of worth to size.
In this problem, we will assume that we can either take an item or leave it (we cannot take a fractional part of an item). We will also assume that there is only one of each item. Our knapsack has a fixed size, and we want to optimize the worth of the items we take, so we must choose the items we take with care.
Our knapsack can hold at most 25 units of space.
Here is the list of items and their worths.
Item Size Price Laptop 22 12 PlayStation 10 9 Textbook 9 9 Basketball 7 6
Which items do we choose to optimize for price?
There are two greedy algorithms we could propose to solve this. One has a rule that selects the item with the largest price at each step, and the other has a rule that selects the smallest sized item at each step.
- Largest-price Algorithm: At the first step, we take the laptop. We gain \(12\) units of worth, but can now only carry \(25-22 = 3\) units of additional space in the knapsack. Since no items that remain will fit into the bag, we can only take the laptop and have a total of \(12\) units of worth.
- Smallest-sized-item Algorithm: At the first step, we will take the smallest-sized item: the basketball. This gives us \(6\) units of worth, and leaves us with \(25-7 = 18\) units of space in our bag. Next, we select the next smallest item, the textbook. This gives us a total of \(6+9 =15\) units of worth, and leaves us with \(18-9 = 9\) units of space. Since no remaining items are \(9\) units of space or less, we can take no more items.
The greedy algorithms yield solutions that give us \(12\) units of worth and \(15\) units of worth. But neither of these are the optimal solution. Inspect the table yourself and see if you can determine a better selection of items.
Taking the textbook and the PlayStation yields \(9+9=18\) units of worth and takes up \(10+9=19\) units of space. This is the optimal answer, and we can see that a greedy algorithm will not solve the knapsack problem since the greedy choice and optimal substructure properties do not hold. \(_\square\)
In problems where greedy algorithms fail, dynamic programming might be a better approach.
There are many applications of greedy algorithms. Below is a brief explanation of the greedy nature of a famous graph search algorithm, Dijkstra's algorithm.
Dijkstra's algorithm is used to find the shortest path between nodes in a graph. The algorithm maintains a set of unvisited nodes and calculates a tentative distance from a given node to another. If the algorithm finds a shorter way to get to a given node, the path is updated to reflect the shorter distance. This problem has satisfactory optimization substructure since if \(A\) is connected to \(B,\) \(B\) is connected to \(C\), and the path must go through \(A\) and \(B\) to get to the destination \(C\), then the shortest path from \(A\) to \(B\) and the shortest path from \(B\) to \(C\) must be a part of the shortest path from \(A\) to \(C\). So the optimal answers from the subproblems do contribute to the optimal answer for the total problem. This is because the algorithm keeps track of the shortest path possible to any given node.
Huffman encoding is another example of an algorithm where a greedy approach is successful. The Huffman algorithm analyzes a message and depending on the frequencies of the characters used in the message, it assigns a variable-length encoding for each symbol. A more commonly used symbol will have a shorter encoding while a rare symbol will have a longer encoding.
The Huffman coding algorithm takes in information about the frequencies or probabilities of a particular symbol occurring. It begins to build the prefix tree from the bottom up, starting with the two least probable symbols in the list. It takes those symbols and forms a subtree containing them, and then removes the individual symbols from the list. The algorithm sums the probabilities of elements in a subtree and adds the subtree and its probability to the list. Next, the algorithm searches the list and selects the two symbols or subtrees with the smallest probabilities. It uses those to make a new subtree, removes the original subtrees/symbols from the list, and then adds the new subtree and its combined probability to the list. This repeats until there is one tree and all elements have been added. At each subtree, the optimal encoding for each symbol is created and together composes the overall optimal encoding.
For many more applications of greedy algorithms, see the See Also section.
- , S. Greedy-search-path-example. Retrieved May 2, 2016, from https://en.wikipedia.org/wiki/File:Greedy-search-path-example.gif
- , S. Greedy-search-path. Retrieved May 4, 2016, from https://commons.wikimedia.org/wiki/File:Greedy-search-path.gif
- Okie , . Greedy Algorithms. Retrieved May 2, 2016, from http://www.radford.edu/~nokie/classes/360/greedy.html
- , I. Dijkstra Animation. Retrieved May 2, 2016, from https://commons.wikimedia.org/wiki/File:Dijkstra_Animation.gif