Splay Tree
The splay tree is a type of binary search tree. Unlike other variants like the AVL tree, the red-black tree, or the scapegoat tree, the splay tree is not always balanced. Instead, it is optimized so that elements that have been recently acessed are quick to access again.
This property is similar in nature to a stack. A stack has the Last-In-First-Out (LIFO) property, so the most recent item you've added is the quickest one to access. Splay trees are similar in that when you add a new item, it becomes the root of the tree no matter what. But they take it a step further. Even when an item is simply searched for, it becomes the new root of the tree. The following gif shows how a splay tree would insert the elements 7, 3, and 9 in that order.
Because it is an unbalanced binary search tree, the splay tree cannot guarantee worst-case \(O(\log_2(n))\) time, like balanced binary search trees can. Balanced binary search trees have a height that is at most \(\log_2(n)\) where \(n\) is the number of nodes in the tree. So, the splay tree can only guarantee \(O(\log_2(n))\) amortized time.
Contents
Overview
Splay trees are a lot like other binary search trees. They have nodes, and each node has two children, one left and one right. There is a root node that serves at the begining of the tree. The main difference, though, is that the root node is always the last element that was accessed.
If an element is inserted into the tree, the tree will be rotated so that it appears at the root position. Also, if an element is searched for in the tree, it will move the same way.
Deletion is an operation that is largely left up to the implementer. Oftentimes, however, the element to be deleted is either switched with the left-most node in its right subtree or the right-most node in its left subtree. Then it is simply removed from the tree.
Splay trees are often used for inter-tree operations, such as joins, merges, unions, and other set related mathematical operations because splay trees are efficient at these operations. Also, splay trees are used when queries are highly biased. That is, when a set of queries favors a certain element, splay trees are effective. This is because the elements that are queried for frequently will appear towards the top of the tree. For example, if a splay tree is being used to store a set of names of people working at a store, it might be the case that the manager's name is queried for the most. Because of that, the manager's name will spend most of its time at the very top of the tree, making it easy to find again.
Concept Question
Can you think of an order of insertion operations for the set {1, 2, 3, 4} that would make the splay tree extremely inefficient? Hint: It would make the average case of lookup \(O(n)\) like a regular binary search tree.
If we inserted the set {1, 2, 3, 4} in ascending order, the tree would have no branching factor at all. The splay tree would look like this.
1 2 3 4 5 6 74 / 3 / 2 / 1
The same would be true if we inserted them in descending order. The tree would just be mirrored.
Basic Operations
Splay trees support all of the typical binary search tree operations - search, insertion, and deletion. However, it is the sub-operation of splaying the tree that makes all of these operations possible.
Splaying
Splaying is what keeps the splay tree roughly balanced. To splay a node, splaying steps are repeatedly performed on it until it rises to the top. To decide what kind of splaying step to perform, the tree looks at three possibilities:
- The node's parent is the root
- The node is the left child of a right child (or the right child of a left child)
- The node is the left child of a left child (or the right child of a right child)
If the node's parent is the root, we only need one rotation to make it the root. If the node is the left child of the root, we perform a right rotation, and if the node is the right child of the root, we perform a left rotation. This is exactly the same as the rotations in an AVL tree. This is sometimes known as the "zig" case.
If the node is the left child of a right child, we need to perform two rotations. Let N be the node we are trying to splay, P is its parent, and G is its grandparent. It first rotates N and P right, then rotates N and G left. If the node is the right child of a left child, it does the opposite. It first rotates N and P left, then N and G right. This is sometimes referred to as the "zig-zag" case.
If the node is the left child of a left child, there are also two rotations. First, G and P are rotated right. Then X and P are rotated right. If the node is the right child of a right child, G and P are first rotated left followed by X and P being rotated left.
This is an example of the "zig-zag" case. The circles represent nodes in the tree, and the triangles under them represent subtrees. Node 9 is being accessed. First it rotates left with it's parent, 7. Then node 9 rotates right with its original grandparent, 12.
All three cases are used on a node until it becomes the root node (the "zig" case will be the last one performed).
Search
Search is simple once splay is implemented. To search for a node, use binary search down the tree to locate the node. Then, perform splay on that node to bring it to the top of the tree.
Insertion
To insert a node, find its appropriate location at the bottom of the tree using binary search. Then perform splay on that node.
Deletion
Deletion is the only operation that has some wiggle room for the implementer. After all, there's no obvious node to splay when you're removing a node.
A typical implementation is the programmer can switch the node to be deleted with the right-most node in its left subtree or the left-most node in its right subtree. Then the node can be deleted without any consequence to the tree (since it has no children).
Another way is the node to be deleted is first splayed, making it a root node. Then it is deleted. We are left with two seprarate trees that are then joined together using the join operation, which we will see later.
Regardless, a typical implementation will eventually splay the parent of the deleted node. This is similarly up to the discretion of the implementer.
Additional Operations
In addition to search, insertion, and deletion and the splaying that facilitates all three, splay trees also have additional operations. These operations are much faster in a splay tree than in other trees due to the splay operation. They are also one of the reasons splay trees are so attractive for software.
Join
To join two trees, S and T, such that the elements in S are smaller than all the elements in T, two things must happen. First, splay the largest element in S. The results in the root of S being the largest node in S, and it has no right child. Second, set the right child of the root of S to be the root of T. The resulting tree is a binary search tree.
Split
Given a node, splitting a tree at that node results in two trees. One tree has every element that was less than or equal to the node, and the other has every element greater than the node. First, we'll splay the node to the root. Then, we take the right child of the root and make it its own tree. Now there are two trees that satisfy the split condition.
Asymptotic Complexity
Remembering that there are cases where the splay tree is very inefficient, the worst case asymptotic complexity of the splay tree is linear time operations. However, this is frequently overlooked due to amortized analysis, and so splay trees are always considered to have \(O(\log_2(n))\) time operations.
See also: big O notation.
Operation | Average | Worst |
Space | \(O(n)\) | \(O(n)\) |
Search | \(O(\log_2(n))\) | \(O(\log_2(n))\) |
Traversal | *\(O(n)\) | *\(O(n)\) |
Insert | \(O(\log_2(n))\) | \(O(\log_2(n))\) |
Delete | \(O(\log_2(n))\) | \(O(\log_2(n))\) |
*Amortized
Advantages and Disadvantages
The splay tree's main advantage is that it keeps the most queried nodes at the top of the tree, decreasing time for subsequent queries. This locality of reference makes the splay very useful for systems like garbage collection for programming languages or cache systems.
Splay trees also have low memory overhead, similar to scapegoat trees. This makes them attractive for memory-sensitive programs.
The biggest disadvantage of splay trees is that they can be linear, as we observed in the Concept Question. This is extremely bad for performance, though it is also extremely rare for this case to occur.