Associative Arrays
Associative arrays, also called maps or dictionaries, are an abstract data type that can hold data in (key, value) pairs.
You can think of associative arrays like a list of phone numbers. In this list, you can look up a person's name by finding their phone number. The name is the value and the number is the key. This list would look like the following table:
Phone Number\(\hspace{10mm}\) | Name |
1-203-456-4657 | John Doe |
1-964-725-5617 | Jane Smith |
1-275-486-8562 | Daniel Brown |
1-347-374-3412 | John Doe |
Associative arrays have two important properties. Every key can only appear once, just like every phone number can only appear once in a directory. And, every key can only have one value, just like every phone number can only refer to one person. It is possible that two people can have the same name in this list, so it's important to remember that a given value can appear more than once in an associative array.
Contents
Minimal Required Functionalities
The associative array has four basic operations.
Function Name* Provided Functionality insert(key, value)
\(\hspace{10mm}\)Add a (key, value) pair to the associative array remove(key)
Remove the key's pair in the associative array update(key, value)
Assigns/updates the value to/of the existing key lookup(key)
Returns the value assigned to this key *Exact names are not required. In fact, some functionality may be provided by a given language directly via special syntax.
Sample Python Implementation Using a List
In Python, associative arrays are implemented for you using the dictionary. This primitive data type is implemented using a hash table. This example uses an array, and it is worth noting that it is a very inefficient implementation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Built-in Python Functionality
Python has its own primitive data type, the dictionary, that implements the functionality of the associative array. Let's see how it works:
The Python dictionary
1 2 3>>> my_dict = {} >>> a {}
Now we can add to this dictionary by simply assigning values to keys using a certain syntax. Let's mimic the social security number example from earlier.
1 2 3 4 5>>> a[100] = 'Taylor' >>> a[101] = 'John' >>> a[102] = 'Abby' >>> a {100: 'Taylor', 101: 'John', 102: 'Abby'}
Now, we're allowed to add a different key whose value is Taylor because that doesn't violate any principles of the associative array.
1 2 3>>> a[103] = 'Taylor' >>> a {100: 'Taylor', 101: 'John', 102: 'Abby', 103: 'Taylor'}
In Python, if we assign a value to a key that is already in the dictionary, it will update that key with the new value.
1 2 3>>> a[101] = 'Jennifer' >>> a {100: 'Taylor', 101: 'Jeniffer', 102: 'Abby', 103: 'Taylor'}
We can remove a key using the pop method
1 2 3 4>>> a.pop(103) 'Taylor' >>> a {100: 'Taylor', 101: 'Jeniffer', 102: 'Abby'}
Finally, Python has some cool built-in methods for the dictionary. Here are a few:
1 2 3 4 5 6>>> a.keys() [100, 101, 102] >>> a.values() ['Taylor', 'Jeniffer', 'Abby'] >>> len(a) 3
Example Question
Imagine we have a list of phone numbers and the names of their owners. Answer the following questions:
- 'Jane Doe' is given a new phone number. Unfortunately, this name already exists in the list. What do we do?
- 'James Brown' gets rid of his phone number. Can we use his old phone number for the next person who gets a phone?
- Is the update function allowed in this scenario?
- A1. It's okay to map a different phone number to this name. After all, multiple instances of the same value are allowed in an associative array.
- A2. Yes, this is okay because we've had to remove that (key, value) pair from the list.
- A3. Sure, we can update someone's phone number as long as it's not being used by someone else.
Time Complexity
The specific implementation of the associative array has a massive impact on its runtime.
Operation\(\hspace{10mm}\) | Array\(\hspace{10mm}\) | Red-Black Tree\(\hspace{10mm}\) | Hash Table |
insert | \(O(n)\) | \(O\big(\log(n)\big)\) | \(O(1)\)** |
remove | \(O(n)\) | \(O\big(\log(n)\big)\) | \(O(1)\)** |
update | \(O(n)\) | \(O\big(\log(n)\big)\) | \(O(1)\)** |
lookup | \(O(n)\) | \(O\big(\log(n)\big)\) | \(O(1)\)** |
**Depends on collision handling.
Implementations that use an array, like the one above, are very inefficient. To do any operation at all, you need to iterate through every pair in the array in the worst case. For that reason, associative arrays are rarely, if ever, implemented using arrays.
Red-black trees are much faster than arrays. Because they are binary search trees, they can be searched through in \(O(\log n)\) time. This makes every operation faster when compared with the array.
Hash tables are the reason that associative arrays are so popular in computer science. By introducing a hash function, the implementer can guarantee that the same key will hash to the same value every time. Then, lookup is fast because you just have to look in the correct cell to find the value. It is important to note that collisions can happen when two different keys hash to the same value. There are different ways to resolve these collisions (which all have varying effects on runtime), but collisions are generally undesirable.
Applications of Associative Arrays
Associative arrays are used in a wide range of computer science areas. They are often used in programming patterns and paradigms such as memoization. This technique emphasizes the storing and quick retrieval of information, and the memo is often implemented using a hash table.
All objects in JavaScript are actually associative arrays. JSON, a popular web communication standard, is also an associative array. JSON has gained popularity recently with the rise of JavaScript-based web platforms such as node.js.
An entire type of database called NoSQL databases are based on associative arrays. These databases use associative arrays to store (key, value) pairs. This type of database specializes in fast data retrieval. Exactly like the hash tables.