Sets (ADT)
Sets are a type of abstract data type that allows you to store a list of non-repeated values. Their name derives from the mathematical concept of finite sets.
Unlike an array, sets are unordered and unindexed. You can think about sets as a room full of people you know. They can move around the room, changing order, without altering the set of people in that room. Plus, there are no duplicate people (unless you know someone who has cloned themselves). These are the two properties of a set: the data is unordered and it is not duplicated.
Sets have the most impact in mathematical set theory. These theories are used in many kinds of proofs, structures, and abstract algebra. Creating relations from different sets and codomains are also an important applications of sets.
In computer science, set theory is useful if you need to collect data and do not care about their multiplcity or their order. As we've seen on this page, hash tables and sets are very related. In databases, especially for relational databases, sets are very useful. There are many commands that finds unions, intersections, and differences of different tables and sets of data.
Contents
Minimal Required Functionalities
The set has four basic operations.
Function Name* Provided Functionality insert(i)
Adds i to the set remove(i)
Removes i from the set size()
Returns the size of the set contains(i)
Returns whether or not the set contains i Sometimes, operations are implemented that allow interactions between two sets
Function Name* Provided Functionality union(S, T)
Returns the union of set S and set T intersection(S, T)
Returns the intersection of set S and set T difference(S, T)
Returns the difference of set S and set T subset(S, T)
Returns whether or not set S is a subset of set T *Exact names are not required. In fact, some functionality may be provided by a given language directly via special syntax.
Sample Python Implementation Using a Dictionary
The only way to adequately implement a set in python is by using a dictionary, or a hash table. This is because the dictionary is the only primitive data structure whose elements are unordered. This require a few more lines of code, but it keeps the sets' principles intact.
Note: Python also has its own Set primitive, but we want to implement our own to show how it works
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
|
This code is a little more complex than other data types like the associative array, so let's break it down.
On line 2, we instantiate our class using a python dictionary as our data structure. For more information on dictionaries, check out this page. We use a dictionary to keep the set unordered (dictionaries are unordered because they are implemented using a hash table ).
The functions insert
, remove
, size
, and contains
all simply search through the dictionary's list of keys to determine how to proceed with the operation.
The functions union
, intersection
, difference
, and subset
all utilize difference pythonic set operators. For example, in the union
function, we first turn both Set's data into a set()
. Then we say we'll take data values in either set and put them in a new set. We do that using the OR operator. This looks like a |
. Then, we change the set()
into a list()
because that is what our class takes as an argument. Then we create a new instance of our Set
, and we return it. There are many other operators such as &
and <
. If you'd like more information on these, you can check out this page.
We created our own version of the __repr__
function on line 39 so that we can have easy readability for our class. Remember that this class uses a dictionary as its data structure, so when we print it our normally, it would look something like this: {1: 1, 2: 2, 3: 3}
. Instead, we just it to print something that looks like a list, so we tell it to only print the keys of this dictonary. This results in something that looks like this [1, 2, 3]
.
Example Problems with Sets
Let's take the same implementation of a Set we had above. Try answering the following questions:
1) What will be the output of the following?
1 2 |
|
This set was instantiated with multiple values of 1 and 4. So, those will be removed
1 2>>> set [1, 2, 3, 4]
2) What will be the output of the following?
1 2 3 |
|
We're taking the union of both sets, so we want each element from both sets, but we exclude duplicate because we're using sets.
1 2>>> set1.union(set2) [1, 2, 3, 4, 5, 7]
3) What will be the output of the following?
1 2 3 |
|
To take the intersection from both sets, we want to return one copy of each element that exist in both sets.
1 2>>> set1.intersection(set2) [3, 5]
4) What will be the output of the following?
1 2 3 |
|
We're asking if set1 is a subset of set2. Set1 has the value 1 in it, and set2 doesn't. So, it is false.
1 2>>> set1.subset(set2) False
Built-in Set Functionality in Python
Now let's look at the built-in set primitive in Python. This is not the same implementation that we used above. This is data structure provided to you by Python itself. It is very similar to our implementation, but it is far more efficient and has many more functions.
Here are the basic functions:
1 2 3 4 5 6 7 |
|
Here are functions that relate sets to one another:
1 2 3 4 5 6 7 8 |
|
There are many more operations that Python's set can use. For more, check out this page.
Time Complexity
As with all abstract data types, there are many ways to implement a Set. A hash table is the most common data structure, so we'll examine its time complexity below:
\[\begin{array} &&\text{Hash Table Insert- O(1),} &\text{Hash Table Remove - O(1),} \\ &\text{Hash Table Contains - O(1)}\\ \end{array}\]
Unlike the associative array, we don't have to worry about how we handle collisions in this hash table implementation. This is because we don't handle them ever! If there is a collision, all that means is that value is already there and we don't have to do anything! This makes the hash table implementation of the set the typical choice of programmers.