Sets (ADT)

Sets are a type of abstract data type that allows you to store a list of non-repeated values. Their name derives from the mathematical concept of finite sets.

Unlike an array, sets are unordered and unindexed. You can think about sets as a room full of people you know. They can move around the room, changing order, without altering the set of people in that room. Plus, there are no duplicate people (unless you know someone who has cloned themselves). These are the two properties of a set: the data is unordered and it is not duplicated.

Sets have the most impact in mathematical set theory. These theories are used in many kinds of proofs, structures, and abstract algebra. Creating relations from different sets and codomains are also an important applications of sets.

In computer science, set theory is useful if you need to collect data and do not care about their multiplcity or their order. As we've seen on this page, hash tables and sets are very related. In databases, especially for relational databases, sets are very useful. There are many commands that finds unions, intersections, and differences of different tables and sets of data.

Minimal Required Functionalities

The set has four basic operations.

Function Name* Provided Functionality

insert(i) Adds i to the set

remove(i) Removes i from the set

size() Returns the size of the set

contains(i) Returns whether or not the set contains i

Sometimes, operations are implemented that allow interactions between two sets

Function Name* Provided Functionality

union(S, T) Returns the union of set S and set T

intersection(S, T) Returns the intersection of set S and set T

difference(S, T) Returns the difference of set S and set T

subset(S, T) Returns whether or not set S is a subset of set T

*Exact names are not required. In fact, some functionality may be provided by a given language directly via special syntax.

Sample Python Implementation Using a Dictionary

The only way to adequately implement a set in python is by using a dictionary, or a hash table. This is because the dictionary is the only primitive data structure whose elements are unordered. This require a few more lines of code, but it keeps the sets' principles intact.

Note: Python also has its own Set primitive, but we want to implement our own to show how it works

class Set:
    def __init__(self, data=None):
        self.data = {}
        if data != None: 
            if len(data) != len(set(data)):
                data = set(data)
            for d in data:
                self.data[d] = d
    def insert(self, i):
        if i in self.data.keys():
            return 'Already in set'
        self.data[i] = i
    def remove(self, i):
        if i not in self.data.keys():
            return 'Not in set'
        self.data.pop(i)
    def size(self):
        return len(self.data.keys())
    def contains(self, i):
        if i in self.data.keys():
            return True
        return False
    def union(self, otherSet):
        setData = list(set(self.data.keys()) | set(otherSet.data.keys()))
        unionSet = Set(setData)
        return unionSet
    def intersection(self, otherSet):
        setData = list(set(self.data.keys()) & set(otherSet.data.keys()))
        intersectionSet = Set(setData)
        return intersectionSet
    def difference(self, otherSet):
        setData = list(set(self.data.keys()) ^ set(otherSet.data.keys()))
        differenceSet = Set(setData)
        return differenceSet
    def subset(self, otherSet):
        if set(self.data.keys()) < set(otherSet.data.keys()):
            return True
        return False
    def __repr__(self):
      return '['+', '.join(str(x) for x in self.data.keys())+']'

This code is a little more complex than other data types like the associative array, so let's break it down.

On line 2, we instantiate our class using a python dictionary as our data structure. For more information on dictionaries, check out this page. We use a dictionary to keep the set unordered (dictionaries are unordered because they are implemented using a hash table ).

The functions insert, remove, size, and contains all simply search through the dictionary's list of keys to determine how to proceed with the operation.

The functions union, intersection, difference, and subset all utilize difference pythonic set operators. For example, in the union function, we first turn both Set's data into a set(). Then we say we'll take data values in either set and put them in a new set. We do that using the OR operator. This looks like a |. Then, we change the set() into a list() because that is what our class takes as an argument. Then we create a new instance of our Set, and we return it. There are many other operators such as & and <. If you'd like more information on these, you can check out this page.

We created our own version of the __repr__ function on line 39 so that we can have easy readability for our class. Remember that this class uses a dictionary as its data structure, so when we print it our normally, it would look something like this: {1: 1, 2: 2, 3: 3}. Instead, we just it to print something that looks like a list, so we tell it to only print the keys of this dictonary. This results in something that looks like this [1, 2, 3].

Example Problems with Sets

Let's take the same implementation of a Set we had above. Try answering the following questions:

1) What will be the output of the following?

>>> set1 = Set([1, 2, 3, 4, 1, 4])
>>> set1

2) What will be the output of the following?

>>> set1 = Set([1, 3, 5, 7])
>>> set2 = Set([2, 3, 4, 5])
>>> set1.union(set2)

3) What will be the output of the following?

>>> set1 = Set([1, 3, 5, 7])
>>> set2 = Set([2, 3, 4, 5])
>>> set1.intersection(set2)

4) What will be the output of the following?

>>> set1 = Set([1, 2, 3])
>>> set2 = Set([2, 3, 4])
>>> set1.subset(set2)

Built-in Set Functionality in Python

Now let's look at the built-in set primitive in Python. This is not the same implementation that we used above. This is data structure provided to you by Python itself. It is very similar to our implementation, but it is far more efficient and has many more functions.

Here are the basic functions:

>>> set1 = set(['Alex', 'Brittany', 'Joyce'])
>>> set1.add('John')
>>> set1
set(['John', 'Alex', 'Brittany', 'Joyce'])
>>> set1.remove('Alex')
>>> set1
set(['John', 'Brittany', 'Joyce'])

Here are functions that relate sets to one another:

>>> odds = set([1, 3, 5, 7, 9])
>>> evens = set([0, 2, 4, 6, 8])
>>> odds | evens                   #Performs a union
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> odds & evens                   #Performs an intersection
set([])
>>> evens - odds                   #Performs a difference
set([0, 8, 2, 4, 6])

There are many more operations that Python's set can use. For more, check out this page.

Time Complexity

As with all abstract data types, there are many ways to implement a Set. A hash table is the most common data structure, so we'll examine its time complexity below:

\[\begin{array} &&\text{Hash Table Insert- O(1),} &\text{Hash Table Remove - O(1),} \\ &\text{Hash Table Contains - O(1)}\\ \end{array}\]

Unlike the associative array, we don't have to worry about how we handle collisions in this hash table implementation. This is because we don't handle them ever! If there is a collision, all that means is that value is already there and we don't have to do anything! This makes the hash table implementation of the set the typical choice of programmers.

Function Name*	Provided Functionality
`insert(i)`	Adds i to the set
`remove(i)`	Removes i from the set
`size()`	Returns the size of the set
`contains(i)`	Returns whether or not the set contains i

Function Name*	Provided Functionality
`union(S, T)`	Returns the union of set S and set T
`intersection(S, T)`	Returns the intersection of set S and set T
`difference(S, T)`	Returns the difference of set S and set T
`subset(S, T)`	Returns whether or not set S is a subset of set T

Contents