Data Structures

In computer science and programming — as in most analytical disciplines — there are several valid solutions to the same problem. The crux of mastering the fundamentals of computer science is making good design choices; that is, finding the “best” solution.

Computers allow us to store and manipulate information, and to perform computations ranging from simple operations like looking up the price of an item in a store to complex modeling like artificial neural networks.

In Data Structures, you’ll build your basic toolkit for how to organize logic (i.e. algorithms) and information (i.e. data) to efficiently solve seemingly complicated problems.



To start, imagine you want to build a personal Twitter feed. You’ll need to store the tweets in some way (so that the feed has the data it needs to display your tweets). Naively, you could start making variables; i.e.,

myFirstTweet = 'This is my first tweet!'
myNextTweet = 'How does this even work?'

But this would quickly become cumbersome—a programming nightmare, with an unlimited number of variable names and no easy way to relate the tweets to each other. A simple data type—the list—will make this problem much more manageable.



Lists (also known as the array abstract data type) are often implemented by the array data structure.

In short, an array holds an ordered collection of items accessible by an integer index. They’re the simplest way of storing a collection of multiple values.

Imagine you are keeping track of Big Bird’s tweets over time. You create an array, tweets, with 100 elements to store these strings, and each time Big Bird tweets you put the tweet's contents into the earliest unfilled position in the array, starting with 0. If the first few tweets are shown below, what is tweets[3], the item in tweets associated with index 3?

Hint: If you aren't sure about the order, you might want to look at the timestamps on the tweets.



The items in an array can be anything from primitive types such as integers to more complex types like instances of classes.

There are a couple of points which are important to note:

  • In this visual representation of an array, you can see that it has a size of six. Once six values have been added, no additional values can be added until the array is resized or something is taken out.

  • You can also see that each "slot" in the array has an associated number, called an index. By using these numbers, the programmer can directly index into the array to get specific values, which makes them very efficient.



In the Twitter example, we decided to make an array with 100 elements to store the tweets. If you are going to tweet for a 101st101^\text{st} time, what should you do so that all of your tweets are stored? (You can assume the array data structure is defined in a language such that all options given in the choices are possible.)



Often, we don’t simply want to store data—we need to use our data to extract some sort of information to answer a real-world question.

For example, imagine we're running a restaurant and we've stored the total price paid (as a decimal number) by each customer on a given night in an array. How can we find the median amount spent by customers?

Note: For convenience, assume that the array is full and that there are an odd number of customers.



From the last question, we can see that a list implemented with an array has a potential drawback: the data is not necessarily sorted or related in any useful way (other than being mapped to indices).

In later quizzes, we’ll develop algorithms to efficiently sort an array. And throughout this course, we’ll discuss how to evaluate the pros and cons of different data types and structures when solving a given problem. Sometimes, simplicity will win; other times, we’ll need a more powerful but complex approach.

Which of the following measures of central tendency is easily computed given an unsorted array of integers?



So far, we haven’t touched the question of “how is this data actually stored?” In this course, we’ll avoid diving into low-level details like how computer memory works.

That said, a short answer is that the simplest implementation of an array (for example an integer array in C) takes up a very organized block of memory, making it very easy to find a particular value.

A more in-depth explanation of array properties can be found in this wiki page. And in our course on Computer Memory, we take a deep dive into that side of computer science and programming.



We’ve looked at some very simple examples of how data could be stored in a list, implemented as an array. Each of these examples was one dimensional; however, many data sets we might care about have a more useful or natural representation that is two-dimensional or more.

Which of the following data could be more naturally represented by a two-dimensional array than a one-dimensional array?



The image above was created by using a two-dimensional array with dimensions m×nm \times n, meaning that it has mm rows and nn columns.

In the array, the number at each index, 11 or 00, determines the color of the single pixel at that index in the image. What is the value of m+nm+n for the array generating this image? You can assume that we are storing the image with as little whitespace around it as possible.



As we’ve alluded to, the list (even when extended to multiple dimensions) is a fairly simple data type. For more complicated algorithms, we’ll want more complex ways to organize our data.

That said, with some clever algorithms, we can do a lot with lists (and arrays). To dive into these algorithms, we’re going to start with one that seems like it should be trivial: searching for an element. For example, given a list of email addresses, is a certain email address in the list? While a naive approach is to check every element in the list, the investigation into this question is more interesting than we might expect!



Problem Loading...

Note Loading...

Set Loading...