Radix Sort
Radix sort is an integer sorting algorithm that sorts data with integer keys by grouping the keys by individual digits that share the same significant position and value (place value). Radix sort uses counting sort as a subroutine to sort an array of numbers. Because integers can be used to represent strings (by hashing the strings to integers), radix sort works on data types other than just integers. Because radix sort is not comparison based, it is not bounded by $\Omega(n \log n)$ for running time — in fact, radix sort can perform in linear time.
Radix sort incorporates the counting sort algorithm so that it can sort larger, multidigit numbers without having to potentially decrease the efficiency by increasing the range of keys the algorithm must sort over (since this might cause a lot of wasted time).
Radix Sort
Radix sort takes in a list of $n$ integers which are in base $b$ (the radix) and so each number has at most $d$ digits where $d = \lfloor(\log_b(k) +1)\rfloor$ and $k$ is the largest number in the list. For example, three digits are needed to represent decimal $104$ (in base 10). It is important that radix sort can work with any base since the running time of the algorithm, $O(d(n+b))$, depends on the base it uses. The algorithm runs in linear time when $b$ and $n$ are of the same size magnitude, so knowing $n$, $b$ can be manipulated to optimize the running time of the algorithm.
What is decimal $22$ in binary (base 2).
Here is a table describing the digits of the decimal base. In the $10^0$th place, there are 1's, in the $10^1$ place there are 10's, and so on.
$10^2$ $10^1$ $10^0$ $100$ $10$ $1$ We are pretty used to the decimal system, so it is easy to see that we need a $2$ in the 10's place and a $2$ in the 1's place to make $22$. Let's make a similar table for binary.
$2^4$ $2^3$ $2^2$ $2^1$ $2^0$ $16$ $8$ $4$ $2$ $1$ To make decimal $22$ in binary, we can add $2^4 + 2^2 + 2^1 = 16 + 4 + 2 = 22$ which corresponds to a $1$ in the 1's, 2's, and 4's places, and a $0$ in all the other places. In binary, decimal $22$ can be represented as $10110$.
Radix sort works by sorting each digit from least significant digit to most significant digit. So in base 10 (the decimal system), radix sort would sort by the digits in the 1's place, then the 10’s place, and so on. To do this, radix sort uses counting sort as a subroutine to sort the digits in each place value. This means that for a threedigit number in base 10, counting sort will be called to sort the 1's place, then it will be called to sort the 10's place, and finally, it will be called to sort the 100's place, resulting in a completely sorted list. Here is a quick refresher on the counting sort algorithm.
Counting Sort Subroutine
Counting sort uses three lists: the input list, $A[0,1, \dots, n]$, the output list, $B[0,1, \dots, n]$, and a list that serves as temporary memory, $C[0,1, \dots, k]$. Note that $A$ and $B$ have $n$ slots (a slot for each element), while $C$ contains $k$ slots (a slot for each key value).
Counting sort starts by going through $A$, and for each element $A[i]$, it goes to the index of $C$ that has the same value as $A[i]$ (so it goes to $C[A[i]]$) and increments the value of $C[A[i]]$ by one. This means that if $A$ has seven $0$’s in its list, after counting sort has gone through all $n$ elements of $A$, the value at $C[0]$ will be $7$. Similarly, if $A$ has two $4$’s, after counting sort has gone through all of the elements of $A$, $C[4]$ (using 0 indexing) will be equal to $2$.
In this step, $C$ keeps track of how many elements in $A$ there are that have the same value of a particular index in $C$. In other words, the indices of $C$ correspond to the values of elements in $A$, and the values in $C$ correspond to the total number of times that a value in $A$ appears in $A$.
Radix sort is a stable sort, which means it preserves the relative order of elements that have the same key value. This is very important.
For example, since the list of numbers $[56,43,51,58]$ will be sorted as $[51,43,56,58]$ when the 1’s place is sorted (since $1 < 3 < 6 < 8$) and on the second pass, when the 10’s place is being sorted, the sort will see that three of the four values are $5$. To preserve the sorting that the algorithm determined while sorting the 1’s place, it is important to maintain relative order (namely $1 < 6 < 8$) between the numbers with the same value in the 10’s place (or whatever place value is currently being sorted). The second pass of the radix sort will produce $[43,51,56,58]$.
Counting sort can only sort one place value of a given base. For example, a counting sort for base10 numbers can only sort digits zero through nine. To sort twodigit numbers, counting sort would need to operate in base100. Radix sort is more powerful because it can sort multidigit numbers without having to search over a wider range of keys (which would happen if the base was larger).
In the image showing radix sort below, notice that each column of numbers (each place value) is sorted by the digit in question before the algorithm moves on to the next place value. This shows how radix sort preserves the relative order of digits with the same value at a given place value — remember, $66$ and $68$ will both appear as $6$'s in the 10's column, but $68 > 66$, so the order determined in the 1's column, that $8 > 6$ must be preserved for the sort to work properly and produce the correct answer.
Implementation of Radix Sort
Here is a Python implementation of radix sort that explicitly uses counting sort as a subroutine (some implementations combine the functions).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 

Complexity of Radix Sort
Radix sort will operate on $n$ $d$digit numbers where each digit can be one of at most $b$ different values (since $b$ is the base being used). For example, in base 10, a digit can be $0,1,2,3,4,5,6,7,8, \text{or } 9$.
Radix sort uses counting sort on each digit. Each pass over $n$ $d$digit numbers will take $O(n + b)$ time, and there are $d$ passes total. Therefore, the total running time of radix sort is $O(d(n+b))$. When $d$ is a constant and $b$ isn't much larger than $n$ (in other words, $b = O(n)$), then radix sort takes linear time.
See Also
References
 Demaine, E., & Devadas, S. Lecture 7: Counting Sort, Radix Sort, Lower Bounds for Sorting. Retrieved May 22, 2016, from http://ocw.mit.edu/courses/electricalengineeringandcomputerscience/6006introductiontoalgorithmsfall2011/lecturevideos/MIT6_006F11_lec07_orig.pdf