Stationary Distributions of Markov Chains
A stationary distribution of a Markov chain is a probability distribution over its states that remains unchanged as the chain progresses in time. Typically, it is represented as a row vector \(\pi\) whose entries are probabilities summing to \(1\), and given the transition matrix \(\textbf{P}\), it satisfies
\[\pi = \pi \textbf{P}.\]
In other words, \(\pi\) is invariant under multiplication by the matrix \(\textbf{P}\).
Ergodic Markov chains have a unique stationary distribution, and absorbing Markov chains have stationary distributions with nonzero elements only in absorbing states. The stationary distribution gives information about the stability of a random process and, in certain cases, describes the limiting behavior of the Markov chain.
A sports broadcaster wishes to predict how many Michigan residents prefer University of Michigan teams (known more succinctly as "Michigan") and how many prefer Michigan State teams. She notices that, year after year, most people stick with their preferred team, but about \(3\%\) of Michigan fans switch to Michigan State and about \(5\%\) of Michigan State fans switch to Michigan. Nevertheless, the overall preferences of the state's population of \(10\) million show no noticeable change from year to year; in other words, it seems Michigan sports fans have reached a stationary distribution. What might that be?
A reasonable way to approach this problem is to suppose there are \(x\) million Michigan fans and \(y\) million Michigan State fans. The state's population is \(10\) million, so \(x + y = 10\). Since the distribution is stationary, these numbers do not change from year to year, so \[\begin{align*} x &= 0.97x + 0.05y \\ y &= 0.03x + 0.95y. \end{align*}\] Rearranging either equation gives \(0.03x = 0.05y\), or \(x = \tfrac{5}{3} y\). Since \(x + y = 10\), it follows that \(y = \tfrac{3}{8} \cdot 10 = 3.75\) and \(x = 6.25\). So there are \(6.25\) million Michigan fans and \(3.75\) million Michigan State fans. In other words, the stationary distribution is \((0.625, \, 0.375)\). \(_\square\)
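To check this numerically, here is a minimal sketch, assuming NumPy is available, that applies the yearly switching repeatedly to an arbitrary starting split; the starting split of \(9\) and \(1\) million is a hypothetical choice of ours:

```python
import numpy as np

# Transition matrix; rows/columns are ordered (Michigan, Michigan State).
P = np.array([[0.97, 0.03],
              [0.05, 0.95]])

# Start from an arbitrary (hypothetical) split of the population, in millions.
pi = np.array([9.0, 1.0])

# Apply one year of switching at a time; the row vector pi P^n settles down.
for _ in range(500):
    pi = pi @ P

print(pi / pi.sum())  # approximately [0.625, 0.375]
```

Any other starting split converges to the same proportions, which is exactly the point of the remark below.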
Note that the stationary distribution does not depend on the total number of fans in the state!
Finding Stationary Distributions
Students of linear algebra may note that the equation \(\pi \textbf{P} = \pi\) looks very similar to the column vector equation \(M v = \lambda v\) for eigenvalues and eigenvectors, with \(\lambda = 1\). In fact, by transposing the matrices, \[ \left(\pi \textbf{P}\right)^T = \pi^T \implies \textbf{P}^T \pi^T = \pi^T. \] In other words, the transposed transition matrix \(\textbf{P}^T\) has eigenvectors with eigenvalue \(1\) that are stationary distributions expressed as column vectors. Therefore, if the eigenvectors of \(\textbf{P}^T\) are known, then so are the stationary distributions of the Markov chain with transition matrix \(\textbf{P}\). In short, the stationary distribution is a left eigenvector (as opposed to the usual right eigenvectors) of the transition matrix.
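As an illustration, here is a minimal sketch of this eigenvector approach, assuming NumPy is available; the helper name `stationary_distribution` is ours, not a library function:

```python
import numpy as np

def stationary_distribution(P):
    """Return a stationary distribution of transition matrix P, found as
    an eigenvector of P^T with eigenvalue 1, rescaled to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    # Pick the column whose eigenvalue is (numerically) closest to 1.
    idx = np.argmin(np.abs(eigenvalues - 1))
    v = np.real(eigenvectors[:, idx])
    # Dividing by the sum also fixes the overall sign of the eigenvector.
    return v / v.sum()

# The Michigan / Michigan State chain from the example above:
P = np.array([[0.97, 0.03],
              [0.05, 0.95]])
print(stationary_distribution(P))  # approximately [0.625, 0.375]
```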
When the eigenvalue \(1\) has multiple linearly independent eigenvectors, each nonnegative eigenvector (rescaled so its entries sum to \(1\)) gives rise to an associated stationary distribution. However, this can only occur when the Markov chain is reducible, i.e., when it has multiple communicating classes.
In genetics, one method for identifying dominant traits is to pair a specimen with a known hybrid. Their offspring is once again paired with a known hybrid, and so on. In this way, the probability of a particular offspring being purely dominant, purely recessive, or hybrid for the trait is given by the table below.
\[\begin{array}{c|ccc}
 & \text{Child Dominant} & \text{Child Hybrid} & \text{Child Recessive} \\ \hline
\text{Parent Dominant} & 0.5 & 0.5 & 0 \\
\text{Parent Hybrid} & 0.25 & 0.5 & 0.25 \\
\text{Parent Recessive} & 0 & 0.5 & 0.5
\end{array}\]
What is a stationary distribution for this Markov chain?
The transition matrix is \[\textbf{P} = \begin{pmatrix} 0.5 & 0.5 & 0 \\ 0.25 & 0.5 & 0.25 \\ 0 & 0.5 & 0.5 \end{pmatrix}.\] The transpose of this matrix has eigenvalues satisfying the equation \[ \det \begin{pmatrix} 0.5 - \lambda & 0.25 & 0 \\ 0.5 & 0.5 - \lambda & 0.5 \\ 0 & 0.25 & 0.5 - \lambda \end{pmatrix} = 0.\] It follows that \((0.5 - \lambda)^3 - 0.125 (0.5 - \lambda) - 0.125 (0.5 - \lambda) = (0.5 - \lambda) (\lambda^2 - \lambda) = 0\). So the eigenvalues are \(\lambda = 0\), \(\lambda = 0.5\), and \(\lambda = 1\). The eigenvalue \(\lambda = 0\) gives rise to the eigenvector \((1,\, -2,\, 1)\), the eigenvalue \(\lambda = 0.5\) gives rise to the eigenvector \((-1,\, 0,\, 1)\), and the eigenvalue \(\lambda = 1\) gives rise to the eigenvector \((1,\, 2,\, 1)\). The only possible candidate for a stationary distribution is the final eigenvector, as all others include negative values.
Then, the stationary distribution must be \(\tfrac{1}{1 + 2 + 1} \cdot (1,\, 2,\, 1) = \left(\tfrac{1}{4},\, \tfrac{1}{2},\, \tfrac{1}{4}\right)\). \(_\square\)
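The same eigenvector recipe can be run on the genetics chain to confirm the hand computation (again a sketch assuming NumPy is available):

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Eigenvector of P^T with eigenvalue 1, normalized to sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigenvalues - 1))
pi = np.real(eigenvectors[:, idx])
print(pi / pi.sum())  # approximately [0.25, 0.5, 0.25]
```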
Relation to Limiting Distribution
The limiting distribution of a Markov chain describes how the process behaves a long time after it has started. For it to exist, the following limit must exist for any states \(i\) and \(j\): \[L_{i,j} = \lim_{n \to \infty} \mathbb{P}(X_n = j \mid X_0 = i).\] Furthermore, for any state \(i\), the following sum must be \(1\): \[\sum_{\text{states }j} \lim_{n \to \infty} \mathbb{P}(X_n = j \mid X_0 = i) = 1.\] This ensures that the numbers obtained do, in fact, constitute a probability distribution. Provided these two conditions are met, the limiting distribution of a Markov chain with \(X_0 = i\) is the probability distribution given by \(\ell = \left(L_{i,j}\right)_{\text{states }j}\).
For any finite, time-homogeneous Markov chain that is aperiodic and irreducible, \[\lim_{n \to \infty} \textbf{P}^n\] converges to a matrix with all rows identical and equal to the stationary distribution \(\pi\). Not all stationary distributions arise this way, however. For a periodic chain, the powers \(\textbf{P}^n\) need not converge, and the stationary distribution only satisfies the weaker condition that the average fraction of time \(\tfrac{1}{n+1} \cdot H_i^{(n)}\) that the process spends in state \(i\) during the first \(n\) steps, where \(H_i^{(n)}\) is the number of visits to state \(i\) among \(X_0, X_1, \dots, X_n\), approaches the corresponding value of the stationary distribution. That is, if \((x_1, \, x_2, \, \dots,\, x_m)\) is the stationary distribution, then \[\lim_{n \to \infty} \frac{H_i^{(n)}}{n+1} = x_i.\]
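A short simulation illustrates this weaker, time-average statement. The chain below is a hypothetical period-\(2\) example of ours with stationary distribution \(\left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right)\): the powers \(\textbf{P}^n\) never converge, yet the fraction of time spent in each state does (a sketch assuming NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical periodic chain: state 0 jumps to 1 or 2 at random, and both
# 1 and 2 jump straight back to 0. P^n never converges (period 2), but the
# stationary distribution is (0.5, 0.25, 0.25).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

n = 100_000
visits = np.zeros(3)  # H_i^(n): number of visits among X_0, ..., X_n
state = 0
for _ in range(n + 1):
    visits[state] += 1
    state = rng.choice(3, p=P[state])

print(visits / (n + 1))  # approximately [0.5, 0.25, 0.25]
```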
Not all stationary distributions are limiting distributions.
Consider the two-state Markov chain with transition matrix \[\textbf{P} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.\] As \(n\) increases, \(\textbf{P}^n\) exhibits no limiting behavior: it simply alternates between \(\textbf{P}\) (for odd \(n\)) and the identity matrix \(I\) (for even \(n\)). However, the system has stationary distribution \(\left(\tfrac{1}{2}, \, \tfrac{1}{2}\right)\), since \[ \left(\tfrac{1}{2}, \, \tfrac{1}{2}\right) \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \left(\tfrac{1}{2}, \, \tfrac{1}{2}\right).\] So, not all stationary distributions are limiting distributions. Sometimes no limiting distribution exists! \(_\square\)
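A quick check (a NumPy sketch) makes both the alternation and the stationarity visible:

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(np.linalg.matrix_power(P, 2))  # the identity matrix I
print(np.linalg.matrix_power(P, 3))  # P again: the powers alternate forever
print(np.array([0.5, 0.5]) @ P)      # [0.5, 0.5]: stationary nonetheless
```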
For time-homogeneous Markov chains, any limiting distribution is a stationary distribution.
Let the Markov chain have \(m \times m\) transition matrix \(\textbf{P}\), and suppose its limiting distribution exists: \[ \lim_{n \to \infty} \textbf{P}^n = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \pi.\] That is, the limit is an \(m \times m\) matrix with all rows equal to \(\pi\). Multiplying on the right by \(\textbf{P}\), and noting that \(\lim_{n \to \infty} \textbf{P}^{n+1}\) equals the same limit, \[ \begin{align*} \left( \lim_{n \to \infty} \textbf{P}^n \right) \textbf{P} &= \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \pi \textbf{P} \\ \lim_{n \to \infty} \textbf{P}^{n+1} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \pi &= \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \pi \textbf{P}. \end{align*}\] Comparing any single row of the two matrices on the right shows that \(\pi \textbf{P} = \pi\). Thus, the limiting distribution is also a stationary distribution. \(_\square\)
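Empirically, this connection can be seen by raising the (aperiodic, irreducible) Michigan transition matrix from the first example to a high power: every row approaches the stationary distribution. A sketch assuming NumPy is available:

```python
import numpy as np

P = np.array([[0.97, 0.03],
              [0.05, 0.95]])

# Every row of P^n approaches the stationary distribution (0.625, 0.375).
print(np.linalg.matrix_power(P, 200))
```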