Axioms of Probability
To compute probabilities, one must restrict attention to collections of subsets of an arbitrary space \(\Omega\) known as \(\sigma\)-algebras. When \(\Omega\) is uncountable, it is not always possible to assign a probability to every subset of \(\Omega\) in a way that respects natural requirements such as countable additivity and translation invariance; the sets used in the Banach-Tarski paradox are examples to which no consistent measure can be assigned. Thus, a probability measure is defined only on a special class of sets.
Set Structures
This section will develop specific types of set structures in which we can compute probabilities.
Algebras and \(\sigma\)-algebras
\(\sigma\)-algebras are by far the most important set structure defined here as they are the building blocks for defining probability measures.
A nonempty collection \(\mathcal F\) of subsets of \(\Omega\) is a \(\sigma\)-algebra if
1) \(\mathcal F\) is closed under complements: \(A \in \mathcal F \implies A^c \in \mathcal F;\)
2) \(\mathcal F\) is closed under countable unions: \(A_i \in \mathcal F\) for all \(i \in \mathbb N\) \(\implies \bigcup_{i \in \mathbb N} A_i \in \mathcal F.\)
A nonempty collection \(\mathcal A\) of subsets of \(\Omega\) is an algebra if
1) \(\mathcal A\) is closed under complements: \(A \in \mathcal A \implies A^c \in \mathcal A;\)
2) \(\mathcal A\) is closed under finite unions: \(A_i \in \mathcal A\) for all \(i \in \{1,...,n\}\) \(\implies \bigcup_{i=1}^{n} A_i \in \mathcal A.\)
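On a finite \(\Omega\) the two definitions coincide, and the axioms can be checked mechanically. The following is a minimal sketch, not a general-purpose tool: the function name `is_algebra` and the example families are illustrative assumptions, and the family is assumed to be nonempty.

```python
from itertools import combinations

def is_algebra(omega, family):
    """Check the algebra axioms for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    if not sets:
        return False  # assumption: the family is required to be nonempty
    # 1) closed under complements
    if any(omega - a not in sets for a in sets):
        return False
    # 2) closed under pairwise unions (hence, by induction, finite unions)
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, {2}, omega]  # the complements {2,3,4} and {1,3,4} are missing

print(is_algebra(omega, good))
print(is_algebra(omega, bad))
```

Because \(\Omega\) is finite here, every countable union reduces to a finite one, so passing this check also means the family is a \(\sigma\)-algebra.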
Let \(\mathcal C\) be a collection of subsets of \(\Omega\). The smallest \(\sigma\)-algebra that contains \(\mathcal C\) is called the \(\sigma\)-algebra generated by \(\mathcal C\). It is denoted \(\sigma(\mathcal C)\) and is given by \(\sigma(\mathcal C) = \bigcap _{\alpha \in I} \mathcal F_{\alpha},\) where \(\{\mathcal F_{\alpha} : \alpha \in I\}\) is the family of all \(\sigma\)-algebras of subsets of \(\Omega\) that contain \(\mathcal C\).
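For a finite \(\Omega\), the generated \(\sigma\)-algebra can also be reached from below, by repeatedly adding complements and unions until nothing new appears. The sketch below assumes a finite \(\Omega\); the name `generate_sigma_algebra` is hypothetical and chosen only for illustration.

```python
def generate_sigma_algebra(omega, collection):
    """Close a collection of subsets of a finite omega under complements and unions."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            # candidates: the complement of a, and the union of a with every current member
            for candidate in [omega - a] + [a | b for b in list(sigma)]:
                if candidate not in sigma:
                    sigma.add(candidate)
                    changed = True
    return sigma

omega = {1, 2, 3, 4}
small = generate_sigma_algebra(omega, [{1}])
larger = generate_sigma_algebra(omega, [{1}, {2}])
print(sorted(sorted(s) for s in small))
print(len(larger))
```

The closure is contained in every \(\sigma\)-algebra that contains the collection and is itself a \(\sigma\)-algebra, so in the finite case it agrees with the intersection in the definition.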
Examples of Algebras and \(\sigma\)-algebras
Define \(2^\Omega\) as the set of all subsets of \(\Omega\), so that \(A \in 2^\Omega\) exactly when \(A \subset \Omega\). If \(A \in 2^\Omega\), then \(A^c = \Omega - A \subset \Omega\), and thus \(A^c \in 2^\Omega\). Let \(A_i \in 2^\Omega\) for all \(i \in \mathbb N\), so that each \(A_i \subset \Omega\). Then \(\bigcup_{i \in \mathbb N}A_i \subset \Omega\), and therefore \(\bigcup_{i \in \mathbb N}A_i \in 2^\Omega\). Thus, \(2^\Omega\) is a \(\sigma\)-algebra of subsets of \(\Omega\).
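As a quick sanity check of this example, the power set of a small finite set can be enumerated and tested directly. This is only an illustration on a three-element set; the helper name `power_set` is an assumption made here.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return the set of all subsets of a finite set omega."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

omega = frozenset({"a", "b", "c"})
P = power_set(omega)

# Both closure properties hold trivially, since every subset is already present.
closed_under_complements = all(omega - a in P for a in P)
closed_under_unions = all(a | b in P for a in P for b in P)
print(len(P), closed_under_complements, closed_under_unions)
```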
We claim that for any collection \(\mathcal C\) of subsets of \(\Omega\), there is at least one \(\sigma\)-algebra containing \(\mathcal C\), so the intersection defining \(\sigma(\mathcal C)\) is taken over a nonempty family. Indeed, \(2^\Omega\) is a \(\sigma\)-algebra, and since it is the set of all subsets of \(\Omega\), it contains \(\mathcal C\) as well as every \(\sigma\)-algebra of subsets of \(\Omega\). Therefore, a generated \(\sigma\)-algebra can always be found for an arbitrary set \(\Omega\). \(_\square\)
Dynkin Systems, the \(\pi\)-\(\lambda\) Theorem and Extension Theorems
As it turns out, there are set structures that are weaker than the \(\sigma\)-algebra but are very useful for generating \(\sigma\)-algebras and for proving particularly tricky results in probability.
We call a collection \(\mathcal M\) of subsets of \(\Omega\) a monotone class if the following hold:
1) \(\mathcal M\) is closed under increasing unions: if \(A_n \in \mathcal M\) and \(A_{n} \subset A_{n+1}\) for all \(n\), then \(\bigcup_{n \in \mathbb N} A_{n} \in \mathcal M.\)
2) \(\mathcal M\) is closed under decreasing intersections: if \(A_n \in \mathcal M\) and \(A_{n+1} \subset A_{n}\) for all \(n\), then \(\bigcap_{n \in \mathbb N} A_{n} \in \mathcal M.\)
We call a collection \(\mathcal P\) of subsets of \(\Omega\) a \(\pi\)-system if \(\mathcal P\) is closed under finite intersections, i.e.
1) \(A, B \in \mathcal P \implies A \cap B \in \mathcal P.\)
A collection \(\mathcal L\) is called a \( \lambda\)-system if
1) \(\Omega \in \mathcal L\)
2) \(A \in \mathcal L \implies A^c \in \mathcal L\)
3) \(A_i \in \mathcal L\) for all \(i \in \mathbb N\) and \(A_i \cap A_j = \emptyset\) for \(i \ne j\) \(\implies \bigcup_{i \in \mathbb N} A_{i} \in \mathcal L.\)
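A small finite example helps separate the two notions. The sketch below assumes that property 3 is read as closure under countable disjoint unions, which is the form the proofs in this section rely on; on a finite \(\Omega\) this reduces to pairwise disjoint unions. The function names are illustrative.

```python
from itertools import combinations

def is_pi_system(family):
    """Closed under pairwise (hence finite) intersections."""
    sets = {frozenset(s) for s in family}
    return all(a & b in sets for a, b in combinations(sets, 2))

def is_lambda_system(omega, family):
    """Contains omega, closed under complements and disjoint unions (finite omega)."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    if omega not in sets:
        return False
    if any(omega - a not in sets for a in sets):
        return False
    # assumption: on a finite omega, pairwise disjoint unions suffice
    return all(a | b in sets for a, b in combinations(sets, 2) if not (a & b))

omega = {1, 2, 3, 4}
# The empty set, omega, and every two-element subset of omega.
family = [set(), omega] + [set(c) for c in combinations(sorted(omega), 2)]

print(is_lambda_system(omega, family))
print(is_pi_system(family))
```

This family is a \(\lambda\)-system but not a \(\pi\)-system, since \(\{1,2\} \cap \{1,3\} = \{1\}\) is missing; in particular, it is not a \(\sigma\)-algebra.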
Let \(\mathcal L\) be a \(\lambda\)-system. In the presence of properties 1 and 3, property 2 can be replaced by the following equivalent property:
2-1) If \(A, B \in \mathcal L \) such that \(A \subset B\), then \(B-A \in \mathcal L.\)
Let \(\mathcal L\) be a \(\lambda\)-system and let \(A, B \in \mathcal L\) with \(A \subset B\). By property 2, \(B^c \in \mathcal L\). Since \(A \subset B\), we have \(A \cap B^c = \emptyset\), so property 3 (padding the sequence with copies of \(\emptyset = \Omega^c \in \mathcal L\)) gives \(A \cup B^c \in \mathcal L\). By property 2 again, \((A \cup B^c)^c \in \mathcal L\). By De Morgan's law, \((A \cup B^c)^c = A^c \cap B = B-A\), so \(B-A \in \mathcal L\).
Conversely, assume properties 1 and 2-1, and let \(A \in \mathcal L\). Since \(\Omega \in \mathcal L\) and \(A \subset \Omega\), property 2-1 gives \(\Omega-A \in \mathcal L\), and \(\Omega - A = \Omega \cap A^c = A^c\). So \(A^c \in \mathcal L\), which is property 2. \(_\square\)
Lemma: If \(\mathcal L\) is a \(\pi\)-system and a \(\lambda\)-system, then \(\mathcal L\) is a \(\sigma\)-algebra.
Let \(\mathcal L\) be a \(\pi\)-system and a \(\lambda\)-system. Then \(\Omega \in \mathcal L\) and \(A \in \mathcal L \implies A^c \in \mathcal L\), so we need only show that \(\mathcal L\) is closed under countable unions. Let \(A_n \in \mathcal L\) for all \(n \in \mathbb N\), where the \(A_n\)'s are not necessarily disjoint. We replace them by disjoint sets with the same union: let \(B_1 = A_1\), \(B_2 = A_2-A_1\), \(B_3 = A_3 - (A_1 \cup A_2)\), ... , \(B_n = A_n -(A_1 \cup A_2 \cup \cdots \cup A_{n-1})\). Each \(B_n = A_n \cap A_1^c \cap \cdots \cap A_{n-1}^c\) is a finite intersection of elements of \(\mathcal L\), so \(B_n \in \mathcal L\) because \(\mathcal L\) is a \(\pi\)-system. Since the \(B_n\)'s are pairwise disjoint, property 3 of \(\lambda\)-systems gives \(\bigcup_n B_n \in \mathcal L\). Finally, \(\bigcup_n B_n = \bigcup_n A_n\), so \(\bigcup_n A_n \in \mathcal L\) and therefore \(\mathcal L\) is a \(\sigma\)-algebra. \(_\square\)
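The disjointification step in this proof, \(B_n = A_n - (A_1 \cup \cdots \cup A_{n-1})\), is easy to trace on a concrete finite sequence. The following sketch uses a hypothetical helper `disjointify` and an arbitrary example sequence.

```python
def disjointify(sets):
    """Return B_1, B_2, ... where B_n = A_n minus the union of A_1, ..., A_{n-1}."""
    seen = set()      # running union A_1 ∪ ... ∪ A_{n-1}
    pieces = []
    for a in sets:
        a = set(a)
        pieces.append(a - seen)
        seen |= a
    return pieces

A = [{1, 2}, {2, 3}, {1, 3, 4}]
B = disjointify(A)
print(B)

# The pieces are pairwise disjoint and have the same union as the original sets.
same_union = set().union(*A) == set().union(*B)
pairwise_disjoint = all(
    not (B[i] & B[j]) for i in range(len(B)) for j in range(i + 1, len(B))
)
print(same_union, pairwise_disjoint)
```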
We now state and prove a useful theorem in the construction of probability measures, called the \(\pi\)-\(\lambda\) theorem.
Theorem: Let \(\mathcal P\) be a \(\pi\)-system and \(\mathcal L\) a \(\lambda\)-system containing \(\mathcal P\). Then \(\sigma(\mathcal P) \subset \mathcal L.\)
It suffices to prove the theorem when \(\mathcal L\) is the smallest \(\lambda\)-system containing \(\mathcal P\), that is, the intersection of all \(\lambda\)-systems containing \(\mathcal P\), since every other \(\lambda\)-system containing \(\mathcal P\) contains this one. This intersection is itself a \(\lambda\)-system, because each defining property is preserved under intersections of classes of the same type. Let \(\sigma(\mathcal P)\) be the smallest \(\sigma\)-algebra containing \(\mathcal P\). Every \(\sigma\)-algebra satisfies all the properties of a \(\lambda\)-system, so \(\sigma(\mathcal P)\) is also a \(\lambda\)-system that contains \(\mathcal P\). Thus \(\mathcal L \subset \sigma(\mathcal P)\), since \(\sigma(\mathcal P)\) is one of the \(\lambda\)-systems in the intersection defining \(\mathcal L\). It remains to show that \(\sigma(\mathcal P) \subset \mathcal L\). We do so by defining the following collection:
For \(A \subset \Omega\), let \(\mathcal L_A := \{B \subset \Omega : B \cap A \in \mathcal L \}\). We claim that if \(A \in \mathcal L\), then \(\mathcal L_A\) is a \(\lambda\)-system. Let \(A \in \mathcal L\). Since \(\Omega \cap A = A \in \mathcal L\), we have \(\Omega \in \mathcal L_A\). Let \(B, C \in \mathcal L_A\) such that \(B \subset C\). Then \((C-B) \cap A = (C \cap A) - (B \cap A)\), where \(B \cap A\) and \(C \cap A\) both belong to \(\mathcal L\) and \((B \cap A) \subset (C \cap A)\). By property 2-1 applied to \(\mathcal L\), we have \((C-B) \cap A \in \mathcal L\), and thus \(C-B \in \mathcal L_A\), so 2-1 holds for \(\mathcal L_A\) (which, together with property 1, implies that 2 holds). Let \(B_i \in \mathcal L_A\) for all \(i \in \mathbb N\), such that the \(B_i\)'s are disjoint. Then the sets \(B_i \cap A\) are disjoint elements of \(\mathcal L\), so \(\bigcup_i (B_i \cap A) \in \mathcal L\). Since \(\big(\bigcup_i B_i\big) \cap A = \bigcup_i (B_i \cap A)\), we have \(\bigcup_i B_i \in \mathcal L_A\), and \(\mathcal L_A\) is a \(\lambda\)-system.
First, fix \(A \in \mathcal P\). For any \(B \in \mathcal P\), since \(\mathcal P\) is a \(\pi\)-system, \(A \cap B \in \mathcal P \subset \mathcal L\), so \(B \in \mathcal L_A\); thus \(\mathcal P \subset \mathcal L_A\). Since \(A \in \mathcal L\), the collection \(\mathcal L_A\) is a \(\lambda\)-system containing \(\mathcal P\), and because \(\mathcal L\) is the smallest such \(\lambda\)-system, \(\mathcal L \subset \mathcal L_A\). Hence \(A \in \mathcal P\) and \(B \in \mathcal L \implies A \cap B \in \mathcal L\), that is, \(A \in \mathcal L_B\). Next, fix \(B \in \mathcal L\). The previous step shows that \(\mathcal P \subset \mathcal L_B\). By the construction above, \(\mathcal L_B\) is a \(\lambda\)-system, so again by minimality \(\mathcal L \subset \mathcal L_B\). Therefore, for any \(B \in \mathcal L\) and \(C \in \mathcal L\), we have \(C \in \mathcal L_B\), that is, \(B \cap C \in \mathcal L\), so \(\mathcal L\) is both a \(\pi\)-system and a \(\lambda\)-system. Using the previous lemma, \(\mathcal L\) is a \(\sigma\)-algebra that contains \(\mathcal P\). Thus \(\sigma(\mathcal P) \subset \mathcal L\), and combined with \(\mathcal L \subset \sigma(\mathcal P)\) we see that \(\sigma(\mathcal P) = \mathcal L\). \(_\square\)
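The conclusion of the proof, that the smallest \(\lambda\)-system containing a \(\pi\)-system equals the generated \(\sigma\)-algebra, can be checked by brute force on a small finite \(\Omega\). This is a sketch under the assumption that \(\lambda\)-systems are closed under disjoint unions; the helper names `close`, `sigma` and `smallest_lambda` are hypothetical.

```python
def close(omega, collection, union_ok):
    """Smallest family containing omega and the collection that is closed under
    complements and under unions of pairs (a, b) for which union_ok(a, b) holds."""
    omega = frozenset(omega)
    family = {omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            candidates = [omega - a]
            candidates += [a | b for b in list(family) if union_ok(a, b)]
            for c in candidates:
                if c not in family:
                    family.add(c)
                    changed = True
    return family

def sigma(omega, collection):
    # all unions are allowed
    return close(omega, collection, lambda a, b: True)

def smallest_lambda(omega, collection):
    # only disjoint unions are allowed
    return close(omega, collection, lambda a, b: not (a & b))

omega = {1, 2, 3, 4}
P = [{1, 2}, {2, 3}, {2}]   # a pi-system: closed under intersections
Q = [{1, 2}, {2, 3}]        # not a pi-system: {2} is missing

print(smallest_lambda(omega, P) == sigma(omega, P))
print(len(smallest_lambda(omega, Q)), len(sigma(omega, Q)))
```

For the collection `Q`, which is not closed under intersections, the smallest \(\lambda\)-system is strictly smaller than the generated \(\sigma\)-algebra, which illustrates why the \(\pi\)-system hypothesis is needed.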
We now provide a construction that is used to extend a probability \(\mathbb P\) defined on an algebra, \(\mathcal A\), to a larger class of subsets of \(\Omega\).
We call \(\mathbb P^*\) the outer probability measure induced by \(\mathbb P\) if, for all \(A \subset \Omega\), \(\mathbb P^*(A) = \inf \big\{\sum_{n \in \mathbb N} \mathbb P(B_n): B_n \in \mathcal A, A \subset \bigcup_{n \in \mathbb N} B_n \big\}.\)
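To see the infimum at work, consider a finite toy example in which the algebra is generated by a partition of \(\Omega\) into atoms. The atoms and their probabilities below are assumptions chosen for illustration, as are the helper names. Because the algebra is finite and \(\emptyset\) has probability zero, finite covers suffice, so the infimum can be found by exhaustive search.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical atoms of an algebra on omega = {1, 2, 3, 4}, with their probabilities.
atoms = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({3}): Fraction(1, 4),
    frozenset({4}): Fraction(1, 4),
}

def build_algebra(atoms):
    """Map every union of atoms to its probability (additive over atoms)."""
    algebra = {}
    atom_list = list(atoms)
    for r in range(len(atom_list) + 1):
        for combo in combinations(atom_list, r):
            b = frozenset().union(*combo)
            algebra[b] = sum((atoms[a] for a in combo), Fraction(0))
    return algebra

def outer_measure(target, algebra):
    """Minimum total probability over all covers of target by sets of the algebra."""
    target = frozenset(target)
    members = list(algebra)
    best = None
    for r in range(1, len(members) + 1):
        for cover in combinations(members, r):
            if target <= frozenset().union(*cover):
                total = sum((algebra[b] for b in cover), Fraction(0))
                if best is None or total < best:
                    best = total
    return best

algebra = build_algebra(atoms)
print(outer_measure({1}, algebra))      # {1} is not in the algebra
print(outer_measure({1, 3}, algebra))
print(outer_measure({3}, algebra))      # {3} is in the algebra
```

The set \(\{1\}\) is not in the algebra, and the cheapest cover is the atom \(\{1,2\}\). On sets that do belong to the algebra, the outer measure agrees with the original probability in this example.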