Context Free Languages

Context-free languages (CFLs) are generated by context-free grammars. The set of all context-free languages is identical to the set of languages accepted by pushdown automata, and the set of regular languages is a subset of context-free languages. An inputed language is accepted by a computational model if it runs through the model and ends in an accepting final state. All regular languages are context-free languages, but not all context-free languages are regular. Most arithmetic expressions are generated by context-free grammars, and are therefore, context-free languages. Context-free languages and context-free grammars have applications in computer science and linguistics such as natural language processing and computer language design.

^[1]

Context-free Languages

In formal language theory, a language is defined as a set of strings of symbols that may be constrained by specific rules. Similarly, the written English language is made up of groups of letters (words) separated by spaces. A valid (accepted) sentence in the language must follow particular rules, the grammar.

A context-free language is a language generated by a context-free grammar. They are more general (and include) regular languages. The same context-free language might be generated by multiple context-free grammars.

The set of all context-free languages is identical to the set of languages that are accepted by pushdown automata (PDA).

Here is an example of a language that is not regular (proof here) but is context-free:

\(\{a^nb^n | n \geq 0\}\). This is the language of all strings that have an equal number of a’s and b’s.

In this notation, \(a^4b^4\) can be expanded out to aaaabbbb, where there are four a’s and then four b’s. (So this isn’t exponentiation, though the notation is similar).

Closure Properties

Context-free languages have the following closure properties. A set is closed under an operation if doing the operation on a given set always produces a member of the same set. This means that if one of these closed operations is applied to a context-free language the result will also be a context-free language.

Union: Context-free languages are closed under the union operation. This means that if \(L \text{ and } P\) are both context-free languages, then \(L \cup P\) is also a context-free language.

Here is a proof that context-free grammars are closed under union. ^[2]

Let \(L\) and \(P\) be generated by the context-free grammars, \(G_L = (V_L, \Sigma_L, R_L, S_L)\) and \(G_P = (V_P, \Sigma_P, R_P, S_P)\), respectively.

Without loss of generality, subscript each nonterminal symbol in \(G_L\) with an \(L\), and each nonterminal of \(G_P\) with a \(P\) such that \(V_L \cap V_P = \emptyset\).

Define the CFG, \(G\), that generates \(L \cup P\) as follows: \(G = (V_L \cup V_P \cup \{S\}, \Sigma_L \cup \Sigma_P, R_L \cup R_P \cup \{S -> S_L | S_P\}, S)\).

Concatenation: If \(L \text{ and } P\) are both context-free languages, then \(LP\) is also context free. The concatenation of a string is defined as follows: \(S_1S_2 = vw: v \in S_1 \text{ and } w \in S_2\).

Here is a proof that context-free grammars are closed under concatenation. This proof is similar to the union closure proof.

Let \(L\) and \(P\) be generated by the context-free grammars, \(G_L = (V_L, \Sigma_L, R_L, S_L)\) and \(G_P = (V_P, \Sigma_P, R_P, S_P)\), respectively.

Without loss of generality, subscript each nonterminal symbol in \(G_L\) with an \(L\), and each nonterminal of \(G_P\) with a \(P\) such that \(V_L \cap V_P = \emptyset\).

Define the CFG, \(G\), that generates \(L \cup P\) as follows: \(G = (V_L \cup V_P \cup \{S\}, \Sigma_L \cup \Sigma_P, R_L \cup R_P \cup \{S -> S_LS_P\}, S)\).

Every word that \(G\) generates is a word in \(L\) followed by a word in \(P\), which is the definition of concatenation.

Kleene Star: If \(L\) is a context-free language, then \(L*\) is also context free. The Kleene star can repeat the string or symbol it is attached to any number of times (including zero times). The Kleene star basically performs a recursive concatenation of a string with itself. For example, \( \{a,b\}* = \{\epsilon, a, b, ab, aab, aaab, abb \cdots \}\) and so on. We've already proved that CFLs are closed under concatenation.

Context-free languages are not closed under complement or intersection.

If CFL's were closed under intersection then there would be CFLs that violate the pumping lemma for context-free languages (see the next section for more details) which cannot be.

Take two context-free languages \(L = \{a^nb^nc^m\}\) and \(P = \{a^nb^mc^m\}\). The intersection of \(L\) and \(P\), \(L \cap P = \{a^nb^nc^n\}\), which we will see below in the pumping lemma for context-free languages, is not a context-free language.

To show that CFL's are not closed under complement, we will use contradiction, the result from intersection, and De Morgan's Law.

Assume, for contradiction, that \(\overline{L}\) is a context-free language if \(L\) is a context-free language.

We know that CFLs are closed under union and De Morgan's law states:

\(\overline{A} \cup \overline{B} = \overline{A \cap B}\)

Which can be rewritten:

\(\overline{\overline{A} \cup \overline{B}} = \overline{\overline{A \cap B}}\)

Which is:

\(\overline{\overline{A} \cup \overline{B}} = A \cap B\)

Since using the assumption that \(\overline{L}\) and \(L\) are CFLs resulted in a false statement, we know that CFLs are not closed under complement.

The Pumping Lemma for Context-Free Languages

Proving that something is not a context-free language requires either finding a context-free grammar to describe the language or using another proof technique (though the pumping lemma is the most commonly used one). A common lemma to use to prove that a language is not context-free is the Pumping Lemma for Context-Free Languages.

The pumping lemma for context-free languages states that if a language \(L\) is context-free, there exists some integer pumping length \(p \geq 1\) such that every string \(s \in L\) has a length of \(p\) or more symbols, \(|s| \geq p\), that can be written \(s = uvwxy\) where \(u, v, w, x, \text{ and } y\) are substrings of \(s\) such that:

\(|vwx| \leq p\)

\(|vx| \geq 1\)

\(uv^nwx^ny \in L \forall n \geq 0 \)

All context-free languages are “pumpable” meaning that the pumping lemma constraints hold true for all context-free languages. If a language is not pumpable, then it is not a context-free language. However, if a language is pumpable, it is not necessarily a context-free language. Because the set of regular languages is contained in the set of context-free languages, all regular languages must be pumpable too.

Essentially, the pumping lemma holds that arbitrarily long strings \(s \in L\) can be pumped without ever producing a new string that is not in the language \(L\).

To prove that a language is not context-free, use proof by contradiction and the pumping lemma. Set up a proof that claims that \(L\) is context-free, and show that a contradiction of the pumping lemma’s constraints occurs in at least one of the three constraints listed above.

Bascially, the idea behind the pumping lemma for context-free languages is that there are certain constraints a language must adhere to in order to be a context-free language. You can use the pumping lemma to test if all of these contraints hold for a particular language, and if they do not, you can prove with contradiction that the language is not context-free.

Use the Pumping Lemma to prove that \(L = \{a^nb^nc^n | n > 0\}\) is not a context-free language.^[3]

Assume, for the sake of contradiction, that \(L = \{a^nb^nc^n | n > 0\}\) is a context-free language. By the pumping lemma, there exists an integer pumping length \(p\) for \(L\). We need a string \(s\) that is longer than or equal to the length of \(p\). Certainly \(s = a^pb^pc^p\) is longer than \(p\), so we choose this for the \(s\) string. This \(s\) is in \(L\) since it has \(p\) a’s, \(p\) b’s, and \(p\) c’s.

Now by the pumping lemma, \(|vwx| \leq p\). There are five possible places in the string that we can assign to be \(vwx\):

\(vwx = a^j\) for some \(j \leq p\). This means that \(vwx\) is contained purely in the a’s section.

\(vwx = a^jb^k\) for some \(j\) and \(k\) where \(j+k \leq p\).This means that the \(vwx\) segment is contained somewhere in the a’s and b’s section.

\(vwx = b^j\) for some \(j \leq p\). This means that \(vwx\) is contained purely in the b’s section.

\(vwx = b^jc^k\) for some \(j\) and \(k\) where \(j+k \leq p\).This means that the \(vwx\) segment is contained somewhere in the b’s and c’s section.

\(vwx = c^j\) for some \(j \leq p\). This means that \(vwx\) is contained purely in the c’s section.

In any of these five cases, we can easily verify that the third constraint for the pumping lemma, that \(uv^nwx^ny \in L \forall n \geq 0 \), does not hold. In other words, for any of these five choices of \(vwx\), the string \(s\) cannot be pumped in a way that results in a string that has an equal number of a’s, b’s, and c’s (the definition of the language \(L\)).

Let’s take a short example string described by \(a^5b^5c^5 = \text{aaaaabbbbbccccc}\) and \(p = 3\).

In the first case, there will be more a’s than there are b’s and c’s, making the resulting, pumped string, not a member of \(L\). If we pump this region, we will get the string aaaaaaaabbbbbccccc: a string with \(8\) a’s, \(5\) b’s, and \(5\) c’s. Clearly this is not in the language. A similar proof can be checked for the third and fifth case, just pump the b and c region, respectively, and the results will be symmetrical.

For the second and fourth case, we do something similar. If we pump anywhere in the a and b region only, we will have a resulting string with more a’s and b’s than c’s (for the second case) and more b’s and c’s than a’s (in the fifth case). For the second case, if we take \(a^5b^5c^5 = \text{aaaaabbbbbccccc}\) and \(p = 3\) and pump the last a in the a section and the first two b’s in the b section, we get this string: aaaaaabbbbbbbccccc — a string with six a’s, seven b’s, and five c’s. The fifth case has a symmetrical example.

In the example above, why was \(vwx = a^ib^jc^k\) for some \(i\), \(j\), and \(k\) where \(j+k + i\leq p\) not included as one of the options for \(vwx\)?

The first constraint of the pumping lemma is that \(|vwx| \leq p\). Because the string was \(s = a^pb^pc^p\), there are \(p\) a’s, followed by \(p\) b’s, followed by \(p\) c’s. If \(vwx\) spanned the a region, b region, and c region, it would necessarily be longer than \(p\). For example, if \(vwx\) included only the last a in the a region, the entire b region, and then the first element of the c region, it would have length \(p+2\) and therefore wouldn’t work.

Relationship with other Computation Models

Any language that can be generated using regular expressions can be generated by a context-free grammar, but not all context-free languages are regular languages. We also know that regular expressions and languages can be generated by finite state machines (one simple computational model).

Regular languages can “keep track of” one thing, while context-free languages can “keep track of” up to two things. For example, there is a regular language that can generate all strings that have an even number of zeroes, but there is not a regular language that can generate all strings that have an equal number of ones and zeroes — a context-free language can do this, however. But, neither a context-free language or regular language can generate the language for all strings that have an equal number of a’s, b’s, and c’s (look at the Pumping Lemma example in the above sections).

There are grammars called context-sensitive grammars which are more powerful (meaning they can generate more complex languages that might require more memory) than both regular languages and context-free languages.

Context-free languages are described by context-free grammars, which can be generated by pushdown automata. Regular languages and finite state machines can describe some context-free languages, but not all. Turing machines can generate all regular languages, all context-free languages, and more.

References

, J. Chomsky Hierarchy. Retrieved June 13 2016, from https://en.wikipedia.org/wiki/Chomsky_hierarchy
Cappello, P. Chapter 17: Context-Free Languages. Retrieved June 14, 2016, from http://www.cs.ucsb.edu/~cappello/136/lectures/17cfls/slides.pdf
, . Pumping lemma for context-free languages. Retrieved June 13 2016, from https://en.wikipedia.org/wiki/Pumping_lemma_for_context-free_languages

Contents