Context Free Grammars

Context-free grammars (CFGs) are used to describe context-free languages. A context-free grammar is a set of recursive rules used to generate patterns of strings. A context-free grammar can describe all regular languages and more, but they cannot describe all possible languages.

Context-free grammars are studied in fields of theoretical computer science, compiler design, and linguistics. CFG’s are used to describe programming languages and parser programs in compilers can be generated automatically from context-free grammars.

Two parse trees that describe CFGs that generate the string "x + y * z". Source: Context-free grammar wikipedia page.

Context Free Grammars

Context-free grammars can generate context-free languages. They do this by taking a set of variables which are defined recursively, in terms of one another, by a set of production rules. Context-free grammars are named as such because any of the production rules in the grammar can be applied regardless of context—it does not depend on any other symbols that may or may not be around a given symbol that is having a rule applied to it.

Context-free grammars have the following components:

A set of terminal symbols which are the characters that appear in the language/strings generated by the grammar. Terminal symbols never appear on the left-hand side of the production rule and are always on the right-hand side.

A set of nonterminal symbols (or variables) which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of nonterminal symbols.

A set of production rules which are the rules for replacing nonterminal symbols. Production rules have the following form: variable \(\rightarrow\) string of variables and terminals.

A start symbol which is a special nonterminal symbol that appears in the initial string generated by the grammar.

For comparison, a context-sensitive grammar can have production rules where both the left-hand and right-hand sides may be surrounded by a context of terminal and nonterminal symbols.

To create a string from a context-free grammar, follow these steps:^[1]

Begin the string with a start symbol.

Apply one of the production rules to the start symbol on the left-hand side by replacing the start symbol with the right-hand side of the production.

Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols. Note, it could be that not all production rules are used.

Formal Definition

A context-free grammar can be described by a four-element tuple \((V, \Sigma, R, S),\) where

\(V\) is a finite set of variables (which are non-terminal);
\(\Sigma\) is a finite set \((\)disjoint from \(V)\) of terminal symbols;
\(R\) is a set of production rules where each production rule maps a variable to a string \(s \in (V \cup \Sigma)*\);
\(S\) \((\)which is in \(V)\) which is a start symbol.

Come up with a grammar that will generate the context-free (and also regular) language that contains all strings with matched parentheses.

There are many grammars that can do this task. This solution is one way to do it, but should give you a good idea of if your (possibly different) solution works too.

Starting symbol \(\rightarrow\) S
Non-terminal variables = \(\{(,)\}\)
Production rules:

S \(\rightarrow\) ( )

S \(\rightarrow\) SS

S \(\rightarrow\) (S). \(_\square\)

A way to condense production rules is as follows:

We can take

S \(\rightarrow\) ( )
S \(\rightarrow\) SS
S \(\rightarrow\) (S)

and translate them into a single line: S \(\rightarrow\) ( ) | SS | (S) | \(\epsilon,\) where \(\epsilon \) is an empty string.

Here is a context-free grammar that generates arithmetic expressions (subtraction, addition, division, and multiplication)^[1].

Start symbol = <expression>

Terminal symbols = \(\{+, -, *, /, (, ), \text{number}\},\) where “number” is any number

Production rules:

<expression> \(\rightarrow\) number

<expression> \(\rightarrow\) (<expression>)

<expression> \(\rightarrow\) <expression> + <expression>

<expression> \(\rightarrow\) <expression> - <expression>

<expression> \(\rightarrow\) <expression> * <expression>

<expression> \(\rightarrow\) <expression> / <expression>

This allows us to construct whatever expressions using multiplication, addition, division, and subtraction we want. What these production rules tell us is that the result of any operation, for example, multiplication, is also an expression (denoted, <expression>).

Using the example above, show the steps of deriving the following expression: \((4 + 5) * (2 - 6)\). Note, there are many ways to do this, but the solution below should give you enough guidance to check if your derivation works.

<expression> \(\rightarrow\) 4 (using rule 1)
<expression> \(\rightarrow\) 5 (using rule 1)
<expression> \(\rightarrow\) 4 + 5 (using rule 3)
<expression> \(\rightarrow\) (4 + 5) (using rule 2)
<expression> \(\rightarrow\) 2 (using rule 1)
<expression> \(\rightarrow\) 6 (using rule 1)
<expression> \(\rightarrow\) 2 - 6 (using rule 4)
<expression> \(\rightarrow\) (2 - 6) (using rule 2)
<expression> \(\rightarrow\) (4 + 5) * (2 - 6) (using rule 5)

Context-free grammars can be modeled as parse trees. The nodes of the tree represent the symbols and the edges represent the use of production rules. The leaves of the tree are the end result (terminal symbols) that make up the string the grammar is generating with that particular sequence of symbols and production rules.

The parse trees below represent two ways to generate the string "a + a - a" with the grammar

\[A \rightarrow A + A | A − A | a.\]

Example of an ambiguous grammar—one that can have multiple ways of generating the same string^[2]

Because this grammar can be implemented with multiple parse trees to get the same resulting string, this is said to be ambiguous.

Relationship with other Computation Models

A context-free grammar can be generated by pushdown automata just as regular languages can be generated by finite state machines. Since all regular languages can be generated by CFGs, all regular languages can too be generated by pushdown automata.

Any language that can be generated using regular expressions can be generated by a context-free grammar.

The way to do this is to take the regular language, determine its finite state machine and write production rules that follow the transition functions.

References

Nelson , R. Context-Free Grammars. Retrieved June 11, 2016, from https://www.cs.rochester.edu/~nelson/courses/csc_173/grammars/cfg.html
, J. Leftmostderivations. Retrieved June 14, 2016, from https://en.wikipedia.org/wiki/File:Leftmostderivations_jaredwf.png

Contents

Context Free Grammars

Formal Definition

Relationship with other Computation Models

See Also

References