A regular language is a language that can be expressed with a regular expression or a deterministic or non-deterministic finite automata or state machine. A language is a set of strings which are made up of characters from a specified alphabet, or set of symbols. Regular languages are a subset of the set of all strings. Regular languages are used in parsing and designing programming languages and are one of the first concepts taught in computability courses. These are useful for helping computer scientists to recognize patterns in data and group certain computational problems together — once they do that, they can take similar approaches to solve the problems grouped together. Regular languages are a key topic in computability theory.
Regular languages and finite automata can model computational problems that require a very small amount of memory. For example, a finite automaton can generate a regular language to describe if a light switch is on or off, but it cannot keep track of how many times the light was switched on or off.
The machine above can know what state it is currently in, on or off, but it doesn't know what the history of flips was to get it to where it is (i.e. it doesn't know how many times the switch has been flipped on or off).
Finite automata generate or model regular languages. This means that regular languages can be described by a simple state machine diagram. A finite state machine, describes a given language, . is said to accept a string if the machine starts in a start state, undergoes some series of state transitions, and ends up in an accepting state. We say that the machine recognizes the language if accepts all strings that are in . A language is a regular language if there is a finite automaton that recognizes it.
For example, this machine recognizes the language of strings that have an even number of zeroes since any string that has an even number of zeroes will go from the start state to an accepting state.
A regular language can be represented by a string of symbols and operations.
Concatenation is an operation that combines two symbols, sets, or languages. There are two main ways to write a concatenation. and both mean “X concatenated with Y.” Languages can also be concatenated: means strings from are written first, and then strings from come after.
Formally, concatenation is defined as follows: .
Union is an operation where a finite state machine can make one choice or another at any step (this can also be thought of as an “or” operation). For example, there could be a finite state machine that can choose between concatenating with or concatenating with . Or is written with a vertical bar, and the expression from the previous sentence looks like this: .
Formally, union is described as follows: . Note, here the vertical bar means “such that.”
The Kleene star is an operation denoted by an asterisk that basically represents repeated self-concatenation. For example, if a language represents all strings of the form then this language includes , , , , etc. The Kleene star can repeat the symbol it operates on any number of times.
The regular language represents strings of the form “X concatenated with Y or Z repeated any number of times.”
Formally, the Kleene star is defined as follows: . The Kleene star is a unary operation since it operates only on one thing instead of two (unlike union and concatenation, which are binary operations).
Because can be , can be repeated times. This means that the empty string is always a member of .
The Empty String
Another important symbol is which represents the empty string.
Languages are often written as follows: where the vertical bar means "such that." The pattern often resembles something like this where the exponent actually represents a unary operation — the would be repeated times. For example, is written out as .
Finite automata and regular expressions are different ways to represent regular languages.
Finite automata can be used to generate strings in a regular language. A finite automaton for a particular language is “programmed,” in a way, to generate the strings of a given language through its states and transition functions. You can walk through a finite state machine to see what strings are able to be made and therefore are part of the language the machine described, or you can feed it an input string to see if a given input can be made by the machine.
Note: The symbol epsilon, , represents transitions on the empty string, sometimes called a "null transition." This means that the machine can take these transitions without needing to read a particular symbol on the input.
A regular expression is one way to represent a regular language as a string. For example, the regular language described by the regular expression means strings that either contain any number of 0’s followed by a single 1 or any number of 1’s followed by a single 0. This regular expression can be represented by the following finite state machine:
The regular expression , which is the language of strings starting with any number of “01” substrings, followed by two 1’s and then any number of “10” substrings, can be represented with the following finite state machine:
What is the regular expression that describes this finite state machine?
The start state is and the accepting state is .
From we can generate a 1 by looping back in on , and we can do this however many times; we can generate a 0 by looping back in on , and we can do this however many times; or we can generate a 0 by transitioning to . Once we are in , we can generate any number of 0's by looping back in on .
Therefore, the regular expression that describes this is .
The set of regular languages over a given alphabet, is defined recursively as follows:
The empty language and empty string are regular languages.
For every the singleton language is a regular language.
The closure rules described in the next section apply to regular languages.
No other languages over are regular.
To prove if a language is a regular language, one can simply provide the finite state machine that generates it. If the finite state machine for a given language is not obvious (and this might certainly be the case if a language is, in fact, non-regular), the pumping lemma for regular languages is a useful tool. To see nonregular languages, check out the kinds of languages that are recognized by context-free grammars or Turing machines — both of these recognize regular languages and more.
The Pumping Lemma For Regular Languages:
The idea behind the pumping lemma for regular languages is that there are certain constraints a language must adhere to in order to be a regular language. You can use the pumping lemma to test if all of these constraints hold for a particular language, and if they do not, you can prove with contradiction that the language is not regular.
The pumping lemma for regular languages states that if a language is regular, there exists some integer pumping length such that every string with a length of or more symbols, , that can be written where are substrings of such that
All regular languages are “pumpable,” meaning that the pumping lemma for regular languages constraints hold true for all regular languages. If a language is not pumpable, then it is not a regular language. However, if a language is pumpable, it is not necessarily a regular language.
Essentially, the pumping lemma holds that arbitrarily long strings can be pumped, or have segments repeated, without ever producing a new string that is not in the language .
To prove that a language is not regular, use proof by contradiction and the pumping lemma. Set up a proof that claims that is regular, and show that a contradiction of the pumping lemma’s constraints occurs in at least one of the three constraints listed above.
Use the pumping lemma to prove that is not a regular language.
Assume, for the sake of contradiction, that is a regular language. By the pumping lemma, there exists an integer pumping length for . We need a string that is longer than or equal to the length of . Certainly is longer than , so we choose this for the string. This is in since it has ’s and ’s.
Now by the pumping lemma, . There are three possible places in the string that we can assign to be :
- for some . This means that is contained purely in the ’s section.
- for some . This means that is contained purely in the ’s section.
- for some and where This means that the segment is contained somewhere in the ’s and ’s section.
In any of these three cases, we can easily verify that the third constraint for the pumping lemma, , does not hold. In other words, for any of these three choices of , the string cannot be pumped in a way that results in a string that has an equal number of ’s and ’s the definition of the language
Let’s take a short example string described by and .
In the first case, there will be more ’s than there are ’s, making the resulting, pumped string not a member of . If we pump this region, we will get the string a string with ’s and ’s. Clearly, this is not in the language. A similar proof can be checked for the second case; just pump the region instead of the region.
For the third case, we do something similar. We just need to find one example of pumping in this region that violates the third property of the pumping lemma since if the language is regular, it needs to be pumpable for all . If we pump and and pump the last in the section and the first two ’s in the section, we get this string: — a string with six ’s and seven ’s, which is clearly not in the language of strings with an equal number of ’s and ’s.
Regular languages have the following closure properties. A set is closed under an operation if doing the operation on a given set always produces a member of the same set. This means that if one of these closed operations is applied to a regular language, the result will also be a regular language.
- Union: Regular languages are closed under the union operation. This means that if and are both regular languages, then is also a regular language.
Since and are both regular languages, by definition there is some finite state machine that describes each of them; call these and .
If we make a machine from and , the machine will accept a given input string if either or accepts . Since simulates and will accept if either or accepts, which means it will accept the union of and .
Here is what such a machine would look like:
- Intersection: Regular languages are closed under the intersection operation. This means that if and are both regular languages, then will also be a regular language.
Just like in the union closure proof, construct a machine that is made up of and which accept and , respectively. runs both the and machines in parallel and accepts if both and accepts.
- Concatenation: Regular languages are closed under the concatenation operation. This means that if and are both regular languages, then will also be a regular language.
This proof is similar to the proof for closure under union. First, take which accepts and which accepts . How can we hook up these two machines into a machine that accepts
In union, we connected the finite state machines in series, and with an intersection, we connected the machines in parallel. Since the concatenation of and means we put strings from before strings from , we have to combine the machines in such a way that all/any of the strings are produced before we get to the machine that will produce strings from . We can accomplish this by hooking up and like this using a nondeterministic finite automaton:
- Kleene star: It is easy to see that regular languages are closed under the star operation since concatenation is closed and the star is just repeated self-concatenation.
- Complement: Regular languages are closed under complement. This means that if is a regular language, then is also a regular language.
Take the state diagram for and change all of the accepting states to rejecting states and all of the accepting states to rejecting states. We have just made a new finite state machine that accepts ’s complement. Since a finite state machine accepts , it is a regular language.
- , A. Buchi non deterministic example.svg. Retrieved June 19, 2016, from https://en.wikipedia.org/wiki/File:Buchi_non_deterministic_example.svg
- , . Regular Language. Retrieved June 21, 2016, from https://en.wikipedia.org/wiki/Regular_language
- Aaronson, S. 6.045J Automata, Computability, and Complexity: Lecture 4, Spring 2011. Retrieved June,21, 2016, from http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-045j-automata-computability-and-complexity-spring-2011
- Aaronson, S. 6.045J Automata, Computability, and Complexity: Lecture 4, Spring 2011. Retrieved June, 21, 2016, from http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-045j-automata-computability-and-complexity-spring-2011
- Fitch, W., & Friederici , A. Artificial grammar learning meets formal language theory: an overview. Retrieved June 19, 2016, from http://rstb.royalsocietypublishing.org/content/367/1598/1933