Lambda Calculus
The Lambda calculus is an abstract mathematical theory of computation, involving $\lambda$ functions. The lambda calculus can be thought of as the theoretical foundation of functional programming. It is a Turing complete language; that is to say, any machine which can compute the lambda calculus can compute everything a Turing machine can (and vice versa).
The $\lambda$ notation is based on function abstraction and application based on variable binding and substitution. If you have done the same basic programming already, you might be familiar with lambda functions or anonymous functions. They are featured in Haskell, Mathematica, Python and recent versions of C++, too. In this wiki, we will mostly stick to Haskell.
Contents
Introduction to the Lambda Notation
Consider the function $f(x) = x^2$ implemented as
1f x = x^2
Another way to write this function is $x \mapsto x^2,$ which in Haskell would be
1(\ x > x^2)
Notice that we're just stating the function without naming it. This is the essence of lambda calculus. In lambda calculus notation, we would write this as $\lambda x.x^2.$ We could apply this function on another expression (a variable, or another function), like $\big(\lambda x.x^2\big) 7,$ which actually refers to 49.
To compute a lambda expression is to transform it into a canonical form using a set of transformation rules/reductions, as we will see. We will see that the $\lambda$ and the $\bf{.}$ (along with a few predefined variables) themselves are powerful enough to express any computable function.
The simplest function abstraction I can think of that can be expressed in the lambda calculus is the identity function: $\lambda x.x.$
Expressions
Expressions can be thought of as programs in the language of lambda calculus.
Given the notion of a variable, usually denoted by $x,y,z,\ldots,$ we define an expression inductively in terms of abstractions (anonymous functions) and applications as follows:
Let $\Lambda$ be the set of expressions.
 Variables: If $\mathscr{x}$ is a variable, then $\mathscr{x} \in \Lambda.$
 Abstractions: If $\mathscr{x}$ is a variable and $\mathscr{M} \in \Lambda$, then $(\lambda \mathscr{x}. \mathscr{M}) \in \Lambda.$
 Applications: If $\mathscr{M} \in \Lambda$ and $\mathscr{N} \in \Lambda$, then $\ (\mathscr{M} \mathscr{N}) \in \Lambda.$
Here are two important conventions:
 The common convention is that function application is left associative, unless stated otherwise by parentheses, i.e. $\mathscr{E}_1 \mathscr{E}_2 \mathscr{E}_3 \equiv \big((\mathscr{E}_1 \mathscr{E}_2) \mathscr{E}_3\big).$
 Consecutive abstractions can be uncurried, i.e. $\lambda \mathscr{xyz}.\mathscr{M} \equiv \lambda\mathscr{x}. \lambda\mathscr{y} . \lambda\mathscr{z} . \mathscr{M}.$
 The body of the abstraction extends to the right as far as possible: $\lambda \mathscr{x}. \mathscr{MN} \equiv \lambda \mathscr{x}. (\mathscr{MN}).$
Free and Bound Variables
In an abstraction like $\lambda x.x,$ the variable $x$ is something that has no original meaning but is a placeholder. We say that $x$ is a variable bound to the $\lambda.$
On the other hand, in $\lambda x.y$, i.e. a function which always returns $y$ whatever it takes, $y$ is a free variable since it has an independent meaning by itself.
Because a variable is bound in some subexpression (the scope) does not mean it is bound everywhere. For example, this is a perfectly valid expression (an example of an application, by the way): $(\lambda x.x) (\lambda y.yx).$ Here the $x$ in the second parentheses has nothing to do with the one in the first.
Let us define these concepts formally:
 $\mathscr{x}$ is free in the expression $\mathscr{x}$.
 $\mathscr{x}$ is free in the expression $\lambda \mathscr{y}.\mathscr{M}$ if $\mathscr{x} \neq \mathscr{y}$ and $\mathscr{x}$ is free in $\mathscr{M}$.
 $\mathscr{x}$ is free in $\mathscr{M}\mathscr{N}$ if $\mathscr{x}$ is free in $\mathscr{M}$ or if it is free in $\mathscr{N}$.
 $\mathscr{x}$ is bound in the expression $\lambda \mathscr{x}.\mathscr{M}$.
 $\mathscr{x}$ is bound in $\mathscr{M}\mathscr{N}$ if $\mathscr{x}$ is bound in $\mathscr{M}$ or if it is bound in $\mathscr{N}$.
Notice that a variable can both be bound and free but they represent different things, as we discussed in the example above.
An expression with no free variables is called a closed expression. (Compare this with how we define propositions in predicate logic.)
Reductions
$\alpha$ Equivalence:
$\alpha$ equivalence states that any bound variable is a placeholder and can be replaced (renamed) with a different variable, provided there are no clashes.
$\lambda x. x$ and $\lambda y. y$ are $\alpha$ equivalent.
However, this is not always that simple. Consider the expression $\lambda x. (\lambda x. x)$. It is $\alpha$ equivalent to $\lambda y. (\lambda x. x)$ but not to $\lambda y. (\lambda x. y)$.
Also, $\alpha$ conversion cannot result in a variable getting captured by a different example. For example, $\lambda x. (\lambda y. x) \neq _{\alpha} \lambda y. (\lambda y. y ).$ $\alpha$ conversion is not something that happens only in the lambda calculus. Remember that $\int^a_b f(x) \, dx = \int^a_b f(t)\, dt$ and $\sum^n_{i=m} f(i) = \sum^n_{j=m} f(j)$. That's the same thing, too.
Let us move forward and formalize this idea.
For any expression $\mathscr{M}$ and any $\mathscr{y}$ such that
 $x=y,$ or
 $\mathscr{x}$ and $\mathscr{y}$ are not bound in $\mathscr{M},$ and $\mathscr{y}$ is not free in $\mathscr{M},$
$\lambda \mathscr{x}. \mathscr{M} = _\alpha \lambda \mathscr{y} . \big( \mathscr{M} \left \{ \mathscr{y} /\mathscr{x} \right \} \big).$
That is to say, all instances of $\mathscr{x}$ will be replaced by $\mathscr{y}$.
If $\mathscr{M} \left \{ \mathscr{y} /\mathscr{x} \right \}$ were to be generalized, one would have to define $\mathscr{M} \left \{ u /\mathscr{x} \right \}$ for all $\lambda$function $u$. But replacing all instances of $\mathscr{x}$ by $u$ wouldn't make any sense, since $\lambda u. \mathscr{M} \left \{ u /\mathscr{x} \right \}$ may not be defined (in case $u$ is not a variable).
Therefore, it can be tempting to replace all occurrences of $\mathscr{x}$ by $u$ except those directly following a $\lambda$
$\Big($i.e. $(\lambda x. \mathscr{M}) \left \{ u /\mathscr{x} \right \} =_\alpha \lambda x. \big(\mathscr{M} \left \{ u /\mathscr{x} \right \}\big)\Big).$
But in this case, if $u = \mathscr{y}$, the fact that $\mathscr{y}$ doesn't occur in $\mathscr{M}$ is not sufficient anymore, as demonstrated in the following example:
$\lambda x.\underbrace{(\lambda x·x)}_{\mathscr{M}} \left \{ \mathscr{y} /\mathscr{x} \right \} \neq_\alpha \lambda x.(\lambda x·y) \\ \lambda x.\underbrace{(\lambda x·y)}_{\mathscr{M}} \left \{ \mathscr{y} /\mathscr{x} \right \} \neq_\alpha \lambda x.(\lambda x·x).$
This is why we have to ensure that
 $x=y,$ or
 $\mathscr{x}$ and $\mathscr{y}$ are not bound in $\mathscr{M}$, and $\mathscr{y}$ is not free in $\mathscr{M}.$
$$
$\beta$ Reduction:
$\beta$ reduction is the central idea of the $\lambda$ calculus. It tells us how simplifications of abstractions work.
Let's apply the identity function $( \lambda x.x)$ to a free variable, say, $y:$ $( \lambda x.x)y.$ The next thing that should happen is that the function should act on the argument, which is what a $\beta$ is: $( \lambda x.x)y =_{\beta} y.$ This could also be written as $(\lambda x.x)y \mathrel{\mathop{\longrightarrow}^{\mathrm{\beta}}} y.$
Let us take another example: $(\lambda x . x) (\lambda y .y ) \mathrel{\mathop{\longrightarrow}^{\mathrm{\beta}}} (\lambda y .y ).$ That's simple, the identity function applied to the identity function is the identity function itself.
An important feature to be mentioned here is that a $\beta$ reduction cannot result in the capture of a free variable by another abstraction.
$\big(\lambda x . \lambda y . (x y)\big)\big(\lambda x . \lambda y . (x y)\big)$
Our first step is to plug in the second expression as $x$ in the first one. But do we do it like this:
$\bigg( \lambda y \Big( \big(\lambda x . \lambda y . (x y)\big) y\Big) \bigg)?$
No! Because we're messing up between the original $y$'s and the $y$'s of the second abstraction. Going farther down the wrong root results in this:
$\lambda y . \lambda y . (y y).$
This is the right way to go:
$\big(\lambda x . \lambda y . (x y)\big)\big(\lambda x . \lambda y . (x y)\big) \mathrel{\mathop{\longrightarrow}^{\mathrm{\beta}}} \lambda y' . \Big( \big(\lambda x . \lambda y . (x y)\big) y'\Big) \mathrel{\mathop{\longrightarrow}^{\mathrm{\beta}}} \lambda y' . \lambda y. (y' y).$
Let us formalize this with the notion of (captureavoiding) substitution of free variables.
The captureavoiding substitution of $\mathscr{N}$ for free occurrences of $\mathscr{x}$ in $\mathscr{M}$$\mathscr{M}[\mathscr{N}/\mathscr{x}]$ in symbolsis defined as follows:
$\begin{aligned} \mathscr{x}[\mathscr{N}/x] &\equiv \mathscr{N} \\\\ \mathscr{y}[\mathscr{N}/x] &\equiv \mathscr{y} \qquad ( \mathscr{x} \neq \mathscr{y} ) \\\\ (\mathscr{M P}) [\mathscr{N}/\mathscr{x}] &\equiv \big(\mathscr{M}[\mathscr{N}/\mathscr{x}]\big)\big(\mathscr{P}[\mathscr{N}/\mathscr{x}]\big) \\\\ (\lambda \mathscr{x} . \mathscr{M})[\mathscr{N}/\mathscr{x}] &\equiv (\lambda \mathscr{x} . \mathscr{M}) \\\\ (\lambda \mathscr{y} . \mathscr{M})[\mathscr{N}/\mathscr{x}] &\equiv \lambda \mathscr{y} . \big(\mathscr{M}[\mathscr{N}/\mathscr{x}]\big) && ( \mathscr{x} \neq \mathscr{y} , \mathscr{y} \text{ is not a free variable in } \mathscr{N} ) \\\\ (\lambda \mathscr{y} . \mathscr{M})[\mathscr{N}/\mathscr{x}] &\equiv \lambda \mathscr{y'} . \Big(\mathscr{M}\left \{ \mathscr{y'}/\mathscr{y} \right \} [\mathscr{N}/\mathscr{x}]\Big). &&( \mathscr{x} \neq \mathscr{y} , \mathscr{y} \text{ is a free variable in } \mathscr{N} , \mathscr{y'} \text{ is a fresh variable}) \end{aligned}$
Please notice that bound variables cannot be substituted, as we discussed before.
$(\lambda \mathscr{x} . \mathscr{M})\mathscr{N} \mathrel{\mathop{\longrightarrow}^{\mathrm{\beta}}} \mathscr{M} [ \mathscr{N} / \mathscr{x} ]$
Two terms that can be reduced to each other in zero or more $\beta$ reductions or its inverse are $\beta$ equivalent.
$$
$\eta$ Reduction:
Principle of Extensionality: Two functions are identical if they do the same thing, i.e. the same mapping.
Consider the function $\lambda y. (x y)$. What happens when we apply this to an arbitrary variable $z$?
By $\beta$ reduction, we'll get $x z$. Isn't that just the same thing as applying $x$ to $z?$ Instead of applying $\lambda y. (x y)$, we might as well have applied $x.$
We could say that $x$ and $\lambda y. (x y)$ are the same function by the principal of extensionality.
That brings us to $\eta$ reduction.
If $\mathscr{x}$ is a variable and does not appear free in $\mathscr{M},$
$\lambda \mathscr{x}. (\mathscr{Mx}) \mathrel{\mathop{\longrightarrow}^{\mathrm{\eta}}} \mathscr{M}.$
haskell (\x > abs x)
is better written as just
1abs
Booleans
Because the $\lambda$ calculus is a programming language, we'll gradually develop the machinery to write out programs and encode data structures in this language.
To begin with, let us define True as choosing the first element and False as choosing the second element:
$\begin{aligned} T &= \lambda x . \lambda y . x \\ F &= \lambda x. \lambda y. y. \end{aligned}$
But what does that mean? Well, that does not matter as long as we can get them to work like Booleans.
IfThenElse
Recall the regular ifthenelse statements used in functional or imperative programming, or the ternary operator in C++. In Haskell, they are of the following form:
1 

This means that we pick this
if condition
is $T,$ and if condition
is $F$, we pick that
. That (no pun intended) brings us to the following implementation of $ifthenelse:$
$ifthenelse = \lambda c . \lambda x . \lambda y . (c x y).$
This works because of the way we defined $T$ and $F$, that is picking the first and the second element, respectively.
Verify that
$\begin{aligned} ifthenelse \, T \, a \, b &= a \\ ifthenelse \, F \, a \, b &= \, b. \end{aligned}$
Not
Now we could define the rest of the logic gates in terms of $ifthenelse$. not
can be defined as if True then False else True
, which can be written as
$not = \lambda x. (x \, F \, T).$
And
and
returns $T$ only when both variables are $T$. Thus, we could say that if the first element is $F$ the result is definitely $F$, else the result is whatever the second variable was. In Haskell,
1 

Or in $\lambda$ calculus
$and = \lambda x. \lambda y. (x y F).$
Or
Similarly, we could pretend
1 

Or in $\lambda$ calculus
$or = \lambda x. \lambda y. (x T y).$
Prove that $\lambda x. \lambda y. \big( x ( y F T) ( y T F) \big)$ represents the exclusive or (
xor
) gate.
Can you implement the other logic gates?
Church Numerals
Any Turing complete language should be able to work with some representation of natural numbers. That is what church numerals do in $\lambda$ calculus. These numerals are inspired by Peano axioms.
We define $\overline{0}$ as $\lambda s . \lambda z . z$, which is the same as $F$. Then we define the rest of the church numerals as
$\begin{aligned} \overline{1} =& \lambda s . \lambda z . s(z) \\ \overline{2} =& \lambda s . \lambda z . s\big(s(z)\big) \\ \overline{3} =& \lambda s . \lambda z . s\Big(s\big(s(z)\big)\Big) \\ &\vdots \end{aligned}$
$\overline{n} = \lambda s . \lambda z . s^n(z)$
Successor
To obtain the successor of $\overline{n}$, we wrap an extra $s$ around the numeral. This is not very difficult:
$S = \lambda n. \lambda s. \lambda z. \big(s( n s z ) \big).$
$S \overline{n} = \overline{n+1}$
We have
$\begin{aligned} S \overline{n} &= \lambda n. \lambda s. \lambda z. \big(s( n s z ) \big) \overline{n} \\ &= \big(\lambda n. \lambda s. \lambda z. s( n s z ) \big) \big(\lambda s . \lambda z . s^n(z)\big) \\ &=_\beta \lambda s'. \lambda z'. s'\Big( \big(\lambda s . \lambda z . s^n(z)\big) s' z'\Big) \\ &=_\beta \lambda s'. \lambda z'. s'\big(s'^n(z')\big) \\ &= \lambda s'. \lambda z'. s'^{n+1}(z') \\ &=_\alpha \lambda s. \lambda z. s^{n+1}(z) \\ &= \overline{n+1}.\ _\square \end{aligned}$
Addition
Defining natural numbers the way we did, $\overline{n} = \lambda s . \lambda z . s^n(z)$ is actually applying a function $s$ to some starting point $z$. That lets us define addition as
$add = \lambda mn . (m \, S \, n ).$
(If the notation seems unfamiliar, review the conventions in Expressions part of the wiki.)
$add \, \overline{m} \, \overline{n} = \overline{m+n}$
We have
$\begin{aligned} add \, \overline{m} \, \overline{n} &= ( \lambda mn . \lambda sz. m \, S \, n ) \\ &= \big( \lambda s . \lambda z . s^m(z) \big) \, S \, \big(\lambda s . \lambda z . s^n(z)\big) \\ &=_\beta \lambda z. S^m (z) \big(\lambda s . \lambda z . s^n(z)\big) \\ &=_\beta S^m\big(\lambda s . \lambda z . s^n(z)\big) \\ &= \lambda sz . s^{m+n}(z) && (\text{by definition of } S) \\ &= \overline{m+n}.\ _\square \end{aligned}$
Multiplication
Using a similar argument, we could define multiplication as follows:
$mult = \lambda mn . m (add \, n) \overline{0}.$
This is simple, we add $\overline{n}$ to $\overline{0}$ $m$ times. Notice that we're using just one argument on $add$ when we say $(add \, n)$. This is called a partial application.
Exponentiation
Can you use the above ideas to implement $exp$ such that
$exp \, \overline{m} \, \overline{n} = \overline{m}^{\overline{n}}?$
Predecessor
The implementation of the predecessor function is involved. In fact, Church thought for a long time that it might not be possible, until his student Kleene found it. In fact, there is a legend that Kleene conceived the idea while visiting his dentist, which is why the trick is called the wisdom tooth trick.
We'll skip the exact implementation until we introduce pairs, but let us assume that this has already been done and let's call it $pred$ such that
$pred \, \overline{n} = \overline{n1}.$
Subtraction
Subtraction can be implemented in the same way as addition or multiplication:
$sub = \lambda m . \lambda n. n \, pred \, m.$
Ordered Pairs
Of course the simplest way to implement the ordered pair $(a,b)$ is $ab$, but that does not allow retrieval of the two values. Instead, we'll implement pairs like this:
$pair = \lambda a b . \lambda z. z a b$
That is, the ordered pair of $(a,b)$ is $(\lambda z. a b)$. This is helpful because we could use $T$ and $F$ to extract the first and the second elements, respectively $\big($called the left projection($\pi_1$) and the right projection$(\pi_2)\big):$
$\begin{aligned} \pi_1 &= \lambda p. pT \\\\ \pi_2 &= \lambda p. pF. \end{aligned}$
Why does this work? It's because of the way we defined $T$ and $F.$
Wisdom Tooth Trick
Consider the following function which transforms $(x,y)$ to $(x+1,x):$
$\Phi = \lambda p . pair \; S(\pi_1 p) \, (\pi_1 p).$
$\begin{aligned} \Phi \big( pair \; \overline{0} \, \overline{0} \big) &= pair \; \overline{1} \, \overline{0}. \\\\ \Phi ^n \big( pair \; \overline{0} \, \overline{0}\big) &= pair \; \overline{n} \, \overline{n1} \ \ \text{when } n > 1. \end{aligned}$
We can easily check that the first statement is true.
Let the second statement be true for all values up to $k$
Then, by hypotheses,
$\Phi ^k \big( pair \; \overline{0} \, \overline{0} \big) = pair \; \overline{k} \, \overline{k1}.$
Operating $\Phi$ on both sides,
$\Phi ^{k+1} \big( pair \; \overline{0} \, \overline{0} \big) = \Phi \big(pair \; \overline{k} \, \overline{k1}\big) = pair \; \overline{k+1} \, \overline{k}.$
Hence, by induction, we have proven our claim. $_\square$
Implementing the predecessor function is now simple, just pick the second element.
$pred = \lambda n. \pi_2 \Big(n \Phi \big(pair \; \overline{0} \overline{0} \big) \Big)$
Note that, according to this implementation, the predecessor of $\overline{0}$ is $\overline{0}$ itself. This will be useful as we'll notice in the next section.
It is indeed possible to encode linked lists, negative numbers or/and rational numbers in the $\lambda$ calculus. However, we do not discuss that in this wiki.
Relational Operators
Is zero?
The $iszero$ function returns $T$ if the argument is $0$ and $F$ if it isn't.
$iszero = \lambda n. n \, F \, not \, F$
$\begin{aligned} iszero \, \overline{n} &= T && \text{if } n = 0 \\ &= F && \text{otherwise}. \end{aligned}$
Recall that the definition of $\overline{0}$ and $F$ we used are $\alpha$ equivalent.
Hence,
$\begin{aligned} iszero \, 0 &= iszero F \\ &= (\lambda n. n \, F \, not \, F) F \\ &=_\beta F \, F \, not \, F \\ &= not \, F \\ &= T. \end{aligned}$
For some $n > 0$,
$\begin{aligned} iszero \, \overline{n} &= (\lambda n. n \, F \, not \, F) \overline{n} \\ &= \overline{n} \, F \, not \, F \\ &= F^n \, (not) \, F \\ &= (\lambda xy.y)^n \, (not) \, F \\ &= (\lambda y. y)^n \, F\\ &= F.\ _\square \end{aligned}$
Less Than
$m$ is less than or equal to $n$ if $(sub \; m \, n)$ is $\overline{0}$. This is because the predecessor of $\overline{0}$ is still $\overline{0}$ as we noted in the previous section.
$leq = \lambda mn. isZero \; (sub \; m \, n)$
Equality
We'll use the property that
$(x \leq y \wedge y \leq x) \to x = y.$
$eq = \lambda mn. \big(and \; (leq \; m \, n) \; (leq \; n \, m)\big)$
How would you implement strict lessthan and greaterthan?
Fixed Points and Recursion
A fixed point $x$ of a function $f$ is a point that maps back to itself, i.e.
$f(x) = x.$
The point into which the cobweb converges is the fixed point of the cosine function. That means, this point satisfies the equation
$\cos x = x.$
All real functions do not necessarily have a fixed point. However, in lambda calculus, everything is a function and also has a fixed point, which we shall constructively prove here.
In $\lambda$ calculus, every expression $F$ has a fixed point.
It turns out that there are functions called fixed point combinators that generate the fixed point for any $F$.
Take for instance Curry's paradoxical combinator or the Ycombinator
$Y = \lambda f. \big(\lambda x . f (x x)\big) \big(\lambda x . f (x x)\big).$
We claim that $(Y F)$ is a fixed point for any $F$, i.e. $F \, (Y \, F) = (Y \, F)$. Here's how:
$\begin{aligned} Y F & = \Big( \lambda f. \big(\lambda x . f (x x)\big) \big(\lambda x . f (x x)\big) \Big) F \\ & =_\beta \big(\lambda x . F (x x)\big) \big(\lambda x . F (x x)\big) &&& (1) \\ & =_\beta F (\Big( \big(\lambda x . F (x x)\big) \big(\lambda x . F (x x)\big)\Big) \\ & = F(YF).\ _\square &&& \big(\text{from } (1)\big) \end{aligned}$
There are also other fixedpoint combinators such as the Turing combinator
$\Theta = \big( \lambda xy . y (x x y) \big) \big( \lambda xy . y (x x y) \big).$
As an exercise, convince yourself that $\Theta$ is indeed a fixed point combinator, by writing out a proof.
We know that in the standard interpretation of natural numbers, the successor function has no fixed point. However, in $\lambda$ calculus, it does. What interpretation can we assign to that?
Notice that applying the $Y$ combinator to $S$ does not yield a church numeral in any finite number of $\beta$ or $\eta$ reductions. Instead, we get something like this:
$S\Big(S\big(S(S\ldots)\big)\Big).$
One could think of it as an undefined value $\bot$ or infinity, in either of which cases a fixed point is understandable. $_\square$
In Haskell, fix
from Control.Monad.Fix
is a fixed point combinator, implemented in the following way:
1 2 

$$
Recursion
Our discussion of the fixed point combinators up to this point has been very abstract. As an intelligible reader, you might have wondered if these expressions serve any more purpose than sheer mathematical beauty. Actually, they do. Since our functions are anonymous, there is no way we can refer to a function itself within its body using its name. To do so, we need to solve functional "equations."
Recursion is a central theme in functional programming. Since we do not have loops, we use recursion to our rescue.
Let us try to implement a function that returns the sum of the first $n$ natural numbers, where $n$ is an argument.
Let's call this function $f$. We'll use recursion to define $f$ first:
 base case: $f \, \overline{0} = \overline{0}$
 recursion formula: $f \, \overline{n} = \overline{n} + \big(f \, \overline{n1}\big)$
In terms of the ideas we've already defined,
$\begin{aligned} f \, n &= ifThenElse \; (isZero \, n) \; \overline{0} \; \Big(add \, n \; \big(f \, (pred \, n) \big) \Big) \\ \Rightarrow f &= \lambda n. ifThenElse \; (isZero \, n) \; \overline{0} \; \Big(add \, n \; \big(f \, (pred \, n) \big) \Big). \end{aligned}$
By an inverse beta reduction on the righthand side, we get
$f = \bigg(\lambda g . \lambda n. ifThenElse \; (isZero \, n) \; \overline{0} \; \Big(add \, n \; \big(g \, (pred \, n) \big) \Big) \bigg) f.$
Does this look familiar? We've established that $f$ is a fixed point of $\bigg(\lambda g . \lambda n. ifThenElse \; (isZero \, n) \; \overline{0} \; \Big(add \, n \; \big(g \, (pred \, n) \big) \Big) \bigg)!$
Hence, we could rewrite the equation as follows:
$f = \Theta \bigg(\lambda g . \lambda n. ifThenElse \; (isZero \, n) \; \overline{0} \; \Big(add \, n \; \big(g \, (pred \, n) \big) \Big) \bigg),$
where $\Theta$ is Turing's fixed point combinator. $_\square$
Here is the Haskell code for the same with some syntactic sugar:
1 2 3Prelude> let fix f = let x = f x in x Prelude> fix (\g n > if (n == 0) then 0 else n + g (n  1) ) 5 15
Can you write a program in $\lambda$ calculus to denote the factorial function?
Notes on Computability
The $\lambda$ calculus is a Turing complete language. This means that a universal Turing machine (UTM) can be effectively simulated using $\lambda$ calculus and that the UTM can evaluate functions described by the $\lambda$ calculus.
A function $f: \mathbb{N} \to \mathbb{N}$ is computable if there is a corresponding lambda function $F$ such that $F \overline{n} = \overline{f(n)}\ \forall n \in \mathbb{N}$. Because the $\lambda$ calculus is Turing equivalent, this is the same definition of computability as that provided in context of Turing machines. According to the ChurchTuring hypothesis, anything that is physically computable at all falls under this definition.
One of the undecidable things about the $\lambda$ calculus is the equivalence of two lambda expressions. This means that there is no algorithm that can always correctly predict if two given lambda expressions can be reduced to one another. This is an application of the Rice theorem.
Another notable undecidable property of lambda expressions closely related to the Halting problem is that there is no computable function that can tell whether a given expression has a normal form.
A normal form of a lambda function is an equivalent form in which any further $\beta$ or $\eta$ reductions are not possible. For example, $\lambda x.x$ is a normal form, but $\Omega = \big(\lambda x. (x x)\big)\big((\lambda x. (x x)\big)$ does not have any normal form, since $\beta$ reduction always results in an expression with the scope of $\beta$ reduction still available.
Typed Lambda Calculus
Throughout this wiki, we have only discussed the untyped $\lambda$ calculus.
Notice that though we define Booleans, integers, functions on them, and so on, there is no formal notion of invalidity of the application of one kind of a function to another. In practical programming, it is a good idea to distinguish between types, so as to effectively reason about the data we are working with. It also makes it difficult for a programmer to write a program in a statically typed language.
A large number of functional programming languages, including Haskell, is built upon ideas from the typed lambda calculus and category theory.
The CurryHowardLambek isomorphism is an interesting mathematical property that relates typed lambda calculus or programming in general to propositional logic. Informally stated, it says that to every corresponding proposition, there is a corresponding type, which if inhabited (i.e. there is some variable of that type) proves that the statement is true and, similarly to every proof of a proposition, there is a corresponding program.