# Taylor's Theorem (with Lagrange Remainder)

The **Taylor series** of a function is extremely useful in all sorts of applications, and at the same time fundamental to the whole *theory of functions*. Recall that, if \( f(x) \) is infinitely differentiable at \(x=a\), the Taylor series of \(f(x)\) at \(x=a\) is by definition

\[\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a)^n = f(a) + f'(a) (x-a) + \frac{f''(a)}{2}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots.\]

For \( f(x)=\sin x\) around \( a=0 \), it's easy to compute all the \( f^{(n)}(0) \) and see that the Taylor series converges for all \( x\in\mathbb R \) (by the ratio test), but it's by no means obvious that it should converge to \( \sin x \). After all, the derivatives at \( x=0 \) only depend on the values of \( \sin x \) close to \( x=0. \) Why should you expect that somehow it "knows" the values of the function far away?
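As a quick numerical illustration (the helper `sin_taylor` below is our own, not part of the text), the partial sums do converge to \(\sin x\) even far from the center:

```python
import math

def sin_taylor(x, n_terms):
    """Partial sum of the Taylor series of sin at a = 0:
    x - x^3/3! + x^5/5! - ...  (first n_terms nonzero terms)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

# Far from the center a = 0, the early partial sums are useless,
# yet the series still converges to sin x.
x = 10.0
for n in (5, 15, 30):
    print(n, sin_taylor(x, n), math.sin(x))
```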

That the Taylor series converges to the function itself must be a nontrivial fact. Most calculus textbooks would invoke a so-called **Taylor's theorem (with Lagrange remainder)**, and would probably mention that it is a generalization of the **mean value theorem**. The proof of Taylor's theorem in its full generality may be short but is not very illuminating. Fortunately a very natural derivation based only on the **fundamental theorem of calculus** (and a little bit of multi-variable thinking) is all one would need for most functions. In fact, the Taylor series itself falls out of this derivation, along with the various "forms" of the remainder.

## Derivation from FTC

We start with the fundamental theorem of calculus (FTC) in what should be its most natural form:

\[ f(x) = f(a) + \int_a^x {\color{red}f'(x_1)}\, dx_1.\]

The expression naturally requires that \( f \) is differentiable \((\)i.e. \( f' \) exists\()\) and \( f' \) is continuous between \( a \) and \( x \)—we shall say \(f\) is *continuously differentiable* for short \((\)or \(f\in C^1).\) You could allow \(f'\) to have some jump discontinuities, but we'll soon see that more differentiability will come up, not less. For the sake of definiteness, imagine that \( x \) is bigger than \( a \) and \(x_1\) is a variable running from \(a\) to \(x\).

If, furthermore, \( f' \) is continuously differentiable \((\)we say \(f\) is *twice continuously differentiable*, or \(f\in C^2),\) we can apply the FTC to \(f'\) on the interval \([a, x_1]\):

\[ {\color{red} f'(x_1)} = {\color{red}f'(a) + \int_a^{x_1} {\color{green}f''(x_2)}\, dx_2}.\]

Putting this into the expression for \( f(x) \), we have

\[ \begin{align*} f(x) &= f(a) + \int_a^x \left( {\color{red} f'(a) + \int_a^{x_1} {\color{green}f''(x_2)}\, dx_2} \right) dx_1 \\ &= f(a) + f'(a) (x-a) + \int_a^x \int_a^{x_1} {\color{green}f''(x_2)}\,dx_2\, dx_1. \end{align*}\]

Playing this game again, if \(f''\) is continuously differentiable \((\)i.e. \(f\in C^3),\) we could write

\[ {\color{green} f''(x_2)} = {\color{green} f''(a) + \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3},\]

so now

\[ \begin{align*} f(x) &= f(a) + f'(a) (x-a) + \int_a^x \int_a^{x_1} \left( {\color{green}f''(a) + \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3 }\right) \,dx_2\, dx_1 \\ &= f(a) + f'(a) (x-a) + f''(a)\frac{(x-a)^2}{2} + \int_a^x \int_a^{x_1} \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3\, dx_2\, dx_1. \end{align*}\]

This clearly generalizes as follows:

If \(f(x)\) is \( n+1 \) times continuously differentiable \((f\in C^{n+1})\) on an interval containing \(a\), then \[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k + R_n(x), \] where \[ R_n(x) = \int_a^x \int_a^{x_1} \ldots \int_a^{x_n} f^{(n+1)}(x_{n+1})\,dx_{n+1}\ldots dx_2\, dx_1 \] is known as the **remainder**.

In a way, the formula pushes as much information about the value of \(f(x)\) as possible to the point \(a\); what remains is a single complicated-looking term.

Let us verify this for \(f(x)=\sin x\), \(a=0\), and \(n=3\).

We have

\[ \begin{align} R_3(x) &= \int_0^x \int_0^{x_1}\int_0^{x_2}\int_0^{x_3} f^{(4)}(x_4)\,dx_4\, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1}\int_0^{x_2}\int_0^{x_3} \sin x_4\,dx_4\, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1}\int_0^{x_2} (1-\cos x_3) \, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1} \left(x_2 - \sin x_2\right)\, dx_2\, dx_1 \\ &= \int_0^x \left(\frac{x_1^2}{2} - (1-\cos x_1) \right)\, dx_1 \\ &= \frac{x^3}{3!} - x + \sin x.\ _\square \end{align} \]
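One can also check the iterated-integral formula numerically. The sketch below (our own crude nested midpoint rule, not part of the text) approximates the four-fold integral for \(R_3\) directly and compares it with the closed form just computed:

```python
import math

def iterated_integral(deriv, a, x, order, N=24):
    """Approximate the iterated integral
    ∫_a^x ∫_a^{x1} ... deriv(x_order) dx_order ... dx_1
    by nesting the composite midpoint rule (crude, for illustration)."""
    def level(upper, depth):
        if depth == 0:
            return deriv(upper)
        h = (upper - a) / N
        return h * sum(level(a + (k + 0.5) * h, depth - 1) for k in range(N))
    return level(x, order)

# R_3(x) for f = sin, a = 0, using f^(4) = sin:
x = 1.0
approx = iterated_integral(math.sin, 0.0, x, order=4)
exact = x**3 / 6 - x + math.sin(x)
print(approx, exact)  # the two values agree to a few decimal places
```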

## The Remainder

The remainder \(R_n(x) \) as given above is an **iterated integral**, or a multiple integral that one would encounter in multi-variable calculus. This may have contributed to the fact that Taylor's theorem is rarely taught this way.

For \(n=1\), the remainder \[R_1(x) =\int_a^x \int_a^{x_1} f''(x_2)\,dx_2\,dx_1 \] is a double integral, which in general could have the integrand depend on both of the variables \(x_1\) and \(x_2\). In our case the integrand only depends on \(x_2\), so it would be easier if we could integrate over the \(x_1\) variable first. Indeed we could do so (with a little help from Fubini's theorem): \[\begin{align*} R_1(x) &= \int_a^x {\color{red}\int_{x_2}^x} f''(x_2)\, {\color{red}dx_1}\, dx_2\\ &= \int_a^x f''(x_2) {\color{red}(x-x_2)}\, dx_2. \end{align*} \] Note that the limits of integration were changed in accordance with the relative positions of the two variables, namely \(a\leq x_2\leq x_1\leq x\). In fact, the integral should be regarded as over a right-angled triangle in the \(x_1 x_2\)-plane, and it computes the (signed) volume under the surface \(F(x_1, x_2)=f''(x_2)\). This makes it clear that interchanging the order of integration is justified.
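Here is a small numerical check of the order swap (a sketch using our own midpoint-rule helper), with \(f=\exp\), \(a=0\), \(x=1\), so that \(R_1(x)=e^x-1-x\):

```python
import math

def midpoint(g, lo, hi, n=200):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

a, x = 0.0, 1.0
# Double integral ∫_a^x ∫_a^{x1} f''(x2) dx2 dx1 with f = exp (so f'' = exp):
double = midpoint(lambda x1: midpoint(math.exp, a, x1), a, x)
# After swapping the order of integration: ∫_a^x f''(x2) (x - x2) dx2:
single = midpoint(lambda x2: math.exp(x2) * (x - x2), a, x)
print(double, single, math.exp(x) - 1 - x)  # all approximately e - 2
```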

For the general case of \(R_n(x)\), the region of integration is an \((n+1)\)-dimensional "simplex" defined by \(a\leq x_{n+1}\leq x_n\leq \cdots \leq x_1\leq x\), and performing the integration over \(x_1, \ldots , x_n\) (with \(x_{n+1}\) fixed) yields the volume of a right-angled "\(n\)-simplex". To wit,
\[ \begin{align*} R_n(x) &= \int_a^x \int_a^{x_1} \cdots \int_a^{x_n} f^{(n+1)}(x_{n+1})\,dx_{n+1} \ldots dx_2\, dx_1 \\
&= \int_a^x {\color{red} \int_{x_{n+1}}^{x} \cdots \int_{x_2}^{x} } f^{(n+1)}(x_{n+1})\, {\color{red} dx_1 \ldots dx_n } \, dx_{n+1} \\
&= \int_a^x f^{(n+1)}(x_{n+1}){\color{red}\frac{(x-x_{n+1})^n}{n!}}\,dx_{n+1}, \end{align*}\]
which is known as the **integral form** of the remainder.

Under the same condition \((f\in C^{n+1}),\) \[ R_n(x) = \int_a^x f^{(n+1)}(\xi) \frac{(x-\xi)^n}{n!}\,d\xi. \]
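As a sanity check of the integral form (a sketch with a hand-rolled midpoint rule; `remainder_integral` is our own name), take \(f=\exp\) and \(a=0\), so that every derivative is again \(e^t\):

```python
import math

def remainder_integral(n, x, N=2000):
    """Midpoint-rule approximation of ∫_0^x f^(n+1)(t) (x-t)^n / n! dt
    for f = exp, a = 0 (so f^(n+1) = exp)."""
    h = x / N
    total = 0.0
    for k in range(N):
        t = (k + 0.5) * h
        total += math.exp(t) * (x - t)**n / math.factorial(n)
    return total * h

n, x = 4, 1.5
taylor_part = sum(x**k / math.factorial(k) for k in range(n + 1))
# The integral form should reproduce f(x) minus the Taylor polynomial:
print(remainder_integral(n, x), math.exp(x) - taylor_part)
```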

By the "real" mean value theorem (for integrals), the integral in the integral form of the remainder can be replaced by the "mean value" of the integrand, attained at some point \(\xi\in (a,x)\), multiplied by the length \(x-a\) of the interval. Thus we obtain the remainder in the **form of Cauchy**:

(Cauchy) \[ R_n(x) = \frac{f^{(n+1)}({\color{red}\xi})}{n!} (x-{\color{red}\xi})^n (x-a) \quad \text{for some }\ {\color{red}\xi}\in(a, x). \]

Finally, to obtain the **form of Lagrange**, we simply need to look at the original \((n+1)\)-fold integral, and apply the multi-variable version of the "real" mean value theorem: a multiple integral over a bounded, *connected* region is equal to its "mean value," attained at some point in the domain by continuity of the integrand, multiplied by the "volume" of the region of integration. (One can prove this by a simple application of the extreme value theorem and the intermediate value theorem.) Thus we have

(Lagrange) \[ R_n(x) = \frac{f^{(n+1)}({\color{red}\xi})}{(n+1)!} (x-a)^{n+1} \quad \text{for some }\ {\color{red}\xi}\in(a, x). \]
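For \(f=\exp\) and \(a=0\), the Lagrange form can even be solved for \(\xi\) explicitly, since \(f^{(n+1)}=\exp\) is invertible. A quick check (our own sketch) that the resulting \(\xi\) really lies in \((a,x)\):

```python
import math

# Lagrange form for f = exp, a = 0:
#   R_n(x) = e^xi * x^(n+1) / (n+1)!
# so xi = log(R_n(x) * (n+1)! / x^(n+1)).
n, x = 3, 1.0
R = math.exp(x) - sum(x**k / math.factorial(k) for k in range(n + 1))
xi = math.log(R * math.factorial(n + 1) / x**(n + 1))
print(xi)          # some point strictly between a = 0 and x = 1
print(0 < xi < x)  # True
```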

Note that the \(\xi\) in the Lagrange form is almost certainly different from the one in the Cauchy form. The Lagrange remainder is easy to remember, since it is the same expression as the next term in the Taylor series, except that \(f^{(n+1)}\) is evaluated at a point \(\xi\) instead of at \(a\).

One could also obtain other forms of the remainder by integrating some but not all of the \(x_1,\ldots, x_n\) variables, and apply the mean value theorem to the other variables.

It should also be mentioned that the integral form is typically derived by successive applications of integration by parts, which avoids ever mentioning multiple integrals. However, it may be considered the same proof (up to homotopy, in some sense) because integration by parts, in essence, is saying that one could compute a certain area either by integrating over the \(x\) variable or over the \(y\) variable.

## Convergence of Taylor Series

When \(f(x)\) is infinitely differentiable \((f\in C^\infty),\) we have the full Taylor series with a remainder of \(R_n(x)\) for each \(n\), *regardless* of whether the Taylor series converges at all. Therefore, to prove a particular Taylor series converges to the right function amounts to giving a bound for \(|R_n(x)|\) so that it tends to \(0\) as \(n\to\infty\), with \(x\) fixed.

\(f(x)=\sin x\) is infinitely differentiable, and all the derivatives \(f^{(n)}(x)\) are one of four possibilities, namely \(\pm\cos x\) and \(\pm\sin x\). Therefore, in any of the forms of the remainder above \((\)with any \(a\in\mathbb R),\) we could bound \(\big|f^{(n+1)}(\xi)\big|\) by \(1\), so that (using the Lagrange form, say) \[\big|R_n(x)\big| \leq \frac{|x-a|^{n+1}}{(n+1)!} \to 0 \quad \text{as }\ n\to\infty\] for any fixed \(x\in\mathbb R\). Therefore, the Taylor series of \(\sin x\), centered at any point \(a\in\mathbb R\), indeed converges to \(\sin x\) for all \(x\in\mathbb R\).
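This bound is easy to watch numerically (the helper name below is ours): for each odd \(n\), the actual error never exceeds \(|x-a|^{n+1}/(n+1)!\):

```python
import math

def sin_taylor_poly(x, n):
    """Degree-n Taylor polynomial of sin at a = 0 (n odd)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range((n + 1) // 2))

x = 4.0
for n in range(1, 20, 2):
    err = abs(math.sin(x) - sin_taylor_poly(x, n))
    bound = abs(x)**(n + 1) / math.factorial(n + 1)
    print(n, err, bound, err <= bound)  # the bound always holds
```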

Now it would be natural to apply this argument to as many functions as possible, and to have some general theorem describing which functions, possibly with a simple criterion or test, enjoy the property that the Taylor series, wherever it converges, converges to the right function; that would be a theorem more deserving of the name of Taylor's theorem. Unfortunately, just being \(C^\infty\) *throughout* an interval is not enough. The famous (counter)example is
\[\phi(x)=\begin{cases} e^{-1/x} & x>0 \\ 0 & x\leq 0 \end{cases}\]
for which all the derivatives at \(x=0\) vanish, so the Taylor series \((\)at \(a=0\), or any \(a<0)\) does not converge to \(\phi(x)\) for \(x>0\). The existence of this particular function is highly significant, as it gives rise to a rich reservoir of *smooth* functions on \(\mathbb R^n\) that can have any desired support (often called *bump functions* in the study of smooth manifolds, and *test functions* in the theory of generalized functions or distributions).

But, as far as Taylor series are concerned, these smooth functions are bad. The "good" functions, characterized by the very property that their Taylor series always converge back to them, are called **(real) analytic**, sometimes denoted \(f\in C^\omega\) to suggest that it is stronger than being \(C^\infty\). The best way to "explain" why the \(\phi(x)\) above is not analytic is by going into the complex domain: even though the two sides "stitch" together smoothly along the real axis, it is not possible to extend it (to a holomorphic function) into the complex plane, not even in a small neighborhood of 0. In fact, the behavior of \(e^{-1/x}\) for small complex \(x\) is extremely wild (see Picard's theorem on essential singularity). The following theorem, rarely mentioned in calculus as it is considered "outside the scope" of a real-variable course, gives *the* natural criterion for analyticity that bypasses Taylor's theorem and the difficulty with estimating the remainder.

If \(f(x)\) is a (real- or complex-valued) function on an open interval \(I\subseteq\mathbb R\), and it extends to (i.e. agrees with) a holomorphic function on a complex neighborhood \(U\subseteq \mathbb C\) of \(I\), then the Taylor series of \(f(x)\) at any point \(a\in I\) converges to \(f(x)\) within its radius of convergence. In fact, the radius of convergence is the largest \(r>0\) such that \(f(x)\) admits a holomorphic extension that includes the open disk \( \{z\in\mathbb C: |z-a|<r\} \) in its domain.
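A standard illustration of this criterion (our own numerical sketch): \(f(x)=1/(1+x^2)\) is real-analytic on all of \(\mathbb R\), yet its Taylor series at \(0\) has radius of convergence exactly \(1\), because the holomorphic extension \(1/(1+z^2)\) has poles at \(z=\pm i\), at distance \(1\) from the center:

```python
# The Taylor series of 1/(1 + x^2) at a = 0 is the geometric series
# sum_k (-1)^k x^(2k), which converges only for |x| < 1.
def partial_sum(x, n):
    return sum((-1)**k * x**(2 * k) for k in range(n + 1))

for x in (0.5, 0.9):
    print(x, partial_sum(x, 200), 1 / (1 + x**2))  # converges inside |x| < 1
print(abs(partial_sum(1.5, 50)))  # blows up outside the radius
```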

The vast majority of functions that one encounters—including all elementary functions and their antiderivatives, and more generally solutions to (reasonable) ordinary differential equations—satisfy this criterion, and thus are analytic. For more about analytic functions on the complex domain, see the wiki Analytic Continuation.

## Relaxing the Condition

The condition in Taylor's theorem (with Lagrange remainder) can be relaxed a little bit, so that \( f^{(n+1)}\) is no longer assumed to be continuous (and the derivation above breaks down), but merely to exist on the open interval \( (a, x) \). This is akin to the mean value theorem, which originally must refer to the fact that \[ \int_a^b f'(x)\,dx = f'(c) (b-a) \quad \text{for some }\ c\in (a, b) \] under the condition that \(f'\) is continuous, but is (slightly) generalized so that \(f'\) is no longer assumed to be continuous, but merely to exist, with the integral on the left-hand side replaced by \(f(b)-f(a)\). One might wonder, for good reasons, whether such functions exist. Alas, they do. The classic example is \[ f(x)=\begin{cases} x^2\sin\dfrac{1}{x^2} & x\neq 0 \\ 0 & x=0 \end{cases} \] for which \(f'\) exists everywhere \((\)in particular \(f'(0)=0),\) but is not continuous at \(x=0\). The discontinuity is so bad that \(f'\) is unbounded near \(0\), hence not even (Riemann) integrable.
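For \(f(x)=x^2\sin(1/x^2)\) \((\)with \(f(0)=0),\) both phenomena can be watched numerically (our own sketch): the difference quotient at \(0\) tends to \(0\), so \(f'(0)=0\) exists, while \(f'\) is unbounded along a sequence approaching \(0\):

```python
import math

def f(x):
    return x**2 * math.sin(1 / x**2) if x != 0 else 0.0

# The difference quotient at 0 is bounded by |h|, so f'(0) = 0 exists:
for h in (1e-2, 1e-4, 1e-6):
    print(h, (f(h) - f(0)) / h)

# Yet f'(x) = 2x sin(1/x^2) - (2/x) cos(1/x^2) is unbounded near 0;
# sample it where 1/x^2 = 2*pi*n, so the cosine term dominates:
def fprime(x):
    return 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)

for n in (10, 1000, 100000):
    x = 1 / math.sqrt(2 * math.pi * n)
    print(x, fprime(x))  # magnitude grows like 2*sqrt(2*pi*n)
```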

The stronger mean value theorem requires an entirely different proof (ultimately relying on the completeness of the real numbers), and in fact it is an essential ingredient in the proof of the fundamental theorem of calculus itself. The stronger version of Taylor's theorem (with Lagrange remainder), as found in most books, is proved directly from the mean value theorem. For a more illuminating exposition, see Timothy Gowers' blog post.

It should also be noted that the condition in the integral form of the remainder can likewise be relaxed, so that \(f^{(n+1)}\) is no longer assumed to be continuous, but that \(f^{(n)}\) be *absolutely continuous*, which implies that \(f^{(n+1)}\) exists almost everywhere, and is (Lebesgue) integrable \(\big(f^{(n+1)}\in L^1\big)\). In some sense this is the most general setting for the fundamental theorem of calculus, and for integration by parts.

**Cite as:** Taylor's Theorem (with Lagrange Remainder). *Brilliant.org*. Retrieved from https://brilliant.org/wiki/taylors-theorem-with-lagrange-remainder/