Taylor's Theorem (with Lagrange Remainder)
The Taylor series of a function is extremely useful in all sorts of applications and, at the same time, it is fundamental in pure mathematics, specifically in (complex) function theory. Recall that, if \( f(x) \) is infinitely differentiable at \(x=a\), the Taylor series of \(f(x)\) at \(x=a\) is by definition
\[\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a)^n = f(a) + f'(a) (x-a) + \frac{f''(a)}{2}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots.\]
The expression (and its tremendous utility) comes from its being the best polynomial approximation, up to any given degree, near the point \(x=a\). For \( f(x)=\sin x\) and \( a=0 \), it's easy to compute all the \( f^{(n)}(0) \) and to see that the Taylor series converges for all \( x\in\mathbb R \) (by the ratio test), but it's by no means obvious that it should converge to \( \sin x \). After all, the derivatives at \( x=0 \) depend only on the values of the function very close to \(x=0\). Why should you expect that somehow it "knows" the values of the function far away?
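A quick numerical experiment suggests that it does. Here is a minimal Python sketch (the helper name is ours) that compares partial sums of the Maclaurin series of \(\sin x\) against the true value at a point far from the center:

```python
import math

def sin_taylor_partial(x, num_terms):
    """Partial sum of the Maclaurin series of sin:
    x - x^3/3! + x^5/5! - ... (num_terms nonzero terms)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(num_terms))

x = 10.0  # deliberately far from the center a = 0
for n in (5, 10, 15, 20):
    print(n, sin_taylor_partial(x, n), math.sin(x))
```

The partial sums wander wildly at first but eventually lock onto \(\sin 10\) to full floating-point precision.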
That the Taylor series does converge to the function itself must be a non-trivial fact. Most calculus textbooks would invoke Taylor's theorem (with Lagrange remainder), and would probably mention that it is a generalization of the mean value theorem. The proof of Taylor's theorem in its full generality may be short but is not very illuminating. Fortunately, a very natural derivation based only on the fundamental theorem of calculus (and a little bit of multi-variable perspective) is all one would need for most functions.
Derivation from FTC
We start with the fundamental theorem of calculus (FTC) in what should be its most natural form:
\[ f(x) = f(a) + \int_a^x {\color{red}f'(x_1)}\, dx_1.\]
The expression naturally requires that \( f \) be differentiable \((\)i.e. \( f' \) exist\()\) and that \( f' \) be continuous between \( a \) and \( x \) — we shall say \(f\) is continuously differentiable for short \((\)or \(f\in C^1).\) You could allow \(f'\) to have some jump discontinuities, but we'll soon see that more differentiability will come up, not less. For the sake of definiteness, imagine that \( x \) is bigger than \( a \) and \(x_1\) is a variable running from \(a\) to \(x\).
If, furthermore, \( f' \) is continuously differentiable \((\)we say that \(f\) is twice continuously differentiable, or \(f\in C^2),\) we can apply the FTC to \(f'\) on the interval \([a, x_1]\):
\[ {\color{red} f'(x_1)} = {\color{red}f'(a) + \int_a^{x_1} {\color{green}f''(x_2)}\, dx_2}.\]
Putting this into the expression for \( f(x) \), we have
\[ \begin{align*} f(x) &= f(a) + \int_a^x \left( {\color{red} f'(a) + \int_a^{x_1} {\color{green}f''(x_2)}\, dx_2} \right) dx_1 \\ &= f(a) + f'(a) (x-a) + \int_a^x \int_a^{x_1} {\color{green}f''(x_2)}\,dx_2\, dx_1. \end{align*}\]
Playing this game again, if \(f''\) is continuously differentiable \((\)i.e. \(f\in C^3),\) we could write
\[ {\color{green} f''(x_2)} = {\color{green} f''(a) + \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3},\]
so now
\[ \begin{align*} f(x) &= f(a) + f'(a) (x-a) + \int_a^x \int_a^{x_1} \left( {\color{green}f''(a) + \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3 }\right) \,dx_2\, dx_1 \\ &= f(a) + f'(a) (x-a) + f''(a)\frac{(x-a)^2}{2} + \int_a^x \int_a^{x_1} \int_a^{x_2} {\color{orange}f'''(x_3)}\, dx_3\, dx_2\, dx_1. \end{align*}\]
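One can let a computer algebra system do this bookkeeping. Below is a small sympy sketch (our own code) that checks the last identity for the concrete choice \(f(x)=e^x\) and \(a=0\); any \(C^3\) function would do:

```python
import sympy as sp

x, x1, x2, x3 = sp.symbols('x x1 x2 x3')
f = sp.exp(x)   # sample C^3 function; a = 0
a = 0

# triple-integral remainder: int_a^x int_a^{x1} int_a^{x2} f'''(x3) dx3 dx2 dx1
fppp = sp.diff(f, x, 3)
remainder = sp.integrate(
    sp.integrate(
        sp.integrate(fppp.subs(x, x3), (x3, a, x2)),
        (x2, a, x1)),
    (x1, a, x))

expansion = (f.subs(x, a) + sp.diff(f, x).subs(x, a) * (x - a)
             + sp.diff(f, x, 2).subs(x, a) * (x - a)**2 / 2 + remainder)
print(sp.simplify(expansion - f))  # prints 0
```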
This pattern clearly generalizes as follows:
If \(f(x)\) is \( n+1 \) times continuously differentiable \((f\in C^{n+1})\) on an interval containing \(a\), then
\[ f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!} (x-a)^k + R_{n+1}(x), \]
where
\[ R_{n+1}(x) = \int_a^x \int_a^{x_1} \ldots \int_a^{x_n} f^{(n+1)}(x_{n+1})\,dx_{n+1}\ldots dx_2\, dx_1 ,\]
known as the remainder.
In some sense, we have pushed as much information about the value of \(f(x)\) to the point \(a\) as possible, and what remains is a single "complicated-looking" term.
Verify it for \(f(x)=\sin x\), \(a=0\), and \(n=3\).
We have
\[ \begin{align} R_4(x) &= \int_0^x \int_0^{x_1}\int_0^{x_2}\int_0^{x_3} f^{(4)}(x_4)\,dx_4\, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1}\int_0^{x_2}\int_0^{x_3} \sin x_4\,dx_4\, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1}\int_0^{x_2} (1-\cos x_3) \, dx_3\, dx_2\, dx_1 \\ &= \int_0^x \int_0^{x_1} \left(x_2 - \sin x_2\right)\, dx_2\, dx_1 \\ &= \int_0^x \left(\frac{x_1^2}{2} - (1-\cos x_1) \right)\, dx_1 \\ &= \frac{x^3}{3!} - x + \sin x.\ _\square \end{align} \]
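For the skeptical, the fourfold integral can also be evaluated by brute force. The following Python sketch (naive recursive midpoint quadrature; all names are ours, and it is not meant to be efficient) agrees with the closed form above to several decimal places:

```python
import math

def iterated(f, a, upper, depth, steps=30):
    """depth-fold iterated integral int_a^upper int_a^{t1} ... f(t_depth) dt...,
    via naive recursive midpoint quadrature (a sketch, not production code)."""
    if depth == 0:
        return f(upper)
    h = (upper - a) / steps
    return sum(iterated(f, a, a + (i + 0.5) * h, depth - 1, steps) * h
               for i in range(steps))

x = 1.0
numeric = iterated(math.sin, 0.0, x, depth=4)      # f^(4) = sin for f = sin
exact = x**3 / math.factorial(3) - x + math.sin(x)
print(numeric, exact)   # both roughly 0.00814 (the run takes a moment)
```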
Note that if there is a bound for \(f^{(n+1)}\) over the interval \((a,x)\), we can easily deduce the so-called Lagrange error bound, which suffices for most applications (such as the convergence of Taylor series; see below). The actual Lagrange (or other) remainder appears to be a "deeper" result that could be dispensed with.
(Lagrange's Error Bound)
If, in addition, \(f^{(n+1)}\) is bounded by \(M\) over the interval \((a,x)\), i.e. \(\big|f^{(n+1)}(\xi)\big|\leq M\) for all \(\xi\in(a,x)\), then
\[\big|R_{n+1}(x)\big| \leq \frac{M}{(n+1)!} |x-a|^{n+1}.\]
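For \(f(x)=\sin x\) we may take \(M=1\), and a short Python check (our own naming) confirms that the bound comfortably contains the actual remainder:

```python
import math

a, x = 0.0, 2.0
deriv_at_0 = [0.0, 1.0, 0.0, -1.0]   # sin, cos, -sin, -cos evaluated at 0
for n in range(1, 10):
    taylor = sum(deriv_at_0[k % 4] / math.factorial(k) * (x - a)**k
                 for k in range(n + 1))
    actual = abs(math.sin(x) - taylor)                    # |R_{n+1}(x)|
    bound = abs(x - a)**(n + 1) / math.factorial(n + 1)   # with M = 1
    print(n, actual, bound, actual <= bound)              # always True
```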
The Remainder
The remainder \(R_{n+1}(x) \) as given above is an iterated integral, or a multiple integral, that one would encounter in multi-variable calculus. This may have contributed to the fact that Taylor's theorem is rarely taught this way.
For \(n=1\), the remainder
\[R_2(x) =\int_a^x \int_a^{x_1} f''(x_2)\,dx_2\,dx_1 \]
is a "double integral," where the integrand in general could depend on both variables \(x_1\) and \(x_2\). In our case, the integrand only depends on \(x_2\), so it would be easier if we could integrate over the \(x_1\) variable first. Indeed we could do so (with a little help of Fubini's theorem):
\[\begin{align*} R_2(x) &= \int_a^x {\color{red}\int_{x_2}^x} f''(x_2)\, {\color{red}dx_1}\, dx_2\\ &= \int_a^x f''(x_2) {\color{red}(x-x_2)}\, dx_2. \end{align*} \]
Note that the limits of integration were changed to keep the relative positions of the two variables, namely \(a\leq x_2\leq x_1\leq x\). In fact, the integral should be regarded as over a right-angled triangle in the \(x_1 x_2\)-plane, and it computes the (signed) volume under the surface \(F(x_1, x_2)=f''(x_2)\). This makes it intuitively clear that interchanging the order of integration ought not affect the final result.
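Sympy happily confirms the interchange on a concrete function, say \(f(x)=\cos x\) with \(a=0\) (a minimal sketch with our own variable names):

```python
import sympy as sp

x, x1, x2 = sp.symbols('x x1 x2')
fpp = -sp.cos(x2)          # f'' for f = cos, written as a function of x2

# original order: x2 innermost ...
lhs = sp.integrate(sp.integrate(fpp, (x2, 0, x1)), (x1, 0, x))
# ... versus integrating over x1 first, with the swapped limits from the text
rhs = sp.integrate(fpp * (x - x2), (x2, 0, x))

print(sp.simplify(lhs - rhs))   # prints 0
```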
For the general case of \(R_{n+1}(x)\), the region of integration is an \((n+1)\)-dimensional "simplex" defined by \(a\leq x_{n+1}\leq x_n\leq \cdots \leq x_1\leq x\), and performing the integration over \(x_1, \ldots , x_n\) \((\)with \(x_{n+1}\) fixed\()\) yields the volume of a right-angled "\(n\)-simplex". To wit,
\[ \begin{align*} R_{n+1}(x) &= \int_a^x \int_a^{x_1} \cdots \int_a^{x_n} f^{(n+1)}(x_{n+1})\,dx_{n+1} \ldots dx_2\, dx_1 \\ &= \int_a^x {\color{red} \int_{x_{n+1}}^{x} \cdots \int_{x_2}^{x} } f^{(n+1)}(x_{n+1})\, {\color{red} dx_1 \ldots dx_n } \, dx_{n+1} \\ &= \int_a^x f^{(n+1)}(x_{n+1}){\color{red}\frac{(x-x_{n+1})^n}{n!}}\,dx_{n+1}, \end{align*}\]
and this is known as the integral form of the remainder.
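As a sanity check (a Python sketch with simple midpoint quadrature; the helper name is ours), the integral form reproduces the remainder computed earlier for \(f(x)=\sin x\), \(a=0\), \(n=3\):

```python
import math

def integral_remainder(d_next, a, x, n, steps=2000):
    """Midpoint-rule approximation of int_a^x d_next(t) (x-t)^n / n! dt,
    where d_next plays the role of f^(n+1)."""
    h = (x - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += d_next(t) * (x - t)**n * h
    return total / math.factorial(n)

x = 1.0
numeric = integral_remainder(math.sin, 0.0, x, n=3)   # f^(4) = sin
exact = x**3 / 6 - x + math.sin(x)                    # from the earlier example
print(numeric, exact)   # agree to many decimal places
```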
Under the same condition,
\[ R_{n+1}(x) = \int_a^x f^{(n+1)}(\xi) \frac{(x-\xi)^n}{n!}\,d\xi. \]
By the "real" mean value theorem, this integral can be replaced by the "mean value," attained at some point \(\xi\in (a,x)\), multiplied by the length \(x-a\). Thus we obtain the remainder in the form of Cauchy:
(Cauchy)
\[ R_{n+1}(x) = \frac{f^{(n+1)}({\color{red}\xi})}{n!} (x-{\color{red}\xi})^n (x-a) \quad \text{for some }\ {\color{red}\xi}\in(a, x). \]
Finally, to obtain the form of Lagrange, we simply need to look at the original \((n+1)\)-fold integral, and apply the multi-variable version of the "real" mean value theorem: a multiple integral over a bounded, connected region is equal to its "mean value," attained at some point in the domain by continuity of the integrand, multiplied by the "volume" of the region of integration. (One can prove this by a simple application of the extreme value theorem and the intermediate value theorem.) Thus we have
(Lagrange)
\[ R_{n+1}(x) = \frac{f^{(n+1)}({\color{red}\xi})}{(n+1)!} (x-a)^{n+1} \quad \text{for some }\ {\color{red}\xi}\in(a, x). \]
Note that it almost certainly is a different \(\xi\) from the one in the Cauchy remainder, and in both cases we can't know where exactly \(\xi\) is without more information on the function \(f(x)\). The Lagrange remainder is easy to remember since it is the same expression as the next term in the Taylor series, except that \(f^{(n+1)}\) is being evaluated at the point \(\xi\) instead of at \(a\).
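In fact, for a concrete function one can hunt the Lagrange \(\xi\) down numerically. Here is a small Python sketch (plain bisection; our own code) for \(f(x)=\sin x\), \(a=0\), \(x=1\), \(n=3\):

```python
import math

a, x, n = 0.0, 1.0, 3
R = math.sin(x) - (x - x**3 / 6)   # actual remainder R_4(1)

def g(xi):
    # Lagrange form minus actual remainder: sin(xi)/(n+1)! * (x-a)^(n+1) - R
    return math.sin(xi) / math.factorial(n + 1) * (x - a)**(n + 1) - R

lo, hi = 0.0, 1.0    # g(lo) < 0 < g(hi), and sin is monotone on (0, 1)
for _ in range(60):  # plain bisection
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
print(lo)   # roughly 0.1966, safely inside (0, 1)
```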
One could also obtain other forms of the remainder by integrating some but not all of the \(x_1,\ldots, x_n\) variables, and applying the mean value theorem to the remaining variables. With a bit of careful analysis, one has
For \(p=0, 1, \ldots, n\),
\[ R_{n+1}(x) = \frac{f^{(n+1)}({\color{red}\xi})}{p!(n+1-p)!} (x-{\color{red}\xi})^{p} (x-a)^{n+1-p} \quad \text{for some } {\color{red}\xi}\in (a, x). \]
This is very close to, but not quite the same as, the Roche-Schlömilch form of the remainder.
It should also be mentioned that the integral form is typically derived by successive applications of integration by parts, which avoids ever mentioning multiple integrals. However, it may be considered the same proof (up to homotopy, in some sense) because integration by parts, in essence, is saying that one could compute a certain area either by integrating over the \(x\) variable or over the \(y\) variable.
Convergence of Taylor Series
In addition to giving an error estimate for approximating a function by the first few terms of the Taylor series, Taylor's theorem (with Lagrange remainder) provides the crucial ingredient to prove that the full Taylor series converges exactly to the function it's supposed to represent. A few examples are in order.
\(f(x)=\sin x\) is infinitely differentiable \((f\in C^\infty),\) and all the derivatives \(f^{(n)}(x)\) are one of four possibilities, namely \(\pm\cos x\) and \(\pm\sin x\). Therefore, in any of the forms of \(R_{n+1}\) above, we can simply bound \(\big|f^{(n+1)}(\xi)\big|\) by \(1\), so that (using the Lagrange form, say)
\[\big|R_{n+1}(x)\big| \leq \frac{|x-a|^{n+1}}{(n+1)!} \to 0 \quad \text{as }\ n\to\infty\]
for any fixed \(a\) and \(x\in\mathbb R\). Therefore, the Taylor series of \(\sin x\), centered at any point \(a\in\mathbb R\), indeed converges to \(\sin x\) for all \(x\in\mathbb R\).
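The decisive point is that the factorial eventually crushes any geometric growth; a two-line check at a deliberately large \(x\):

```python
import math

x = 10.0
print([x**(n + 1) / math.factorial(n + 1) for n in (10, 20, 30, 40)])
# large at first, then collapses rapidly toward 0
```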
Now it would be natural to apply this kind of argument to as many functions as possible, and preferably to have some general theorem describing which functions, with a simple criterion or test, enjoy the property that their Taylor series always converge to the right function wherever they converge. That would be a theorem more deserving of the name of Taylor's theorem (in the sense of the theorem concerning Taylor series, not to attribute it to Brook Taylor). Unfortunately, the natural criterion of being \(C^\infty\) throughout an interval is not enough. The famous (counter)example is
\[\phi(x)=\begin{cases} e^{-1/x} & x>0 \\ 0 & x\leq 0 \end{cases}\]
for which all the derivatives at \(x=0\) exist and are equal to \(0\), so its Taylor series centered at \(a=0\), or any \(a<0,\) does not converge to \(\phi(x)\) for \(x>0\). The existence of this seemingly innocent function is highly significant, as it gives rise to a rich reservoir of smooth functions on \(\mathbb R^n\) that can have any desired support (often called bump functions in the study of smooth manifolds, and test functions in the theory of distributions).
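One can check the vanishing derivatives with sympy (a small sketch; we look only at right-hand limits, since \(\phi\) is identically \(0\) on the left):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
phi = sp.exp(-1 / x)    # the x > 0 branch of the function above

for k in range(6):
    # right-hand limit of the k-th derivative at 0; every one comes out 0
    print(k, sp.limit(sp.diff(phi, x, k), x, 0, '+'))
```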
But, as far as Taylor series are concerned, these smooth functions are bad. The "good" functions, characterized by the very property that their Taylor series always converge to the functions themselves, are called (real) analytic, sometimes denoted \(f\in C^\omega\) to suggest that it is stronger than being \(C^\infty\). The best way to "explain" why the \(\phi(x)\) above is not analytic is by going into the complex domain: even though the two sides "stitch" together smoothly (at the origin) along the real axis, it is not possible to extend it into the complex plane as a (single) holomorphic function, not even for a small neighborhood of 0. In fact, the behavior of \(e^{-1/x}\) for small complex \(x\) is extremely wild (see Picard's theorem on essential singularity). The following theorem, rarely mentioned in calculus as it is considered "outside the scope" of a real-variable course, provides the natural criterion for analyticity that bypasses Taylor's theorem and the difficulty with estimating the remainder.
If \(f(x)\) is a (real- or complex-valued) function on an open interval \(I\subseteq\mathbb R\), and it extends to (i.e. agrees with) a holomorphic function \(f(z)\) on a complex domain \(U\) \((\)an open connected subset of \(\mathbb C)\) containing \(I\), then the Taylor series of \(f(x)\) at any point \(a\in I\) converges to \(f(x)\) wherever it converges \((\)i.e., \(f\) is analytic\().\) Furthermore, the radius of convergence is the largest \(r>0\) such that \(f(x)\) admits a holomorphic extension over a domain containing the open disk \( \{z\in\mathbb C: |z-a|<r\} \).
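The classic illustration is \(f(x)=\frac{1}{1+x^2}\): it is perfectly smooth on all of \(\mathbb R\), yet its Maclaurin series \(\sum_k (-1)^k x^{2k}\) has radius of convergence exactly \(1\), because the holomorphic extension \(\frac{1}{1+z^2}\) has poles at \(z=\pm i\), at distance \(1\) from the origin. A quick Python sketch (our own code) exhibits the two behaviors:

```python
def partial(x, terms):
    """Partial sums of the Maclaurin series of 1/(1+x^2): sum of (-1)^k x^(2k)."""
    return sum((-1)**k * x**(2 * k) for k in range(terms))

for x in (0.9, 1.1):    # just inside vs. just outside the radius |x| < 1
    print(x, [round(partial(x, t), 4) for t in (10, 20, 40)],
          "vs", round(1 / (1 + x**2), 4))
```

At \(x=0.9\) the partial sums settle down to \(1/(1+x^2)\); at \(x=1.1\) they blow up, even though the function itself is perfectly well behaved there.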
The vast majority of functions that one encounters — including all elementary functions and their antiderivatives, and more generally solutions to (reasonable) ordinary differential equations — satisfy this criterion, and thus are analytic. For more about analytic functions on the complex domain, see the wiki Analytic Continuation.
Relaxing the Condition
The condition in Taylor's theorem (with Lagrange remainder) can be relaxed a little bit, so that \( f^{(n+1)}\) is no longer assumed to be continuous (and the derivation above breaks down) but merely to exist on the open interval \( (a, x) \). The same happens to the mean value theorem, which in the present approach would naturally refer to the fact that
\[ \int_a^b f'(x)\,dx = f'(c) (b-a) \quad \text{for some }\ c\in (a, b) \]
under the condition that \(f'\) is continuous, but is (slightly) generalized so that \(f'\) is no longer assumed to be continuous but merely to exist, and the integral on the left-hand side is replaced by \(f(b)-f(a)\). One might wonder, for good reason, whether such functions exist. Alas, they do. The classic example is
\[ f(x)=\begin{cases} x^2\sin\dfrac{1}{x} & x\neq 0 \\ 0 & x=0 \end{cases} \]
for which \(f'(0)\) exists (and equals \(0\)) but \(f'\) is not continuous at \(x=0\). Here \(f'\) is still bounded near \(0\) and hence Riemann integrable; replacing \(\sin\frac1x\) by \(\sin\frac1{x^2}\) makes \(f'\) unbounded near \(0\), giving a derivative that exists everywhere yet is not even (Riemann) integrable.
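A tiny numerical sketch makes the bad behavior of the first example visible: for \(x\neq 0\) the derivative is \(f'(x) = 2x\sin\frac1x - \cos\frac1x\), and sampling ever closer to \(0\) shows it oscillating between values near \(\pm 1\) instead of approaching \(f'(0)=0\):

```python
import math

def fprime(x):
    # derivative of x^2 sin(1/x) for x != 0; f'(0) = 0 by the limit definition
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for k in (1, 10, 100, 1000):
    print(fprime(1 / (2 * k * math.pi)),        # -> -1 along this sequence
          fprime(1 / ((2 * k + 1) * math.pi)))  # -> +1 along this one
```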
The stronger mean value theorem is proved in an entirely different way, ultimately relying on properties of the real numbers, and in fact it is an essential ingredient in the proof of the fundamental theorem of calculus itself. The stronger version of Taylor's theorem (with Lagrange remainder), as found in most books, is proved directly from the mean value theorem. That this is not the best approach for pedagogy is well argued in Thomas Tucker's Rethinking Rigor in Calculus: The Role of the Mean Value Theorem. For a more illuminating exposition, see Timothy Gowers' blog post.
It should also be noted that the condition in the integral form of the remainder can likewise be relaxed, so that \(f^{(n+1)}\) is no longer assumed to be continuous, but that \(f^{(n)}\) be absolutely continuous, which implies that \(f^{(n+1)}\) exists almost everywhere and is (Lebesgue) integrable \(\big(f^{(n+1)}\in L^1\big)\). In some sense, this is the most general setting for the fundamental theorem of calculus and for integration by parts.