# Distributions

The theory of **distributions** was introduced by Laurent Schwartz in the 1940s--building on the theory of (locally convex) topological vector spaces--to provide a firm foundation for a theory of **generalized functions**, such as the Dirac \(\delta\)-function. The main advantage is that the derivative of a distribution always exists and is itself a distribution (hence is infinitely differentiable *in the sense of distributions*). It revolutionized much of the analysis that deals with derivatives, most notably through the concept of the **fundamental solution** of a linear differential operator, which supplants (or subsumes) the classical notions of Green's function and elementary solution. Compared to other "revolutions" in analysis, such as the epsilon-delta argument, the construction of the real numbers, and Lebesgue's theory of integration, Schwartz's theory of distributions is conceptually rather simple, and it has the appeal of the calculus of Newton and Leibniz: one can work with distributions without worrying too much about the (functional-analytic) foundations.


## Motivation

There were many motivations that led to Schwartz's theory, going at least as far back as the general solution \[u(x,t)= f(x-t) + g(x+t) \] of the wave equation \[ \frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0, \] which curiously does not require that \(f\) and \(g\) be differentiable, an observation made by Euler. But the Dirac \(\delta\)-function is perhaps the most striking and best-known example. It is often "defined" by the properties that

- \(\delta(x)=0\) for all \(x\neq 0\), and
- \(\int_{-\infty}^{\infty} \delta(x)\,dx = 1,\) or more generally \[ \int_{-\infty}^{\infty} \delta(x) f(x) dx = f(0) \] so long as \(f(x)\) is continuous at \(x=0\),

which is clearly impossible within the framework of ordinary functions and any sensible notion of integration. At best it should be regarded as the "limit" of a sequence of actual functions with ever higher and narrower spikes at \(x=0\). In fact, mathematicians had long used this kind of limiting procedure, often under the name of *approximation to the identity* (the *Fejér kernel* being one example), in the study of Fourier series.
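This limiting procedure is easy to see numerically: pairing a narrow spike of total integral \(1\) against a continuous \(f\) approaches \(f(0)\) as the spike narrows. A minimal sketch, where the choices of a Gaussian spike \(g_\epsilon(x)=e^{-x^2/(2\epsilon^2)}/(\epsilon\sqrt{2\pi})\) and of the test function \(f(x)=\cos x\) are ours:

```python
import math

def gaussian_spike(x, eps):
    """Approximation to the identity: total integral 1, width ~ eps."""
    return math.exp(-x**2 / (2 * eps**2)) / (eps * math.sqrt(2 * math.pi))

def pair(f, eps, a=-10.0, b=10.0, n=100001):
    """Trapezoidal approximation of the integral of g_eps(x) f(x) dx."""
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * gaussian_spike(x, eps) * f(x) * h
    return total

for eps in (1.0, 0.1, 0.01):
    print(eps, pair(math.cos, eps))  # approaches cos(0) = 1 as eps shrinks
```

(For the Gaussian spike and \(f=\cos\), the pairing is exactly \(e^{-\epsilon^2/2}\), so the convergence can be checked against a closed form.)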

Physicists had been more liberal in treating \(\delta(x)\) as a bona fide function (of a real variable \(x\)). For instance, it can be translated: \(\delta(x-x_0)\) represents a spike at \(x=x_0\); and dilated: \[ \delta(ax) = \frac{\delta(x)}{|a|}, \quad a\neq 0; \] and, most importantly, it makes sense to say that \(\delta(x)\) is the derivative of the (unit) "step function," also called the Heaviside function: \[ H(x) = \begin{cases} 1 & x\geq 0 \\ 0 & x<0 \end{cases}\] To say \( H'(x) = \delta(x) \) means that we can integrate the \(\delta\)-function by the fundamental theorem of calculus: \[\int_a^b \delta(x)\,dx = H(b)-H(a) = \begin{cases} 1 & \text{if }a<0<b \\ 0 & \text{otherwise} \end{cases} \] (if \(a=0\) or \(b=0\) we have to say the integral is undefined). Moreover, by "applying" the Fourier inversion formula, one obtains the curious formula \[ \int_{-\infty}^{\infty} e^{ix\xi}\,d\xi = 2\pi \delta(x).\] A thorny issue is multiplication: while it is perfectly fine to multiply \(\delta(x)\) by an ordinary function, so long as that function is continuous at \(x=0\), it does not seem possible to form products such as \(\delta(x)\delta(x)\).
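The dilation rule, for instance, is exactly what the change of variables \(y=ax\) forces upon us (a formal computation, treating \(\delta\) as if it were an ordinary function):
\[ \int_{-\infty}^{\infty} \delta(ax) f(x)\,dx = \int_{-\infty}^{\infty} \delta(y)\, f\!\left(\frac{y}{a}\right) \frac{dy}{|a|} = \frac{f(0)}{|a|} = \int_{-\infty}^{\infty} \frac{\delta(x)}{|a|}\, f(x)\,dx, \]
where for \(a<0\) the reversed limits of integration are what produce the absolute value.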

The calculus of \(\delta\)-functions thus developed is not confined to one dimension. For example, the product \(\delta(x)\delta(y)\delta(z)\) makes perfectly good sense: it represents the mass (or electric charge) density of a single point-mass (or point-charge) at the origin with total mass (or total charge) equal to \(1\). We shall denote it by \(\delta(\vec x)\), though physicists like to write \(\delta^3(\vec x)\). By expressions of this sort one can represent any "distribution" of electric charge -- at various points, along a piece of curve, or on a sheet of surface -- as a single "generalized function" \(\rho(\vec x)\) on \(\mathbb R^3\), which enters the equation of electrostatics (Poisson's equation) as the "source term": \[ \Delta\phi = -4\pi \rho, \] where \[\Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\] is the Laplacian (on \(\mathbb R^3\)). The unknown \(\phi\), the (electric) potential, is to be solved for with the extra "boundary" condition that \(\phi\to 0\) as \(|\vec x|\to \infty\). For example, a single point-charge \(\rho(\vec x) = q\delta(\vec x)\) ought to "produce" \(\phi(\vec x) = \frac{q}{|\vec x|}\), which is to say \[ \Delta \frac{1}{|\vec x|} = -4\pi\delta(\vec x),\] a classical fact that could have been known to Newton (his "inverse square law"). This identity helps justify the general solution for an arbitrary charge distribution \(\rho(\vec x)\): \[ \phi(\vec x) = \int \frac{\rho(\vec y)}{|\vec x-\vec y|}\, d\vec y \] (simply let \(\Delta\) act inside the integral), which may also be interpreted as a "superposition" of potentials, one for each "charge" \(\rho (\vec y)\, d\vec y\) at the point \(\vec y\) \((d\vec y\) being the infinitesimal volume, or "volume form"). These and many other identities make the \(\delta\)-function a worthy object of mathematical inquiry, and as we shall see, the theory of distributions gives a very natural interpretation to all of them.

The charge distribution of electrostatics may well be the source of the terminology for Schwartz, for it is even possible to represent a dipole--two equal and opposite point-charges very close to each other--as a distribution, but not as a measure.

## Distributions: Basic Definitions and Examples

Distributions can be defined on any open set \(U\subset \mathbb R^n\) (and thus on any smooth manifold), and the space of all distributions on \(U\) is denoted by \(\mathcal D'(U)\). However, most of the naturally occurring distributions, including those mentioned above, are in fact defined on \(\mathbb R^n\), so we shall state our definitions for \(\mathcal D'(\mathbb R^n)\) even though there is virtually no change for the general case.

Let \(\mathcal D(\mathbb R^n)\) be the set (in fact, \(\mathbb C\)-vector space) of all infinitely differentiable functions \(\varphi:\mathbb R^n \to \mathbb C\) with compact support, i.e., such that there exists a compact (closed and bounded) set \(K\subset\mathbb R^n\) with \(\varphi(x)=0\) for all \(x\not\in K\). There is an abundance of such functions: the intuition is that any "hand-drawn" function can be "smoothed out" to be infinitely differentiable, even though it is not easy to write one down with formulas. Also, there is no need to be alarmed by the appearance of \(\mathbb C\); you are free to think of real-valued functions for the most part. The space \(\mathcal D(\mathbb R^n)\) is endowed with the topology in which \(\varphi_k\to\varphi\) (in \(\mathcal D\)) as \(k\to\infty\) if all the \(\varphi_k\) are supported in the same compact set \(K\) and, for every multi-index \(\alpha\in\mathbb N^n\), \(\partial^\alpha \varphi_k\) converges to \(\partial^\alpha\varphi\) uniformly.
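The standard example of such a function (essentially the only one people ever write down explicitly) is the "bump" \(\varphi(x) = e^{-1/(1-x^2)}\) for \(|x|<1\) and \(0\) otherwise, which is infinitely differentiable and supported in \([-1,1]\). A quick numerical sketch:

```python
import math

def bump(x):
    """Smooth bump function: e^{-1/(1-x^2)} on (-1, 1), zero outside.

    Every derivative tends to 0 as |x| -> 1, so the two pieces glue
    together infinitely differentiably; the support is [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

print(bump(0.0))    # e^{-1} ~ 0.3679, the maximum value
print(bump(0.999))  # astronomically small: the function dies off flatly
print(bump(2.0))    # exactly 0.0: compact support
```

Products, translates, and dilates of such bumps then supply test functions adapted to any point and any scale.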

A **distribution** (on \(\mathbb R^n\)) is a continuous linear functional \(u: \mathcal D(\mathbb R^n) \to \mathbb C\), i.e.,

- (linearity) \[ u( c_1\varphi_1+c_2\varphi_2) = c_1 u(\varphi_1) + c_2 u(\varphi_2)\] for all \(c_1, c_2\in\mathbb C\) and all \(\varphi_1, \varphi_2\in\mathcal D\); and
- (continuity) \[ u(\varphi_k) \to u(\varphi) \quad\text{whenever}\quad \varphi_k\to\varphi\;(\text{in }\mathcal D). \]

The space of all distributions, endowed with the weak topology, is denoted by \(\mathcal D'(\mathbb R^n)\).

As a matter of fact, any linear functional you can think of will turn out to be continuous, so from the practical point of view one may ignore the topology altogether.

The Dirac \(\delta\)-function is the distribution \(\delta\in\mathcal D'(\mathbb R^n)\) defined by \[ \begin{align} \delta: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \varphi(0), \end{align} \] which one readily checks is linear (taking continuity for granted).

For the sake of intuition, we can have a more general "\(\delta\)-function type" distribution: for any hypersurface \(S\subset\mathbb R^n\), such as the sphere \(S^{n-1}\) (or for that matter, a submanifold of any dimension), we have \[ \begin{align} \delta_S: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \int_{S} \varphi(x) \, d\sigma(x) \end{align} \] where \(d\sigma\) is the "surface element" on \(S\).

A whole class of examples is given by actual functions, including in particular all continuous functions \(f:\mathbb R^n\to\mathbb C\).

If \(f:\mathbb R^n\to\mathbb C\) is **locally integrable**, i.e., \(\int_K |f(x)|\,dx\) is finite for every compact set \(K\), then \[ \begin{align} T_f: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \int_{\mathbb R^n} f(x)\varphi(x)\,dx \end{align} \] is a distribution, which by abuse of notation is often denoted simply by \(f\). Note that changing the values of \(f\) on a set of measure zero does not alter the distribution \(T_f\).

For \(n=1\), the Heaviside function \[ H(x) := \begin{cases} 1 & x>0 \\ 0 & x<0 \end{cases} \] is locally integrable, and thus defines a distribution by the recipe above. Note that we do not need to specify the value of \(H(0)\). For a less trivial example, \[ x_+^s := \begin{cases} x^s & x>0 \\ 0 & x<0 \end{cases}\] defines a distribution if and only if \(s > -1\) (or \(\operatorname{Re} s> -1\) if we allow \(s\) to be complex).
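The threshold in the last example is simply the integrability of \(x^s\) near the origin:
\[ \int_0^1 x^s\,dx = \left[ \frac{x^{s+1}}{s+1} \right]_0^1 = \frac{1}{s+1} \quad\text{for } s>-1, \]
while the integral diverges for \(s\leq -1\). Away from the origin \(x_+^s\) is continuous, so \(x_+^s\) is locally integrable precisely when \(s>-1\) (and the same computation works for \(\operatorname{Re} s>-1\)).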

As defined, distributions are nothing like actual functions, for one cannot evaluate \(u\in\mathcal D'(\mathbb R^n)\) at a single point \(x\in\mathbb R^n\). Nevertheless, it is meaningful to speak of its values on an open set (small or big) "collectively": we say \(u\) *vanishes on an open set* \(U\subset\mathbb R^n\) if \(u(\varphi)=0\) for all \(\varphi\) supported in \(U\) (with the technical caveat that the support is defined as the *closure* of the set on which \(\varphi\) does not vanish; in other words, \(\varphi\) must stay away from the boundary of \(U\)). Similarly, we say \(u\) *agrees with a function* \(f(x)\) on an open set \(U\subset\mathbb R^n\) if \(u(\varphi) = \int f(x)\varphi(x)\,dx\) for all \(\varphi\) supported in \(U\). Incidentally, functions \(\varphi\in\mathcal D(\mathbb R^n)\) are often called *test functions*, because one can imagine that they are being used to "test" or "measure" the values of a function (or a distribution) in the vicinity of a point.

It may be remarked that the theory of distributions is in line with the "formalist" philosophy of mathematics: a mathematical object is defined by what it *does* (in relation to other objects), instead of what it *is* intrinsically.

The most important fact about distributions is that they can always be differentiated. The definition is arranged so that it agrees with the usual notion of derivative on actual functions (sometimes called *regular* functions for emphasis, though that is a terribly overloaded word in mathematics), and in this sense distributions are rightly called generalized functions. To "derive" the definition, let \(u=T_f\) for some \(C^1\) function \(f:\mathbb R^n\to\mathbb C\), so that each partial derivative \(\partial_i f = \frac{\partial f}{\partial x_i}\), \(i=1, \ldots, n\), is continuous and thus defines a distribution \(T_{\partial_i f}\) (which is what we want \(\partial_i u\) to be):
\[ \varphi \mapsto \int_{\mathbb R^n} \partial_i f(x)\varphi(x)\,dx\]
By integration by parts, we have
\[\int_{\mathbb R^n} \partial_i f(x)\varphi(x)\,dx = -\int_{\mathbb R^n}f(x) \partial_i \varphi(x)\,dx\]
where the boundary term drops out because \(\varphi\) is compactly supported. Thus, the distribution \(T_{\partial_i f}\) can also be defined by
\[ \varphi \mapsto -\int_{\mathbb R^n} f(x) \partial_i \varphi(x)\,dx = -T_f(\partial_i\varphi) \]
where the derivative is acting on \(\varphi\) instead. So, we can extend this definition to all \(u\in\mathcal D'(\mathbb R^n)\), namely
\[ \varphi \mapsto -u(\partial_i\varphi) \]
is a distribution, and we shall denote it by \(\partial_i u\). It is often convenient to denote the value \(u(\varphi)\) by \(\langle u, \varphi\rangle\), the notation used for the "pairing" of a finite-dimensional vector space with its dual (and reminiscent of Dirac's bra-ket notation). For instance, the \(\delta\)-function is written as \(\langle\delta, \varphi\rangle = \varphi(0)\) for all \(\varphi\in\mathcal D\).

For any distribution \(u\in\mathcal D'(\mathbb R^n)\), its **partial derivative** \(\partial_i u\) in the \(i\)-th direction is the distribution defined by \[ \langle \partial_i u, \varphi\rangle = -\langle u, \partial_i \varphi\rangle \quad \text{for all}\quad\varphi\in\mathcal D.\] More generally, for any multi-index \(\alpha\), \(\partial^\alpha u\) is the distribution defined by \[ \langle \partial^\alpha u, \varphi\rangle = (-1)^{|\alpha|}\langle u, \partial^\alpha \varphi\rangle \quad \text{for all}\quad\varphi\in\mathcal D.\]

The first-order derivatives of the \(\delta\)-function \(\delta\in\mathcal D'(\mathbb R^n)\) are given by \[ \varphi \mapsto -\frac{\partial\varphi}{\partial x_i}(0)\] (and similarly for higher derivatives).

For \(n=1\), we can check the identity \( H' = \delta \) in the sense of distributions: \[ \langle H', \varphi\rangle = -\langle H, \varphi'\rangle = -\int_0^\infty \varphi'(x)\,dx = - \varphi(x)\bigr|_0^\infty = \varphi(0) = \langle\delta, \varphi\rangle \] for all \(\varphi\in\mathcal D(\mathbb R)\), a confirmation that the theory of distributions conforms with the physicists' intuition.
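The computation above can be mimicked numerically: with a concrete test function, \(-\int_0^\infty \varphi'(x)\,dx\) really does return \(\varphi(0)\). A minimal sketch; our stand-in \(\varphi(x)=e^{-x^2}\) is not compactly supported, but it decays fast enough that the boundary term at infinity still drops out:

```python
import math

def phi(x):
    """A rapidly decaying stand-in for a test function (Gaussian)."""
    return math.exp(-x * x)

def dphi(x):
    """phi'(x) = -2x e^{-x^2}."""
    return -2.0 * x * math.exp(-x * x)

# <H', phi> = -<H, phi'> = -(integral of phi' over [0, inf)),
# approximated by the trapezoidal rule on [0, 10].
n, b = 100001, 10.0
h = b / (n - 1)
total = 0.0
for i in range(n):
    x = i * h
    w = 0.5 if i in (0, n - 1) else 1.0
    total += w * dphi(x) * h
pairing = -total

print(pairing, phi(0.0))  # both ~ 1: <H', phi> = <delta, phi>
```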

In fact, this procedure of "deriving" the definition works for many other operations, so long as they are well defined on the test functions \(\varphi\in\mathcal D(\mathbb R^n)\) (e.g., translation, Fourier transform, convolution). More precisely, if \(Lf\) is well defined for (some class of) locally integrable functions \(f\), and \[ \langle Lf, \varphi\rangle = \langle f, L^* \varphi\rangle \quad\text{for all}\quad \varphi\in\mathcal D\] for some operation \(L^*\), then we can define \(Lu\) for an arbitrary distribution \(u\in\mathcal D'(\mathbb R^n)\) by the same expression: \[ \langle Lu, \varphi\rangle = \langle u, L^* \varphi\rangle \quad\text{for all}\quad \varphi\in\mathcal D.\] For the case of derivatives, \(L=\partial_i\) and the "adjoint" is \(L^*=-\partial_i\). As an exercise, come up with your own definition for translating a distribution, and more generally for a "change of variables" on \(\mathbb R^n\). More examples will be discussed later.
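To give away the first of these exercises: translation \((\tau_h f)(x) = f(x-h)\) satisfies, by the substitution \(y = x-h\),
\[ \langle \tau_h f, \varphi\rangle = \int f(x-h)\varphi(x)\,dx = \int f(y)\varphi(y+h)\,dy = \langle f, \tau_{-h}\varphi\rangle, \]
so one defines \(\langle \tau_h u, \varphi\rangle = \langle u, \tau_{-h}\varphi\rangle\) for an arbitrary \(u\in\mathcal D'(\mathbb R^n)\); here the "adjoint" is \(L^* = \tau_{-h}\). In particular, \(\tau_{x_0}\delta\) sends \(\varphi\mapsto\varphi(x_0)\), which is exactly the spike \(\delta(x-x_0)\) from the motivation.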

Having defined derivatives of distributions, we come immediately to the first major achievement of the theory: the notion of a fundamental solution of a linear differential equation, or rather of a linear differential operator, on \(\mathbb R^n\): \[ P(x,\partial) = \sum_{\alpha} a_{\alpha}(x)\partial^\alpha \] (the sum is finite, and the coefficients \(a_\alpha(x)\) lie in some class of functions on \(\mathbb R^n\)). One particularly important class consists of the operators \(P(\partial)\) whose coefficients are all constants.

A **fundamental solution** (or Green's function) of a linear differential operator \(P(\partial)\) with constant coefficients is a distribution \(E\in\mathcal D'(\mathbb R^n)\) such that \[P(\partial) E = \delta\] (in the sense of distributions).

**Theorem** (Ehrenpreis-Malgrange, 1955). Every nonzero linear differential operator \(P(\partial)\) with constant coefficients has a fundamental solution.

Remark: It is not unique, since one can always add a homogeneous solution (such as a constant). For certain types of \(P(\partial)\), one can impose extra conditions that will ensure a unique fundamental solution that, in some sense, is the best one.

The (Newtonian) potential \(\frac{1}{|x|}\) defines a distribution on \(\mathbb R^3\), and (up to the factor \(-4\pi\)) it is the unique fundamental solution of the Laplacian that goes to \(0\) as \(|x|\to \infty\). To see this, compute \[ \begin{align} \langle \Delta\frac{1}{|x|}, \varphi\rangle &= \langle \frac{1}{|x|}, \Delta\varphi\rangle \\ &= \int_{\mathbb R^3} \frac{1}{|x|} \Delta\varphi(x)\,dx \\ &= \lim_{\epsilon\to 0} \int_{|x|\geq\epsilon} \frac{1}{|x|} \Delta\varphi(x)\,dx \\ &= -4\pi\varphi(0) = \langle -4\pi\delta, \varphi\rangle, \end{align} \] where the last step uses Green's second identity on the region \(\{\epsilon\leq |x|\leq R\}\) (with \(R\) so large that \(\varphi\) vanishes near \(|x|=R\)): since \(\frac{1}{|x|}\) is harmonic away from the origin, only the boundary terms on the small sphere \(|x|=\epsilon\) survive, and they tend to \(-4\pi\varphi(0)\) as \(\epsilon\to 0\). Hence \(\Delta\frac{1}{|x|} = -4\pi\delta\) in the sense of distributions.
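For a radial test function \(\varphi(x)=\psi(|x|)\), the pairing collapses to a one-dimensional integral (using \(\Delta\varphi = \psi''+\frac{2}{r}\psi'\) and \(dx = 4\pi r^2\,dr\)), which can be checked numerically. A sketch; the radial reduction is standard, and the choice \(\psi(r)=e^{-r^2}\) is ours:

```python
import math

def psi(r):    # radial profile psi(r) = e^{-r^2} (our choice)
    return math.exp(-r * r)

def dpsi(r):   # psi'(r)
    return -2.0 * r * math.exp(-r * r)

def d2psi(r):  # psi''(r)
    return (4.0 * r * r - 2.0) * math.exp(-r * r)

# For radial phi(x) = psi(|x|): Laplacian(phi) = psi'' + (2/r) psi',
# and dx = 4 pi r^2 dr, so
#   <1/|x|, Laplacian(phi)> = 4 pi * integral of (r psi'' + 2 psi') dr.
n, b = 200001, 12.0
h = b / (n - 1)
total = 0.0
for i in range(n):
    r = i * h
    w = 0.5 if i in (0, n - 1) else 1.0
    total += w * (r * d2psi(r) + 2.0 * dpsi(r)) * h
pairing = 4.0 * math.pi * total

print(pairing, -4.0 * math.pi * psi(0.0))  # both ~ -4 pi ~ -12.566
```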

The fundamental solutions of the three classical equations, each giving rise to a vast subject (the so-called elliptic, parabolic, and hyperbolic PDEs, respectively), are listed below.

| \(P(\partial)\) (equation) | \(n=1\) | \(n=2\) | \(n=3\) |
| --- | --- | --- | --- |
| \(\Delta\) (Laplace's equation) | \(E(x)=\dfrac{1}{2}\lvert x\rvert\) | \(E(x)=\dfrac{1}{2\pi}\log \lvert x\rvert\) | \(E(x)=-\dfrac{1}{4\pi \lvert x\rvert}\) |
| \(\partial_t -\Delta\) (heat equation) | \(E(t,x) = \dfrac{H(t)}{\sqrt{4\pi t}}\, e^{-x^2/(4t)}\) | \(E(t,x) = \dfrac{H(t)}{4\pi t}\, e^{-\lvert x\rvert^2/(4t)}\) | \(E(t,x) = \dfrac{H(t)}{(4\pi t)^{3/2}}\, e^{-\lvert x\rvert^2/(4t)}\) |
| \(\partial_t^2 - \Delta\) (wave equation) | \(E(t,x)=\dfrac{1}{2} H(t-x)H(t+x)\) | \(E(t,x)=\dfrac{H(t-\lvert x\rvert)}{2\pi\sqrt{t^2-\lvert x\rvert^2}}\) | \(E(t,x) = \dfrac{\delta(t-\lvert x\rvert)}{4\pi \lvert x\rvert}\) |
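As a sanity check on one of these entries: for \(t>0\) the one-dimensional heat kernel has total mass \(1\) and satisfies \(\partial_t E = \partial_x^2 E\) pointwise; the \(\delta\) on the right-hand side of \(P(\partial)E=\delta\) only appears in the limit \(t\to 0^+\). A numeric sketch (the grid sizes and sample point are arbitrary choices):

```python
import math

def E(t, x):
    """One-dimensional heat kernel, valid for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

# Total mass is 1 for every t > 0 (trapezoidal rule on [-20, 20]).
t = 0.7
n, a, b = 100001, -20.0, 20.0
h = (b - a) / (n - 1)
mass = sum((0.5 if i in (0, n - 1) else 1.0) * E(t, a + i * h) * h
           for i in range(n))
print(mass)  # ~ 1.0

# Heat equation away from t = 0, via centered finite differences.
dt, dx, x0 = 1e-5, 1e-3, 0.3
lhs = (E(t + dt, x0) - E(t - dt, x0)) / (2 * dt)              # d/dt E
rhs = (E(t, x0 + dx) - 2 * E(t, x0) + E(t, x0 - dx)) / dx**2  # d2/dx2 E
print(lhs, rhs)  # nearly equal
```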

## Tempered Distributions and Fourier Transform

Inside the space \(\mathcal D'(\mathbb R^n)\) of all distributions, there is a particularly well-behaved subspace of **tempered distributions**: those that do not grow too fast (hence "tempered") at infinity. "Not too fast" means (at most) polynomial, or "moderate," growth. Thus all polynomials are tempered distributions, but functions such as \(e^x\) are not. This subspace is denoted \(\mathcal S'(\mathbb R^n)\), suggesting that it can be defined as the dual of a certain space \(\mathcal S(\mathbb R^n)\): the space of **Schwartz functions**, infinitely differentiable functions that are rapidly decreasing at infinity along with all their derivatives (the prototypical example being the "Gaussian" \(e^{-x^2/2}\)). We shall not go into the technical details, but turn to some of the most important properties.

As a strengthening of the Ehrenpreis-Malgrange theorem, every linear differential operator with constant coefficients in fact admits a *tempered* fundamental solution (and the examples above are all tempered). What makes tempered distributions special is that it is possible to define their Fourier transform; in fact they provide the most natural setting for the Fourier transform, since the Fourier transform of a polynomial turns out to be a derivative of the \(\delta\)-function (at the origin).

There are different conventions for the (ordinary) Fourier transform, but in dealing with differential equations it is most convenient to use \[ \hat\varphi(\xi) = \int_{\mathbb R^n} e^{-ix\cdot\xi} \varphi(x) \,dx, \] and we can "recover" \(\varphi\) by the inverse Fourier transform: \[ \varphi(x) = \frac{1}{(2\pi)^n} \int_{\mathbb R^n} e^{ix\cdot\xi} \hat\varphi(\xi)\, d\xi. \]

The **Fourier transform** of a tempered distribution \(u\in\mathcal S'(\mathbb R^n)\), denoted \(\hat u\) or \(\mathscr F u\), is defined by \[\langle \hat u, \varphi \rangle = \langle u, \hat\varphi \rangle \quad\text{for all} \quad\varphi\in\mathcal S.\]
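The simplest example is worth spelling out: the Fourier transform of \(\delta\) is the constant function \(1\), since for every \(\varphi\in\mathcal S\)
\[ \langle \hat\delta, \varphi\rangle = \langle \delta, \hat\varphi\rangle = \hat\varphi(0) = \int_{\mathbb R^n} \varphi(x)\,dx = \langle 1, \varphi\rangle. \]
Running this through the inversion formula (for \(n=1\)) is precisely the rigorous content of the curious identity \(\int_{-\infty}^{\infty} e^{ix\xi}\,d\xi = 2\pi\delta(x)\) from the motivation section.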