# Distributions

The theory of **distributions** was introduced by Laurent Schwartz in the 1940s—based on the theory of (locally convex) topological vector spaces—to provide a firm foundation for a theory of **generalized functions**, such as the Dirac $\delta$-function. The main advantage is that the derivative of a distribution always exists, and is itself a distribution (hence is infinitely differentiable *in the sense of distributions*). It revolutionizes much of analysis that deals with derivatives, most notably with the concept of **fundamental solution** of a linear differential operator which supplants (or subsumes) the classical notions of Green's function and elementary solution. Compared to other "revolutions" in analysis, such as the epsilon-delta argument, properties of real numbers, and Lebesgue's theory of integration, Schwartz's theory of distributions is conceptually rather simple, and has the appeal of the calculus of Newton and Leibniz: one could work with them without worrying too much about the (functional analytic) foundation.


## Motivation

There were many motivations that led to Schwartz's theory, going as far back as the general solution $u(x,t)= f(x-t) + g(x+t)$ to the wave equation $\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0,$ which curiously does not require that $f$ and $g$ be differentiable, an observation made by Euler. But the Dirac $\delta$-function is perhaps the most striking and the best-known example. It is often characterized by the properties

- $\delta(x)=0$ for all $x\neq 0$, and
- $\int_{-\infty}^{\infty} \delta(x)\,dx = 1,$ or more generally $\color{#EC7300} \int_{-\infty}^{\infty} \delta(x) f(x)\, dx = f(0)$ so long as $f(x)$ is continuous at $x=0$,

which clearly is impossible within the framework of ordinary functions and any sensible notion of integration. At best it should be regarded as the "limit" of a sequence of actual functions with ever higher and narrower spikes at $x=0$. In fact, mathematicians had used this kind of limiting procedure, often under the name of *approximation to the identity* (the Fejér and Poisson kernels being examples), in the study of Fourier series.
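The limiting procedure can be watched numerically. The following is a minimal sketch that assumes NumPy and SciPy are available. The Gaussian spikes $g_\varepsilon$ are our own choice of approximation to the identity, and `nascent_delta` and `pair_with_spike` are hypothetical helper names. The sketch pairs $g_\varepsilon$ with a function continuous at $0$ and shows the result approaching $f(0)$ as $\varepsilon\to 0$.

```python
import numpy as np
from scipy.integrate import quad

def nascent_delta(x, eps):
    """Gaussian spike of width eps and total integral 1 (an approximation to the identity)."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def pair_with_spike(f, eps):
    """Approximate int delta(x) f(x) dx by int g_eps(x) f(x) dx.

    The spike is negligible outside [-12 eps, 12 eps], so the integral is truncated there.
    """
    return quad(lambda x: nascent_delta(x, eps) * f(x), -12 * eps, 12 * eps, points=[0.0])[0]

f = lambda x: np.cos(x) + x**3   # continuous at 0, with f(0) = 1
for eps in [1.0, 0.1, 0.01]:
    print(eps, pair_with_spike(f, eps))
```

For this particular $f$ the exact value is $e^{-\varepsilon^2/2}$, because the odd term $x^3$ integrates to zero. The printed numbers therefore increase toward $f(0)=1$.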

Physicists had been more liberal in treating $\delta(x)$ as a bona fide function (of a real variable $x$). For instance, it can be translated: $\delta(x-x_0)$ describes a spike at $x=x_0,$ and dilated: $\delta(ax) = \frac{\delta(x)}{|a|}$ for $a\neq 0,$ and most importantly, it makes sense to say that $\delta(x)$ is the derivative of the (unit) "step function", also called the Heaviside function: $H(x) = \begin{cases} 1 & x\geq 0 \\ 0 & x<0. \end{cases}$ To say $\color{#EC7300} H'(x) = \delta(x)$ means that we can integrate the $\delta$-function: $\int_a^b \delta(x)\,dx = H(b)-H(a) = \begin{cases} 1 & \text{if }a<0<b \\ 0 & \text{otherwise}, \end{cases}$ as desired. (If $a=0$ or $b=0,$ we have to say it's undefined.) Moreover, by "applying" the Fourier inversion formula, one obtains the curious formula $\color{#EC7300} \int_{-\infty}^{\infty} e^{ix\xi}\,d\xi = 2\pi \delta(x).$ A thorny issue is multiplication: while it is perfectly fine to multiply $\delta(x)$ by an ordinary function, so long as that function is continuous at $x=0$, it does not seem possible to make sense of products such as $\delta(x)\delta(x)$ or $\delta(x)\frac{1}{x}$.
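The dilation rule can be checked the same way. This sketch assumes NumPy and SciPy and uses a narrow Gaussian of width `eps` to stand in for $\delta$ (our choice). Pairing $\delta(ax)$ with $f$ should then give approximately $f(0)/|a|$.

```python
import numpy as np
from scipy.integrate import quad

def nascent_delta(x, eps=1e-3):
    """Narrow Gaussian with unit integral, standing in for delta(x)."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def pair_dilated_delta(a, f, eps=1e-3):
    """Approximate int delta(a x) f(x) dx by int g_eps(a x) f(x) dx."""
    width = 12 * eps / abs(a)   # g_eps(a x) is negligible for |x| > width
    return quad(lambda x: nascent_delta(a * x, eps) * f(x), -width, width, points=[0.0])[0]

f = np.exp   # f(0) = 1
for a in [2.0, -0.5, 10.0]:
    print(a, pair_dilated_delta(a, f), 1 / abs(a))   # the two columns agree closely
```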

The calculus of $\delta$-functions developed by physicists is not confined to one dimension. For example, the product $\delta(x)\delta(y)\delta(z)$ makes perfectly good sense: it represents the mass (or electric charge) density of a single point-mass (or point-charge) at the origin with total mass (or total charge) equal to 1. We shall for the moment denote it by $\delta(\vec x)$, though physicists prefer to write $\delta^3(\vec x)$. By this and other expressions one can describe any "distribution" of electric charge (at various points, along a piece of curve, or on a sheet of surface) as a single "generalized function" $\rho(\vec x)$ on $\mathbb R^3$, which enters the equation of electrostatics (Poisson's equation) as the "source term": $\Delta\phi = -4\pi \rho,$ where $\Delta := \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$ is the Laplacian $\big($on $\mathbb R^3\big).$ The unknown $\phi$, the (electric) potential, is to be found subject to the extra "boundary" condition that $\phi\to 0$ as $|\vec x|\to \infty$. For example, a single point-charge $\rho(\vec x) = q\delta(\vec x)$ ought to "produce" $\phi(\vec x) = \frac{q}{|\vec x|}$, which is to say $\color{#EC7300} \Delta \frac{1}{|\vec x|} = -4\pi\delta(\vec x),$ a classical fact that could have been known to Newton (his "inverse square law"). This identity helps justify the general solution for an arbitrary charge distribution $\rho(\vec x)$: $\phi(\vec x) = \int \frac{\rho(\vec y)}{|\vec x-\vec y|}\, d\vec y$ $($simply let $\Delta$ act inside the integral...$),$ which may also be interpreted as a "superposition" of potentials, one for each "charge" $\rho (\vec y)\, d\vec y$ at the point $\vec y\in\mathbb R^3$ $(d\vec y$ being the infinitesimal volume, or "volume form"$).$ This and many other identities really make the $\delta$-function a worthy object of mathematical inquiry. 
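The superposition formula can be given a numerical sanity check. The sketch below assumes NumPy and SciPy. It takes a Gaussian charge density with total charge $1$ and computes $\phi$ from $\int \rho(\vec y)/|\vec x-\vec y|\,d\vec y$, reduced to one dimension by the shell theorem (valid because $\rho$ is radial). It then compares a finite-difference Laplacian of $\phi$ with $-4\pi\rho$. It also compares $\phi$ with the closed form $\operatorname{erf}(r/\sqrt 2)/r$, which holds for this particular $\rho$. The helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

def rho(s):
    """Gaussian charge density on R^3 (as a function of s = |y|) with total charge 1."""
    return np.exp(-s**2 / 2) / (2 * np.pi) ** 1.5

def potential(r):
    """phi(x) = int rho(y)/|x - y| dy at |x| = r, reduced by the shell theorem (rho is radial):
    the charge inside radius r acts as if at the origin; each outer shell contributes charge/radius."""
    opts = dict(epsabs=1e-12, epsrel=1e-12)
    inner = quad(lambda s: 4 * np.pi * s**2 * rho(s), 0.0, r, **opts)[0]
    outer = quad(lambda s: 4 * np.pi * s * rho(s), r, np.inf, **opts)[0]
    return inner / r + outer

def radial_laplacian(f, r, h=1e-2):
    """Delta f = (1/r) d^2/dr^2 (r f) for radial f on R^3, via a central second difference."""
    g = lambda t: t * f(t)
    return (g(r + h) - 2 * g(r) + g(r - h)) / (h**2 * r)

for r in [0.5, 1.0, 2.0]:
    print(r, radial_laplacian(potential, r), -4 * np.pi * rho(r), potential(r), erf(r / np.sqrt(2)) / r)
```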
As we shall see, the theory of distributions gives a very natural interpretation to all these identities.

The charge distribution of electrostatics may well be the source of the terminology for Schwartz, for it is even possible to represent an electric dipole—two equal and opposite point-charges very close to each other—as a distribution, but not as a (Radon) measure. It's a little unfortunate that the term distribution has often been confused with, or thought to be a generalization of, a probability distribution.

## Distributions: Basic Definitions and Examples

Distributions can be defined on any open set $U\subset \mathbb R^n$ (and thus on any smooth manifold), and the space of all distributions on $U$ is denoted by $\mathcal D'(U)$. However, most of the naturally occurring distributions, including those mentioned above, are in fact defined on $\mathbb R^n$, so we shall state our definitions for $\mathcal D'(\mathbb R^n)$ even though there is virtually no change for the general case.

Let $\mathcal D(\mathbb R^n)$ be the set $($in fact $\mathbb C$-vector space$)$ of all infinitely differentiable functions $\varphi:\mathbb R^n \to \mathbb C$ with compact support, i.e. there exists a compact (closed and bounded) set $K\subset\mathbb R^n$ such that $\varphi(x)=0$ for all $x\not\in K$. The space $\mathcal D(\mathbb R^n)$ is endowed with the topology that $\varphi_k\to\varphi$ $($in $\mathcal D),$ as $k\to\infty$, if all the $\varphi_k$ are supported in the same compact set $K$, and for any multi-index $\alpha\in\mathbb N^n$, $\partial^\alpha \varphi_k$ converges to $\partial^\alpha\varphi$ uniformly.

There is an abundance of such functions: the intuition is that any "hand-drawn" function can be "smoothed out" to be infinitely differentiable, even though most of them are not given by a simple formula. (A standard explicit example is $\psi(x) = e^{-1/(1-|x|^2)}$ for $|x|<1$ and $\psi(x)=0$ otherwise.) The space $\mathcal D(\mathbb R^n)$, not necessarily with the topology, is also denoted by $C_0^\infty(\mathbb R^n)$ or $C_c^\infty(\mathbb R^n)$. Also, there is no need to be alarmed by the appearance of $\mathbb C;$ you are free to think of real-valued functions for the most part.
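The "smoothing out" can be made concrete. This sketch assumes NumPy. It uses the standard bump $e^{-1/(1-x^2)}$ on $(-1,1)$, a well-known element of $\mathcal D(\mathbb R)$, and mollifies a step function by a discrete convolution with a rescaled, unit-mass copy of the bump. The grid, the width `eps`, and the helper names are illustrative choices.

```python
import numpy as np

def bump(x):
    """Standard bump: exp(-1/(1 - x^2)) for |x| < 1 and 0 otherwise (smooth, supported in [-1, 1])."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mollify(samples, dx, eps):
    """Convolve grid samples with the bump rescaled to [-eps, eps] and normalized to unit mass."""
    half = int(np.ceil(eps / dx))
    kernel = bump(np.arange(-half, half + 1) * dx / eps)
    kernel /= kernel.sum()  # discrete unit mass, so constants are preserved
    return np.convolve(samples, kernel, mode="same")

grid = np.linspace(-3.0, 3.0, 6001)          # spacing 0.001
dx = grid[1] - grid[0]
step = (grid >= 0).astype(float)             # a "hand-drawn" function with a jump at 0
smooth = mollify(step, dx, eps=0.5)
print(smooth[grid.searchsorted(-1.0)], smooth[grid.searchsorted(1.0)])
```

Away from the jump, and away from the ends of the grid where zero padding interferes, the result is $0$ or (up to rounding) $1$. Across $[-\varepsilon,\varepsilon]$ it rises smoothly and monotonically.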

A **distribution** $($on $\mathbb R^n)$ is a continuous linear functional $u: \mathcal D(\mathbb R^n) \to \mathbb C$, i.e.,

(linearity) $u( c_1\varphi_1+c_2\varphi_2) = c_1 u(\varphi_1) + c_2 u(\varphi_2)$ for all $c_1, c_2\in\mathbb C$ and all $\varphi_1, \varphi_2\in\mathcal D$; and

(continuity) $u(\varphi_k) \to u(\varphi) \quad\text{as}\quad{\varphi_k\to\varphi}\;(\text{in }\mathcal D).$

The space of all distributions, endowed with the weak topology, is denoted by $\mathcal D'(\mathbb R^n)$.

As a matter of fact, any linear functional you can think of will turn out to be continuous, so from the practical point of view, one may ignore the topology altogether.

The Dirac $\delta$-function is a distribution $\delta\in\mathcal D'(\mathbb R^n)$ defined by $\begin{aligned} \delta: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \varphi(0), \end{aligned}$ which one checks is linear (and takes continuity for granted).

For the sake of intuition, we can have a more general "$\delta$-function type" distribution: for any hypersurface $S\subset\mathbb R^n$, such as the sphere $S^{n-1}$ (or for that matter, a submanifold of any dimension), we have a distribution defined by $\begin{aligned} \delta_S: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \int_{S} \varphi(x) \, d\sigma(x), \end{aligned}$ where $d\sigma$ is the "surface element" on $S$.
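A distribution is defined by what it does to test functions, so it can be modelled in code as a function that takes a function. Below is a minimal sketch, assuming NumPy and SciPy, of $\delta$ and of $\delta_S$ for the unit circle $S\subset\mathbb R^2$, with $d\sigma$ the arc length. A Gaussian stands in for a test function. This is harmless here, since only its values at $0$ or on $S$ matter.

```python
import numpy as np
from scipy.integrate import quad

# Sketch: a distribution on R^2 is modelled as a Python function that takes a
# test function phi (a callable on points of R^2) and returns a number.

def delta(phi):
    """Dirac delta: phi -> phi(0)."""
    return phi(np.zeros(2))

def delta_circle(phi):
    """delta_S for the unit circle S: integrate phi over S with respect to arc length."""
    value, _ = quad(lambda t: phi(np.array([np.cos(t), np.sin(t)])), 0.0, 2 * np.pi)
    return value

phi = lambda x: np.exp(-np.dot(x, x))   # Gaussian stand-in for a test function
psi = lambda x: x[0] ** 2               # a second function, to check linearity

print(delta(phi))                       # phi(0) = 1
print(delta_circle(phi))                # phi = 1/e on S, so 2*pi/e
print(delta_circle(lambda x: 2 * phi(x) + 3 * psi(x)), 2 * delta_circle(phi) + 3 * delta_circle(psi))
```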

A whole class of examples is given by actual functions, which in particular include all continuous functions $f:\mathbb R^n\to\mathbb C$.

If $f:\mathbb R^n\to\mathbb C$ is **locally integrable**, i.e. for any compact set $K$, $\int_K |f(x)|\,dx$ is finite, then $\begin{aligned} T_f: \mathcal D(\mathbb R^n) &\to \mathbb C \\ \varphi &\mapsto \int_{\mathbb R^n} f(x)\varphi(x)\,dx \end{aligned}$ is a distribution and, by abuse of notation, is often denoted simply by $f$. Note that changing the values of $f$ on a set of measure zero does not alter the distribution $T_f$.

For $n=1$, the Heaviside function $H(x) := \begin{cases} 1 & x>0 \\ 0 & x<0 \end{cases}$ is locally integrable and thus defines a distribution by the recipe above. Note that we don't need to specify the value of $H(0)$. For a less trivial example, $x_+^s := \begin{cases} x^s & x>0 \\ 0 & x<0 \end{cases}$ is locally integrable, and hence defines a distribution by the same recipe, if and only if $s > -1$ $($or $\operatorname{Re} s> -1$ if we allow $s$ to be complex$).$
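The recipe $T_f$ can be sketched the same way. The sketch assumes SciPy, `T` and `x_plus` are hypothetical helper names, and a Gaussian again stands in for a test function. For $\varphi(x)=e^{-x^2}$ the exact values are $\langle H,\varphi\rangle = \sqrt\pi/2$ and $\langle x_+^{-1/2},\varphi\rangle = \Gamma(1/4)/2$. The second pairing is finite even though $x^{-1/2}$ is unbounded near $0$.

```python
import math
from scipy.integrate import quad

def T(f):
    """Sketch of the distribution T_f: phi -> int f(x) phi(x) dx, for locally integrable f on R."""
    def pairing(phi):
        g = lambda x: f(x) * phi(x)
        # Split at 0 and +-1: quad copes well with an integrable singularity at an endpoint.
        pieces = [(-math.inf, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, math.inf)]
        return sum(quad(g, a, b)[0] for a, b in pieces)
    return pairing

def x_plus(s):
    """x_+^s as a function: x^s for x > 0 and 0 for x < 0 (locally integrable iff s > -1)."""
    return lambda x: x**s if x > 0 else 0.0

H = T(lambda x: 1.0 if x > 0 else 0.0)   # Heaviside function; the value at 0 is irrelevant
phi = lambda x: math.exp(-x**2)          # Gaussian stand-in for a test function

print(H(phi), math.sqrt(math.pi) / 2)
print(T(x_plus(-0.5))(phi), math.gamma(0.25) / 2)
```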

As defined, distributions are nothing like actual functions, for one cannot evaluate $u\in\mathcal D'(\mathbb R^n)$ at a single point $x\in\mathbb R^n$. Nevertheless, it is meaningful to speak of the "values" on an open set (small or big) "collectively": we say $u$ *vanishes on an open set* $U\subset\mathbb R^n$ if $u(\varphi)=0$ for all $\varphi$ supported in $U$. For example, $\delta_S$ vanishes on $\mathbb R^n-S$. Similarly, we say $u$ *agrees with a function* $f(x)$ on an open set $U\subset\mathbb R^n$ if $u(\varphi) = \int f(x)\varphi(x)\,dx$ for all $\varphi$ supported in $U$. Incidentally, those functions $\varphi\in\mathcal D(\mathbb R^n)$ are often called *test functions* because one can imagine that they are being used to "test" or "detect" the values of a function (or a distribution) on an open set.

It may be remarked that distributions exemplify the dominant viewpoint of modern mathematics that may go under the term "formalism" or "functionalism": a mathematical object is defined by what it *does* (in relation to other objects), instead of what it *is* intrinsically.

The most important fact about distributions is that they can always be differentiated. The definition is such that it agrees with the usual notion of derivatives on actual functions (sometimes called *regular* functions for emphasis, though it's a terribly overloaded word in mathematics), and in this sense distributions are said to be generalized functions. To "derive" the definition, let $u=T_f$ for some $C^1$ function $f:\mathbb R^n\to\mathbb C$, so that its partial derivative $\partial_i f = \frac{\partial f}{\partial x_i}$, $i=1, \ldots, n$, is continuous and therefore defines a distribution $T_{\partial_i f}$ $($which is what we want $\partial_i u$ to be$):$
$\varphi \mapsto \int_{\mathbb R^n} \partial_i f(x)\varphi(x)\,dx.$
By integration by parts, we have
$\int_{\mathbb R^n} \partial_i f(x)\varphi(x)\,dx = -\int_{\mathbb R^n}f(x) \partial_i \varphi(x)\,dx,$
where the boundary term drops out because $\varphi$ is compactly supported. Thus, the distribution $T_{\partial_i f}$ can also be defined by
$\varphi \mapsto -\int_{\mathbb R^n} f(x) \partial_i \varphi(x)\,dx = -T_f(\partial_i\varphi),$
where the derivative is acting on $\varphi$ instead. So, we can extend this definition to all $u\in\mathcal D'(\mathbb R^n)$, to wit
$\varphi \mapsto -u(\partial_i\varphi)$
is a distribution, and we shall denote it by $\partial_i u$. It is often convenient to denote the value $u(\varphi)$ by $\langle u, \varphi\rangle$, a notation borrowed from the "pairing" of a finite-dimensional vector space with its dual. For instance, the $\delta$-function may be written as $\langle\delta, \varphi\rangle = \varphi(0)$ for all $\varphi\in\mathcal D$.

For any distribution $u\in\mathcal D'(\mathbb R^n)$, its **partial derivative** $\partial_i u$ in the $i^\text{th}$ direction is a distribution defined by $\langle \partial_i u, \varphi\rangle = -\langle u, \partial_i \varphi\rangle \quad \text{for all}\quad\varphi\in\mathcal D.$ Consequently, for any multi-index $\alpha$, $\partial^\alpha u$ is a distribution defined by $\langle \partial^\alpha u, \varphi\rangle = (-1)^{|\alpha|}\langle u, \partial^\alpha \varphi\rangle \quad \text{for all}\quad\varphi\in\mathcal D.$

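The (first) partial derivatives of the $\delta$-function $\delta\in\mathcal D'(\mathbb R^n)$ are given by $\langle \partial_i\delta, \varphi\rangle = -\frac{\partial\varphi}{\partial x_i}(0)$, and similarly for higher derivatives: $\langle \partial^\alpha\delta, \varphi\rangle = (-1)^{|\alpha|}\partial^\alpha\varphi(0)$.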
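The definition translates directly into code. In this sketch (assuming SciPy), `derivative` returns the functional $\varphi\mapsto-\langle u,\varphi'\rangle$. Here $\varphi'$ is approximated by a central difference, which is an assumption of the sketch and not part of the definition. The sketch checks $\langle\delta',\varphi\rangle=-\varphi'(0)$ and that differentiating $T_{\sin}$ in the distributional sense gives $T_{\cos}$.

```python
import math
from scipy.integrate import quad

def derivative(u, h=1e-4):
    """Distributional derivative: <u', phi> = -<u, phi'>.

    phi' is approximated by a central difference with step h (an assumption of this sketch).
    """
    def u_prime(phi):
        dphi = lambda x: (phi(x + h) - phi(x - h)) / (2 * h)
        return -u(dphi)
    return u_prime

def T(f):
    """T_f: phi -> int f(x) phi(x) dx, truncated to [-10, 10] where the test function below is negligible."""
    return lambda phi: quad(lambda x: f(x) * phi(x), -10.0, 10.0)[0]

delta = lambda phi: phi(0.0)
phi = lambda x: math.exp(-(x - 1.0) ** 2)   # stand-in test function; phi'(0) = 2/e

print(derivative(delta)(phi), -2 / math.e)             # <delta', phi> = -phi'(0)
print(derivative(T(math.sin))(phi), T(math.cos)(phi))  # (T_sin)' = T_cos
```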

For $n=1$, we can verify the identity $H' = \delta$ in the sense of distributions: $\langle H', \varphi\rangle = -\langle H, \varphi'\rangle = -\int_0^\infty \varphi'(x)\,dx = - \varphi(x)\bigr|_0^\infty = \varphi(0) = \langle\delta, \varphi\rangle$ for all $\varphi\in\mathcal D(\mathbb R)$.
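The same computation can be carried out numerically. This sketch assumes SciPy, uses a shifted Gaussian to stand in for $\varphi$, and writes out its derivative by hand.

```python
import math
from scipy.integrate import quad

phi = lambda x: math.exp(-(x - 1.0) ** 2)                       # stand-in test function
dphi = lambda x: -2.0 * (x - 1.0) * math.exp(-(x - 1.0) ** 2)   # its derivative phi'

# <H', phi> = -<H, phi'> = -int_0^inf phi'(x) dx
H_prime_phi = -quad(dphi, 0.0, math.inf)[0]
print(H_prime_phi, phi(0.0))  # both equal 1/e

# By contrast, the pointwise derivative of H is 0 at every x != 0, so pairing it with phi
# gives 0: the classical derivative misses the jump at the origin entirely.
```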

In fact, the procedure of "deriving" the definition works for many other operations, so long as the corresponding "adjoint" operation is well defined on the test functions $\varphi\in\mathcal D(\mathbb R^n)$ (e.g. translations and convolution; the Fourier transform fits the same pattern, but requires replacing $\mathcal D$ by a larger space of test functions, as we shall see). More precisely, if $Lf$ is well-defined for (some class of) locally integrable functions $f$, and $\langle Lf, \varphi\rangle = \langle f, L^* \varphi\rangle \quad\text{for all}\quad \varphi\in\mathcal D$ for some operation $L^*$ that maps $\mathcal D$ to itself, we can define $Lu$ for an arbitrary distribution $u\in\mathcal D'(\mathbb R^n)$ by the same expression: $\langle Lu, \varphi\rangle = \langle u, L^* \varphi\rangle \quad\text{for all}\quad \varphi\in\mathcal D.$ For the case of derivatives, $L=\partial_i$ and the "adjoint" is $L^*=-\partial_i$. As exercises, come up with your own definition for translating a distribution, and more generally a "change of variables" on $\mathbb R^n$.
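Multiplication by a smooth function $g$ is another operation that fits this pattern, with $L^* = g$. This is our choice of illustration; translation is left as the exercise. The sketch below uses only the standard library and checks $x\,\delta = 0$ and $x\,\delta' = -\delta$. The latter holds because $\langle x\delta',\varphi\rangle = -(x\varphi)'(0) = -\varphi(0)$. `multiply` and `derivative` are hypothetical helper names, and $\varphi'$ is approximated by a central difference.

```python
import math

def multiply(g, u):
    """Multiply a distribution u by a smooth function g via the adjoint: <g u, phi> = <u, g phi>."""
    return lambda phi: u(lambda x: g(x) * phi(x))

def derivative(u, h=1e-5):
    """<u', phi> = -<u, phi'>, with phi' approximated by a central difference (sketch)."""
    return lambda phi: -u(lambda x: (phi(x + h) - phi(x - h)) / (2 * h))

delta = lambda phi: phi(0.0)
x_fn = lambda x: x                          # the smooth function g(x) = x
phi = lambda x: math.exp(-(x - 1.0) ** 2)   # stand-in test function, phi(0) = 1/e

print(multiply(x_fn, delta)(phi))              # x * delta = 0
print(multiply(x_fn, derivative(delta))(phi))  # x * delta' = -delta, so about -1/e
```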

Having at first rid our distributions of any semblance of functions, we carefully recover all the properties that functions enjoy with the sole exception of evaluation at a point. At the risk of serious notational confusion (for a first encounter), we may at times write distributions $u\in\mathcal D'$ as if they were functions, taking on a variable as $u(x),$ but remember not to substitute $x$ by a particular number $($or rather a tuple of numbers $x_1,\ldots, x_n).$ In particular, we now have a third way of writing the exact same thing: $u(\varphi) = \langle u, \varphi\rangle = \int_{\mathbb R^n} u(x)\varphi(x)\, dx. \qquad\text{(formally)}$ The word "formally" is to remind you that this is *not* an integral in Riemann's or Lebesgue's sense, but a shorthand (or rather, a longhand) for the evaluation of $u$ at $\varphi$. The advantage of writing $u$ as $u(x)$ is that we may express many naturally occurring distributions more easily, without constantly referring to $\varphi\in\mathcal D$. See examples below.

Now with the definition of distributional derivatives, we come to the first major achievement of the theory, the notion of fundamental solution of a linear differential equation, or rather a linear differential operator, on $\mathbb R^n$: $P(x,\partial) = \sum_{\alpha} a_{\alpha}(x)\partial^\alpha$ $\big($the sum is finite, and the coefficients $a_\alpha(x)$ are $C^\infty$ functions on $\mathbb R^n$, so it acts on $\mathcal D'(\mathbb R^n)\big).$ One very special class is when all the coefficients are constants, for which we can define the fundamental solution more easily.

A **fundamental solution** (or Green's function) of a linear differential operator $P(\partial)$ with constant coefficients is a distribution $E\in\mathcal D'(\mathbb R^n)$ such that $P(\partial) E = \delta.$

**Theorem** (Malgrange-Ehrenpreis, 1955). Every non-zero linear differential operator $P(\partial)$ with constant coefficients has a fundamental solution.

Remark: The fundamental solution is not unique, since one can always add a "homogeneous solution" (for the Laplacian, e.g., any constant) to $E$. For certain types of $P(\partial)$, one can impose extra conditions that will ensure a unique fundamental solution that, in some sense, is the best one. (The original proofs were nonconstructive; now there are rather explicit formulas, using e.g. Fourier transform.) The fundamental solution is "fundamental" because it enables one to solve a variety of initial or boundary value problems with any "source term". This is formulated precisely as Duhamel's principle, but becomes much clearer in the language of distributions, after developing the notion of convolution. See Green's functions in physics for some applications.
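For a concrete one-dimensional case, $E(x)=\frac12|x|$ is a fundamental solution of $P(\partial)=\frac{d^2}{dx^2}$ (it is also the $n=1$ entry for the Laplacian in the table of fundamental solutions). Adding any homogeneous solution $a+bx$ gives another one. This sketch assumes SciPy and checks $\langle E,\varphi''\rangle=\varphi(0)$ for both, with a shifted Gaussian standing in for $\varphi$. The values $a=3$, $b=-2$ are arbitrary.

```python
import math
from scipy.integrate import quad

def pair(f, phi):
    """<T_f, phi> = int f(x) phi(x) dx over [-10, 10], where phi is assumed negligible outside."""
    return quad(lambda x: f(x) * phi(x), -10.0, 10.0, points=[0.0])[0]

phi = lambda x: math.exp(-(x - 1.0) ** 2)                                    # stand-in test function
d2phi = lambda x: (4.0 * (x - 1.0) ** 2 - 2.0) * math.exp(-(x - 1.0) ** 2)  # its second derivative

E = lambda x: 0.5 * abs(x)                          # fundamental solution of d^2/dx^2
E_shifted = lambda x: 0.5 * abs(x) + 3.0 - 2.0 * x  # plus a homogeneous solution a + b x

# <E'', phi> = (-1)^2 <E, phi''> should equal <delta, phi> = phi(0) in both cases
print(pair(E, d2phi), pair(E_shifted, d2phi), phi(0.0))
```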

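The (Newtonian) potential $\frac{1}{|x|}$ defines a distribution on $\mathbb R^3$, and it is the unique fundamental solution for the Laplacian $($up to a factor of $-4\pi)$ that goes to $0$ as $|x|\to \infty$. To see that, note that $\frac{1}{|x|}$ is harmonic away from the origin. Apply Green's second identity on the region $|x|\geq\varepsilon$, whose boundary is the sphere $|x|=\varepsilon$ of area $4\pi\varepsilon^2$, with outward normal pointing toward the origin: $\begin{aligned} \Big\langle \Delta\frac{1}{|x|}, \varphi\Big\rangle &= \Big\langle \frac{1}{|x|}, \Delta\varphi\Big\rangle \\ &= \int_{\mathbb R^3} \frac{1}{|x|} \Delta\varphi(x)\,dx = \lim_{\varepsilon\to 0}\int_{|x|\geq\varepsilon} \frac{1}{|x|} \Delta\varphi(x)\,dx \\ &= \lim_{\varepsilon\to 0}\int_{|x|=\varepsilon} \Big( -\frac{1}{\varepsilon}\frac{\partial\varphi}{\partial r} - \frac{1}{\varepsilon^2}\varphi \Big)\,d\sigma \\ &= -4\pi\varphi(0) = \langle -4\pi\delta, \varphi\rangle. \end{aligned}$ The first term of the surface integral is $O(\varepsilon)$, and the second is $-4\pi$ times the average of $\varphi$ over the sphere. That is, $\Delta\frac{1}{|x|} = -4\pi\delta$.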
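The identity can also be checked numerically for a radial test function. This sketch assumes SciPy and uses the Gaussian $e^{-|x|^2}$ as a stand-in for $\varphi$. For a radial $\varphi$ the pairing reduces to a one-dimensional integral in $r=|x|$, and here the exact value is $-4\pi$.

```python
import math
from scipy.integrate import quad

# Radial stand-in test function on R^3: phi(x) = exp(-r^2) with r = |x|.
phi = lambda r: math.exp(-r**2)
# For radial functions, Delta phi = phi'' + (2/r) phi' = (4 r^2 - 6) exp(-r^2).
lap_phi = lambda r: (4 * r**2 - 6) * math.exp(-r**2)

# <1/|x|, Delta phi> in spherical coordinates: int_0^inf (1/r) Delta phi(r) 4 pi r^2 dr
pairing = quad(lambda r: 4 * math.pi * r * lap_phi(r), 0.0, math.inf)[0]
print(pairing, -4 * math.pi * phi(0.0))  # both equal -4*pi
```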

We close with a list of the ("best") fundamental solutions for the three classical equations, each giving rise to a vast subject (the so-called elliptic, parabolic, and hyperbolic PDEs, respectively). Here $n$ is the dimension of the whole space, so for the heat and wave equations there are $n-1$ space variables $x$ and one time variable $t$. These formulas serve as a reminder of how nontrivial fundamental solutions, and distributions in general, can be despite everything above, and as the background for the Malgrange-Ehrenpreis theorem, which may be regarded as a fundamental theorem of analysis.

| $P(\partial)$ (equation) | $n=1$ | $n=2$ | $n=3$ | $n=4$ |
|---|---|---|---|---|
| $\Delta$ (Laplace/Poisson) | $\dfrac{1}{2}\lvert x\rvert$ | $\dfrac{1}{2\pi}\log\lvert x\rvert$ | $-\dfrac{1}{4\pi \lvert x\rvert}$ | $-\dfrac{1}{4\pi^2 \lvert x\rvert^2}$ |
| $\partial_t -\Delta$ (heat eq.) | n/a | $\dfrac{H(t)}{\sqrt{4\pi t}}\, e^{-x^2/(4t)}$ | $\dfrac{H(t)}{4\pi t}\, e^{-\lvert x\rvert^2/(4t)}$ | $\dfrac{H(t)}{(4\pi t)^{3/2}}\, e^{-\lvert x\rvert^2/(4t)}$ |
| $\partial_t^2 - \Delta$ (wave eq.) | n/a | $\dfrac{1}{2} H(t-x)H(t+x)$ | $\dfrac{H(t-\lvert x\rvert)}{2\pi\sqrt{t^2-\lvert x\rvert^2}}$ | $\dfrac{\delta(t-\lvert x\rvert)}{4\pi \lvert x\rvert}$ |
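Here is a spot check of one entry. The sketch assumes SciPy, and the evaluation point and step sizes are arbitrary choices. The heat kernel for $n=2$ satisfies $\partial_t E = \partial_x^2 E$ for $t>0$ and has unit mass for each $t>0$, concentrating at $x=0$ as $t\to0^+$.

```python
import math
from scipy.integrate import quad

def heat_kernel(t, x):
    """Table entry for n = 2 (one time and one space variable): H(t) exp(-x^2/(4t)) / sqrt(4 pi t)."""
    if t <= 0:
        return 0.0
    return math.exp(-x**2 / (4 * t)) / math.sqrt(4 * math.pi * t)

# Away from t = 0 the kernel solves the homogeneous heat equation E_t = E_xx.
t, x, h = 0.7, 0.3, 1e-4
dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2 * h)
dxx = (heat_kernel(t, x + h) - 2 * heat_kernel(t, x) + heat_kernel(t, x - h)) / h**2
print(dt, dxx)

# For each t > 0 it has unit mass, concentrating at x = 0 as t -> 0+.
mass = quad(lambda y: heat_kernel(0.01, y), -5.0, 5.0, points=[0.0])[0]
print(mass)
```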

## Fourier Transform and Tempered Distributions

By applying the formal properties of the $\delta$-function, one could begin to explore its Fourier transform: $\mathscr F [\delta](\xi) = \int_{\mathbb R^n} e^{-i\langle x, \xi\rangle} \delta(x)\,dx = e^0 = 1.$ That is, the Fourier transform of $\delta(x)$, when properly defined, should be the constant function $1$. [Various conventions are in common use for the definition of Fourier transform, and for purposes of differential equations, the one given above, with no appearance of $2\pi$, is the most convenient.]

One of the main properties of Fourier transform that we want to preserve is the straightforward calculation: $\begin{aligned} \mathscr F [\partial_j f](\xi) &= \int_{\mathbb R^n} e^{-i\langle x,\xi\rangle} \partial_j f(x)\,dx \\ &= -\int_{\mathbb R^n} \partial_j e^{-i\langle x,\xi\rangle} f(x)\,dx \\ &= -\int_{\mathbb R^n} (-i\xi_j) e^{-i\langle x,\xi\rangle} f(x)\,dx = i\xi_j \mathscr F[f](\xi) \end{aligned}$ (what kind of $f$ are we assuming here?), or more succinctly $\mathscr F \partial_j = i\xi_j \mathscr F.$ This is what makes Fourier transform so useful for differential equations: it converts differentiation into multiplication. For example, to find the fundamental solution, we may simply take Fourier transform of both sides of the defining equation $P(\partial) u = \delta,$ which, by the property above, becomes $P(i\xi) \mathscr F[u](\xi) = 1.$ One can "easily" solve it: $\mathscr F [u] (\xi) = \frac{1}{P(i\xi)}$ and, taking the inverse Fourier transform, we have an explicit fundamental solution for $P(\partial)$, namely $u = \mathscr F^{-1} \left[ \frac{1}{P(i\xi)} \right].$ There are two immediate difficulties and one puzzle:

- Is $\frac{1}{P(i\xi)}$ a distribution on $\mathbb R^n?$ If the polynomial $\xi\mapsto P(i\xi)$ never vanishes for $\xi\in\mathbb R^n$, sure enough. But what about other $P?$
- How is the inverse Fourier transform defined, or computed?
- We know that the fundamental solution is not unique. How did we miss all the other solutions?
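When $P(i\xi)$ has no real zeros, the recipe works as stated. For example, $P(\partial)=1-\frac{d^2}{dx^2}$ on $\mathbb R$ gives $P(i\xi)=1+\xi^2$, and $\mathscr F^{-1}\big[\frac{1}{1+\xi^2}\big] = \frac12 e^{-|x|}$. The sketch below assumes SciPy, and the example is our own choice. It evaluates the inverse transform numerically. The imaginary part of $e^{ix\xi}/(1+\xi^2)$ is odd in $\xi$, so the integral reduces to a cosine integral over $[0,\infty)$.

```python
import math
from scipy.integrate import quad

def inverse_fourier_of_symbol(x):
    """(1/2pi) int e^{i x xi} / (1 + xi^2) d xi for P(i xi) = 1 + xi^2.

    The odd imaginary part cancels, leaving (1/pi) int_0^inf cos(x xi) / (1 + xi^2) d xi.
    """
    g = lambda xi: 1.0 / (1.0 + xi**2)
    if x == 0:
        value = quad(g, 0.0, math.inf)[0]
    else:
        # weight="cos" with an infinite upper limit uses QUADPACK's Fourier-integral routine
        value = quad(g, 0.0, math.inf, weight="cos", wvar=abs(x))[0]
    return value / math.pi

for x in [0.0, 0.5, 2.0]:
    print(x, inverse_fourier_of_symbol(x), 0.5 * math.exp(-abs(x)))
```

One can confirm by hand that $u=\frac12e^{-|x|}$ satisfies $u-u''=\delta$. Away from $0$ it satisfies $u=u''$, and $u'$ jumps by $-1$ at the origin.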

All these have satisfactory answers in Schwartz's theory. The heart of the matter is that the Fourier transform *cannot* be defined for all distributions $u\in\mathcal D'(\mathbb R^n)$, but only for a subspace of **tempered distributions** that do not grow too fast at infinity (hence the name "tempered"). Here "not too fast" means (at most) polynomial growth, also called "moderate" growth. Thus, all polynomials are (or can be regarded as) tempered distributions, but functions such as $e^x$ are not. This subspace is denoted $\mathcal S'(\mathbb R^n)$, suggesting that it is formally defined as the (continuous) dual of a certain space $\mathcal S(\mathbb R^n)$, the space of so-called Schwartz functions. These are infinitely differentiable functions on $\mathbb R^n$ that, along with all their derivatives, decrease faster than any power of $1/|x|$ at infinity $\big($the prototypical example is the "Gaussian" $e^{-x^2/2}\big).$ We shall not go into further detail (especially the issue of topology), but turn straight to the definition. $\big[$One may "derive" the definition by finding the adjoint of $\mathscr F.\big]$

The **Fourier transform** of a tempered distribution $u\in\mathcal S'(\mathbb R^n)$ is a tempered distribution, denoted $\mathscr F[u]$ or $\hat u$, defined by $\big\langle \mathscr F[u], \varphi \big\rangle = \big\langle u, \mathscr F[\varphi] \big\rangle \quad\text{for all} \quad\varphi\in\mathcal S,$ where $\mathscr F[\varphi](\xi) := \int_{\mathbb R^n} e^{-i\langle x,\xi\rangle} \varphi(x)\, dx.$

$($Implicit in this definition is that $\mathscr F[\varphi]\in\mathcal S$ for all $\varphi\in\mathcal S.)$

The Fourier transform $\mathscr F: \mathcal S'(\mathbb R^n) \to \mathcal S'(\mathbb R^n)$ is an isomorphism, and the inverse is given by $u(x) = \frac{1}{(2\pi)^n}\int_{\mathbb R^n} e^{i\langle x,\xi\rangle} \hat u(\xi)\, d\xi,$ interpreted in the sense of distributions.

Exercise: spell out the interpretation.

Ignoring the topology, one can readily check the following. (These calculations demonstrate particularly well how easy it is to work with distributions, and where more careful justification may be needed.)

1. It agrees with the classical definition when $u\in L^1(\mathbb R^n)$; more pedantically, for $u=T_f$ with $f\in L^1$, $\mathscr F[u] = T_{\mathscr F[f]}$.
2. $\mathscr F[\delta] = 1$.
3. The property $\mathscr F \partial_j = i\xi_j \mathscr F$ can now be established for all $u\in\mathcal S'(\mathbb R^n)$. A similar calculation shows that $\mathscr F x_j = i\partial_j\mathscr F$.

For 2, $\big\langle\mathscr F[\delta], \varphi \big\rangle = \big\langle \delta, \mathscr F[\varphi]\big\rangle = \mathscr F[\varphi](0) = \int_{\mathbb R^n} \varphi(x)\,dx = \langle 1, \varphi\rangle.$

For 3, $\big\langle \mathscr F[\partial_j u], \varphi \big\rangle = \big\langle \partial_j u, \mathscr F[\varphi] \big\rangle = -\big\langle u, \partial_j \mathscr F[\varphi] \big\rangle = -\big\langle u, \mathscr F[-i x_j\varphi] \big\rangle = -\big\langle \mathscr F[u], -i x_j \varphi \big\rangle = \big\langle i\xi_j \mathscr F[u], \varphi \big\rangle.$

Now for 1, we want to show $\mathscr F[T_f] = T_{\mathscr F[f]}$. For any $u$ one may write $\mathscr F[u](\xi) = \int_{\mathbb R^n} e^{-i\langle x,\xi\rangle} u(x)\,dx$, with the understanding that it is to be interpreted in the sense of distributions, i.e. "paired with $\varphi\in\mathcal S$". One then manipulates the symbols *formally* until arriving at a sensible expression, which is taken to be the definition: $\begin{aligned} \big\langle \mathscr F[u], \varphi \big\rangle &= \int_{\xi\in\mathbb R^n} \mathscr F[u](\xi)\varphi(\xi)\,d\xi \\ &= \int_{\xi\in \mathbb R^n} \left( \int_{x\in\mathbb R^n} e^{-i\langle x,\xi\rangle} u(x)\,dx \right) \varphi(\xi)\,d\xi \\ &= \int_{x\in \mathbb R^n} u(x) \left( \int_{\xi\in\mathbb R^n} e^{-i\langle x, \xi\rangle} \varphi(\xi)\,d\xi \right) \, dx \\ &= \int_{x\in\mathbb R^n} u(x) \mathscr F[\varphi](x) \,dx = \big\langle u, \mathscr F[\varphi] \big\rangle. \end{aligned}$ When $u = T_f$ with $f\in L^1$, every step is a genuine Lebesgue integral. The interchange of the order of integration is justified by Fubini's theorem, since $f\in L^1$ and $\varphi\in\mathcal S\subset L^1$. This proves 1.
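Here is a numerical illustration of 1, as a sketch assuming SciPy. Take $f(x)=e^{-|x|}\in L^1$, whose Fourier transform is $\frac{2}{1+\xi^2}$, and the Schwartz function $\varphi(x)=e^{-x^2/2}$, whose Fourier transform is $\sqrt{2\pi}\,e^{-\xi^2/2}$. The two sides of $\langle T_{\mathscr F[f]},\varphi\rangle = \langle T_f,\mathscr F[\varphi]\rangle$ should agree.

```python
import math
from scipy.integrate import quad

def integrate(g):
    """Integral over R, split at 0 where e^{-|x|} has a kink."""
    return quad(g, -math.inf, 0.0)[0] + quad(g, 0.0, math.inf)[0]

f = lambda x: math.exp(-abs(x))                                   # in L^1, not smooth at 0
F_f = lambda xi: 2.0 / (1.0 + xi**2)                              # F[f](xi) = int e^{-i x xi} f(x) dx
phi = lambda x: math.exp(-x**2 / 2)                               # a Schwartz function
F_phi = lambda xi: math.sqrt(2 * math.pi) * math.exp(-xi**2 / 2)  # its Fourier transform

lhs = integrate(lambda xi: F_f(xi) * phi(xi))   # <T_{F[f]}, phi>
rhs = integrate(lambda x: f(x) * F_phi(x))      # <T_f, F[phi]>
print(lhs, rhs)                                 # the two pairings agree
```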

From $\mathscr F[\delta] = 1$ (in one dimension), we can deduce the physicist's favorite formula: $\delta(x) = \frac{1}{2\pi}\int_{-\infty}^\infty e^{ix\xi}\,d\xi.$ Combined with the rule $\mathscr F x_j = i\partial_j\mathscr F$, we conclude that the Fourier transform of any polynomial $P(x)$, regarded as a tempered distribution, is a linear combination of derivatives of $\delta(\xi)$ $($i.e., a distribution supported at the origin$):$ $\mathscr F[P](\xi) = (2\pi)^n P(i\partial)\delta(\xi).$
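The physicist's formula can be watched converging. Truncating the $\xi$-integral to $[-R,R]$ gives $\int_{-R}^R e^{ix\xi}\,d\xi = \frac{2\sin(Rx)}{x}$. Pairing this with $\varphi(x)=e^{-x^2}$ gives exactly $2\pi\operatorname{erf}(R/2)$, which tends to $2\pi\varphi(0)=2\pi$ as $R\to\infty$. The following is a sketch assuming NumPy and SciPy, with hypothetical helper names.

```python
import math
import numpy as np
from scipy.integrate import quad

def truncated_kernel(x, R):
    """int_{-R}^{R} e^{i x xi} d xi = 2 sin(R x)/x; np.sinc(t) = sin(pi t)/(pi t) avoids 0/0 at x = 0."""
    return 2 * R * np.sinc(R * x / np.pi)

def paired_truncation(R):
    """Pair the truncated kernel with phi(x) = exp(-x^2); the tails beyond |x| = 10 are negligible."""
    return quad(lambda x: truncated_kernel(x, R) * math.exp(-x**2), -10.0, 10.0, limit=200)[0]

for R in [1.0, 5.0, 20.0]:
    print(R, paired_truncation(R), 2 * math.pi * math.erf(R / 2))  # tends to 2*pi*phi(0) = 2*pi
```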

Now, let's address the three issues for solving the fundamental solution by Fourier transform, in the specific case of the Laplacian $P(\partial) = \Delta$ on $\mathbb R^n$, $n\geq 3$.

- $\xi\mapsto \frac{1}{P(i\xi)} = \frac{1}{-|\xi|^2}$ is indeed locally integrable $($for $n\geq 3),$ thus it is a distribution on $\mathbb R^n$.
- Yes, it is tempered, so its inverse Fourier transform is defined. In fact, $u = \mathscr F^{-1} \left[-|\xi|^{-2}\right] = c_n |x|^{-n+2}$ $($for some constant $c_n).$
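- Division in the space of distributions is not uniquely defined. In this case, we can add to the distribution $-|\xi|^{-2}$ any solution of the "homogeneous equation" $-|\xi|^2 \mathscr F[u](\xi) = 0$. These are exactly the distributions $Q(i\partial)\delta$ with $Q$ a harmonic polynomial (for instance $\delta$ itself, the first partial derivatives $\partial_j\delta$, or $\partial_1\partial_2\delta$), and adding one amounts to adding the harmonic polynomial $(2\pi)^{-n}Q(x)$ to $u$.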
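For $n=3$ the constant $c_n$ can be computed directly. The following calculation is formal: the last integral converges only conditionally, and a rigorous version would insert a convergence factor such as $e^{-\epsilon|\xi|}$ and let $\epsilon\to 0$. It uses spherical coordinates $(\rho,\theta)$ in $\xi$-space with the polar axis along $x$ and $r=|x|$. The factor $\rho^2$ from the volume element cancels against $|\xi|^{-2}$:

$$
\begin{aligned}
u(x) &= \frac{1}{(2\pi)^3}\int_{\mathbb R^3} e^{i\langle x,\xi\rangle}\Big(-\frac{1}{|\xi|^2}\Big)\,d\xi
= -\frac{1}{(2\pi)^3}\int_0^\infty\!\!\int_0^\pi e^{i\rho r\cos\theta}\,2\pi\sin\theta\,d\theta\,d\rho \\
&= -\frac{1}{(2\pi)^3}\int_0^\infty \frac{4\pi\sin(\rho r)}{\rho r}\,d\rho
= -\frac{1}{2\pi^2 r}\int_0^\infty \frac{\sin s}{s}\,ds
= -\frac{1}{2\pi^2 r}\cdot\frac{\pi}{2} = -\frac{1}{4\pi r}.
\end{aligned}
$$

So $c_3=-\frac{1}{4\pi}$, in agreement with $\Delta\frac{1}{|\vec x|} = -4\pi\delta(\vec x)$.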

In some sense, tempered distributions are the right setting for the Fourier transform. Here is a list of common (elementary) functions on the real line $\mathbb R$ (regarded as tempered distributions) and their Fourier transforms $($with the factor of $2\pi$ tucked away$)$. It may be more convenient to read the table through the inversion formula $u(x) = \int_{-\infty}^\infty e^{ix\xi} \frac{\hat u(\xi) }{2\pi} \,d\xi.$

| $u(x)\in\mathcal S'(\mathbb R)$ | $\frac{1}{2\pi}\hat u(\xi) \in\mathcal S'(\mathbb R)$ |
|---|---|
| $1$ | $\delta(\xi)$ |
| $e^{ix}$ | $\delta(\xi-1)$ |
| $\cos x$ | $\frac{1}{2} \bigl(\delta(\xi-1) + \delta(\xi+1)\bigr)$ |
| $\sin x$ | $\frac{1}{2i} \bigl(\delta(\xi-1) - \delta(\xi+1)\bigr)$ |
| $x^n$ | $i^n \delta^{(n)}(\xi)$ |
| $e^{-ax^2/2}$ $(a>0)$ | $\frac{1}{\sqrt{2\pi a}}\, e^{-\xi^2/(2a)}$ |
| $x^n e^{ix}$ | $i^n \delta^{(n)}(\xi-1)$ |
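Any single row can be spot-checked numerically. This is a sketch assuming SciPy, for the Gaussian row with $a=3$ (an arbitrary choice). Since $u$ is even and real, $\frac{1}{2\pi}\hat u(\xi)$ reduces to $\frac1\pi\int_0^\infty \cos(x\xi)\,u(x)\,dx$.

```python
import math
from scipy.integrate import quad

a = 3.0
u = lambda x: math.exp(-a * x**2 / 2)

def scaled_transform(xi):
    """(1/2pi) int e^{-i x xi} u(x) dx, reduced to (1/pi) int_0^inf cos(x xi) u(x) dx since u is even and real."""
    return quad(lambda x: math.cos(x * xi) * u(x), 0.0, math.inf)[0] / math.pi

for xi in [0.0, 1.0, 2.5]:
    table_entry = math.exp(-xi**2 / (2 * a)) / math.sqrt(2 * math.pi * a)
    print(xi, scaled_transform(xi), table_entry)
```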

*Remarks:*

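- A function is $2\pi$-periodic if and only if its Fourier transform is a sum of $\delta$-functions at the integers: if $u(x) = \sum_k c_k e^{ikx}$, then $\hat u(\xi) = 2\pi\sum_k c_k\,\delta(\xi-k)$. (For period $L$, replace the integers by the lattice $\frac{2\pi}{L}\mathbb Z$.) This is the case of Fourier series.
- A function is real if and only if its Fourier transform is symmetric under $\xi\mapsto -\xi$ followed by complex conjugation, i.e. $\hat u(-\xi) = \overline{\hat u(\xi)}$.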
- Roughly speaking, a function is more "spread-out" if its Fourier transform is more "localized", and vice versa. This is related to the Heisenberg uncertainty principle in quantum mechanics.
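The second remark has an exact discrete analogue that is easy to test: the discrete Fourier transform $X$ of real samples satisfies $X_{-k}=\overline{X_k}$, with indices taken mod $N$. Below is a sketch assuming NumPy; the sampled function is an arbitrary choice.

```python
import numpy as np

# Discrete analogue: sample a real function on a periodic grid and check X[-k] = conj(X[k]).
N = 256
x = np.linspace(-np.pi, np.pi, N, endpoint=False)
u = np.exp(np.cos(x)) + 0.3 * np.sin(3 * x)   # real-valued samples
X = np.fft.fft(u)
mirrored = np.roll(X[::-1], 1)                 # entry k holds X[-k mod N]
print(np.max(np.abs(mirrored - np.conj(X))))   # roundoff-level difference
```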

Tempered distributions are especially well suited for linear differential operators $P(x,\partial)$ with *polynomial* coefficients $($whereby $x$ and $\partial$ are treated on an equal footing$),$ since $\mathcal S(\mathbb R^n)$ and $\mathcal S'(\mathbb R^n)$ are closed under multiplication by polynomials, but not by general smooth functions. As one might expect, many questions then take on an algebraic flavor. For illustration, the "solution space" of each of the following operators (in one dimension) can be described explicitly, and each is a two-dimensional vector space:
$\begin{aligned} \partial^2 u =0 \quad &\Longleftrightarrow\quad u = a+bx \\
x^2 u = 0 \quad &\Longleftrightarrow\quad u = a\delta + b \delta' \\
(x\partial-s) u = 0 \quad &\Longleftrightarrow \quad u = \begin{cases} ax_+^s + b x_-^s & \operatorname{Re} s>-1 \\ a\underline x^{-1} + b \delta & s=-1 \\ a\underline x^{-2} + b \delta' & s=-2 \\ \hspace{13mm} \vdots \end{cases} \end{aligned}$
The last one is a description of *homogeneous distributions of degree $s$* on $\mathbb R$, where $\underline x^{-k}\in\mathcal S'(\mathbb R)$, $k=1, 2,\ldots$, are defined by
$\varphi \mapsto \int_{-\infty}^\infty \frac{\varphi(x) - \sum_{j=0}^{k-1} \varphi^{(j)}(0)x^j/j!}{x^k}\,dx$ (with $\int_{-\infty}^\infty$ understood as $\lim_{R\to\infty}\int_{-R}^R$),
which agrees with the function $x^{-k}$ on $\mathbb R-\{0\}$. This is Hadamard's *partie finie*, and for $k=1$ is known as the Cauchy principal value. $($One can give a complete description for all $s\in\mathbb C.)$ It is tempting to note that the dimension of (distributional) solution space is simply the "degree" of the operator, where $x$ and $\partial$ both have degree 1. This is yet another way that distributions "complete" functions, in much the same way that complex numbers "complete" real numbers (when solving algebraic equations), or that the projective geometry completes Euclidean geometry (e.g. point-line duality in the plane).
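For $k=1$ the formula can be folded onto the half-line as $\int_0^\infty \frac{\varphi(x)-\varphi(-x)}{x}\,dx$, which is convenient numerically. The sketch below assumes SciPy and evaluates this for $\varphi(x)=e^{-(x-1)^2}$. Here the exact value is $2\sqrt\pi\,D(1)$, with $D$ Dawson's function (`scipy.special.dawsn`). The sketch also checks $x\cdot\underline x^{-1}=1$, i.e. $\langle \underline x^{-1}, x\varphi\rangle = \int\varphi\,dx = \sqrt\pi$. The helper name `pv_inverse_x` is hypothetical.

```python
import math
from scipy.integrate import quad
from scipy.special import dawsn

def pv_inverse_x(phi):
    """<x^{-1}, phi> as a principal value, folded onto (0, inf): int_0^inf (phi(x) - phi(-x)) / x dx."""
    return quad(lambda x: (phi(x) - phi(-x)) / x, 0.0, math.inf)[0]

phi = lambda x: math.exp(-(x - 1.0) ** 2)   # off-centre Gaussian, so the pairing is nonzero

print(pv_inverse_x(phi), 2 * math.sqrt(math.pi) * dawsn(1.0))   # closed form via Dawson's function
print(pv_inverse_x(lambda x: x * phi(x)), math.sqrt(math.pi))   # x * x^{-1} = 1 gives int phi = sqrt(pi)
```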