Lagrange Multipliers
The method of Lagrange multipliers is a technique in mathematics for finding the local maxima and minima of a function \(f(x_1,x_2,\ldots,x_n)\) subject to constraints \(g_i (x_1,x_2,\ldots,x_n)=0\). Lagrange multipliers are also used very often in economics, for instance to determine the equilibrium point of a system, since such problems amount to maximizing or minimizing a certain outcome. A classic example in microeconomics is the problem of maximizing consumer utility: a consumer wants to know how best to spend their disposable income on the acquisition of goods. Given the goods available in the market and their prices, the consumer wants to determine the best basket.
Method of Solving
First we partially differentiate \(f(x_1,x_2,\ldots,x_n)\) and \(g(x_1,x_2,\ldots,x_n)\) with respect to each variable \(x_1,x_2,\ldots,x_n\) and write the following equations:
\[\begin{align} \dfrac{\partial f(x_1,x_2,\ldots,x_n)}{\partial x_1}&=\lambda\dfrac{\partial g(x_1,x_2,\ldots,x_n)}{\partial x_1} \\ \dfrac{\partial f(x_1,x_2,\ldots,x_n)}{\partial x_2}&=\lambda\dfrac{\partial g(x_1,x_2,\ldots,x_n)}{\partial x_2} \\ &\ \vdots \\ \dfrac{\partial f(x_1,x_2,\ldots,x_n)}{\partial x_n}&=\lambda\dfrac{\partial g(x_1,x_2,\ldots,x_n)}{\partial x_n}. \end{align}\]
Here, \(\lambda\) is called the Lagrange multiplier. Now we have \(n\) equations but \(n+1\) unknowns \((x_1,\ldots,x_n\) and \(\lambda),\) so we use the given constraint as the \((n+1)^\text{th}\) equation. Solving these equations simultaneously gives particular values of \(x_1,x_2,\ldots,x_n,\) which can be plugged into \(f(x_1,x_2,\ldots,x_n)\) to get the extremum value if it exists.
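To make the procedure concrete, here is a minimal sketch in Python using the sympy library (an assumed dependency; any computer algebra system works). The objective \(f = xy\) and the constraint \(x + y - 10 = 0\) are hypothetical placeholders chosen only to illustrate setting up and solving the \(n+1\) equations:

```python
# A minimal sketch of the method using sympy (assumed available).
# The objective f = x*y and constraint x + y = 10 are illustrative
# placeholders, not taken from the article.
import sympy as sp

x, y = sp.symbols('x y', real=True)
lam = sp.symbols('lambda', real=True)

f = x * y            # function to extremize
g = x + y - 10       # constraint written as g(x, y) = 0

# The n equations df/dx_i = lambda * dg/dx_i, plus the constraint itself.
eqs = [
    sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
    sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
    sp.Eq(g, 0),
]

print(sp.solve(eqs, [x, y, lam], dict=True))
# expected: [{x: 5, y: 5, lambda: 5}]
```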
Find the dimensions of the box with the largest volume if the total surface area is \(64\text{ cm}^2\).
Here, \(f(x,y,z)=xyz\), and since the total surface area is \(2(xy+yz+xz)=64,\) the constraint is \(g(x,y,z) = xy+yz+xz=32.\)
Proceeding as above, we get
\[\begin{align} yz&=\lambda(y+z) &\qquad (1) \\ zx&=\lambda(z+x) &\qquad (2) \\ xy&=\lambda(x+y) &\qquad (3) \\ xy+yz+zx&=32. &\qquad (4) \end{align}\]
On solving, there are two cases (verified numerically in the sketch after this list):
- \(\lambda =0,\) which by equation (1) would force \(y=0\) or \(z=0\); this is impossible for a box.
- Otherwise, subtracting the equations pairwise gives \((x-y)(z-\lambda)=0\) and its analogues, forcing \(x=y=z=\pm\sqrt{\frac{32}{3}}.\) Since \(x,y,z >0,\) we have \(x=y=z=\sqrt{\frac{32}{3}}\approx 3.266 \text{ cm}.\) \(_\square\)
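As a sanity check, the same system can be handed to sympy (assumed available); declaring the symbols positive discards the spurious cases automatically:

```python
# Hedged numeric check of the box example: maximize f = xyz
# subject to xy + yz + zx = 32 with x, y, z > 0 (sympy assumed).
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
lam = sp.symbols('lambda', real=True)

f = x * y * z
g = x*y + y*z + z*x - 32

# The three equations grad(f) = lambda * grad(g), plus the constraint.
eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v)) for v in (x, y, z)]
eqs.append(sp.Eq(g, 0))

for sol in sp.solve(eqs, [x, y, z, lam], dict=True):
    print({k: sp.N(v, 4) for k, v in sol.items()})
# expected: x = y = z ≈ 3.266, lambda ≈ 1.633
```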
Note: We should be a little careful here. Since we’ve only got one solution, we might be tempted to assume that these are the dimensions that will give the largest volume. The method of Lagrange multipliers will give a set of points that will either maximize or minimize a given function subject to the constraint, provided there actually are minimums or maximums.
The function itself, \(f(x,y,z)=xyz\), will clearly have neither minimums nor maximums unless we put some restrictions on the variables. The only real restriction that we’ve got is that all the variables must be positive. This, of course, instantly means that the function does have a minimum, zero.
The function will not have a maximum if all the variables are allowed to increase without bound. That, however, can’t happen because of the constraint \(xy+yz+zx=32.\)
Here we’ve got the sum of three positive numbers \((\)because \(x, y,\) and \(z\) are positive\()\) and the sum must equal 32. So, if one of the variables gets very large, say \(x,\) then because each of the products must be less than 32, both \(y\) and \(z\) must be very small to make sure the first two terms are less than 32. So, there is no way for all the variables to increase without bound, and thus it should make some sense that the function \(f(x,y,z)=xyz\) will have a maximum.
This isn’t a rigorous proof that the function will have a maximum, but it should help to visualize that in fact it should have a maximum, so we can say that we will get a maximum volume if the dimensions (in centimeters) are
\[x=y=z=3.266.\]
Notice that we never actually found values for \(\lambda\) in the above example. This is fairly standard for these kinds of problems. The value of \(\lambda\) isn’t really important to determining if the point is a maximum or a minimum, so often we will not bother with finding a value for it. On occasion, we will need its value to help solve the system, but even in those cases, we won’t use it past finding the point.
Conditioned Relative (Local) Extrema
Of all the rectangles that can be inscribed in an ellipse, find the rectangle with maximum area.
Let's suppose that the equation of the ellipse is \(\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1\).
Since the rectangle we seek has its vertices on the ellipse, it must be symmetric about the coordinate axes, so it suffices to find the coordinates \((x, y)\) of the vertex located in the first quadrant. The area we want to maximize is then the function \(f(x,y) = 4xy.\)
In this particular case, we could solve the constraint for \(y = \frac{b \sqrt{a^2 - x^2}}{a}.\) Substituting into \(f,\) the problem reduces to maximizing the single-variable function \(F(x) = 4bx \cdot \frac{\sqrt{a^2 - x^2}}{a},\) as sketched below. \(_\square\)
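A short sympy sketch (assumed dependency) of this substitution approach confirms the critical point and the resulting maximum area:

```python
# Sketch (sympy assumed): maximize F(x) = 4*b*x*sqrt(a**2 - x**2)/a directly.
import sympy as sp

a, b, x = sp.symbols('a b x', positive=True)

F = 4 * b * x * sp.sqrt(a**2 - x**2) / a

critical = sp.solve(sp.Eq(sp.diff(F, x), 0), x)
print(critical)                             # expected: [sqrt(2)*a/2]
print(sp.simplify(F.subs(x, critical[0])))  # expected: 2*a*b
```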
In general, however, the constraints may not allow us to solve for one variable in terms of the others. Hence the interest in studying the method of Lagrange multipliers.
Problem Statement
Let \(A\) be an open set in \(\mathbb{R}^{p +q}\) and \(a \in A\). Let \(f: A \to \mathbb{R}\) and \(g_1,\ldots,g_p: A \to \mathbb{R}\) with \(g_1,\ldots,g_p \in \mathcal{C}^1 (A)\) (functions with continuous partial derivatives) such that the rank of the Jacobian matrix
\[\left(\dfrac{\partial g_i}{\partial x_j}(a)\right)\ \text{ for }\ i = 1,\ldots,p\ \text{ and }\ j =1,\ldots,p + q\]
is \(p.\) Let \(S = \{x \in A \mid g_i (x) = 0,\ i = 1,\ldots,p\},\) and suppose that \(a \in S\). The function \(f\) is said to have at \(a\) a relative extremum conditioned by the equations \(g_i (x_1,\ldots,x_p,x_{p +1},\ldots, x_{p+q}) = 0,\ i = 1,\ldots,p,\) when there exists a neighborhood \(U\) of \(a\) in \(\mathbb{R}^{p + q}\) such that \(f(x) \leq f(a)\) for all \(x \in S \cap U\) or \(f(x) \ge f(a)\) for all \(x \in S \cap U\). In the first case \(a\) is a conditioned relative maximum, and in the second case a conditioned relative minimum.
Note that relative means the same as local, and conditioned refers to the constraints \(g_i = 0,\) also called ligature (binding) conditions.
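The rank hypothesis is easy to check in practice. As an illustrative sketch (sympy assumed), here is the Jacobian rank computation for the two constraints that appear in the multiple-constraint example later on:

```python
# Sketch (sympy assumed): checking the rank condition on the Jacobian of the
# constraints g1 = x**2 + y**2 - 1 and g2 = x + y + z - 1 (p = 2, q = 1).
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
g = sp.Matrix([x**2 + y**2 - 1, x + y + z - 1])

J = g.jacobian([x, y, z])
print(J)                                  # Matrix([[2*x, 2*y, 0], [1, 1, 1]])
print(J.subs({x: 1, y: 0, z: 0}).rank())  # 2, so the theorem applies at (1,0,0)
```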
Theorem (Lagrange Multipliers)
Under the conditions of the problem statement, if in addition \(f \in \mathcal{C}^1 (A)\), then for the function \(f\) to have a conditioned relative extremum at the point \(a,\) it is necessary that there exist \(p\) real numbers \(\lambda_1,\ldots, \lambda_p\) such that the function \( L = f + \lambda_1 g_1 +\cdots+ \lambda_p g_p\) satisfies \(dL(a) = 0.\) \((\lambda_1,\ldots, \lambda_p\) are called Lagrange multipliers.\()\)
Suppose furthermore that \(g_i, f \in \mathcal{C}^2 (A),\ i = 1,\ldots,p,\) and that \(dL(a) = 0\) at the point \(a \in A\). Then for \(f\) to have at \(a\) a conditioned relative minimum (respectively maximum) it is sufficient that \(d^2 L(a) (h,h) > 0\) (resp. \(< 0\)) for all \(h \in \mathbb{R}^{p + q}\) with \(h \neq 0\) and \(dg_i (a)(h) = 0,\ i = 1,\ldots, p\).
Before the long and rather involved proof of this theorem, let us look at two examples. We first return to the example above, from the section on conditioned relative extrema.
Examples
Of all the rectangles that can be inscribed in an ellipse, find the rectangle with maximum area.
We want to maximize \(f(x,y) = 4xy\) subject to \(g(x,y) = \frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 = 0\) (see the example in the section on conditioned relative extrema). Let

\[L(x,y) = 4xy + \lambda \left(\frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 \right).\]
For \(dL(x,y) = 0\) it is necessary that

\[\begin{align} \frac{\partial L}{\partial x} (x,y) &= 4y + \frac{2\lambda}{a^2} x = 0 \\ \frac{\partial L}{\partial y} (x,y) &= 4x + \frac{2\lambda}{b^2} y = 0. \end{align}\]
This, coupled with the ligature condition \(\frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 = 0,\) must hold for the solutions \((x,y)\) we seek. The only solution of these equations in the first quadrant is \(\left(x = \frac{a}{\sqrt{2}},\ y = \frac{b}{\sqrt{2}}\right)\) with \(\lambda = -2ab\). To decide whether \(\left(x = \frac{a}{\sqrt{2}}, y = \frac{b}{\sqrt{2}}\right)\) is a maximum or a minimum, we examine the second differential. Consider \(h = (h_1,h_2) \in \mathbb{R}^2:\)
\[\begin{align} d^2 L\left(\frac{a}{\sqrt{2}}, \frac{b}{\sqrt{2}}\right) (h,h) &= \left(\begin{array} {cc} h_1 & h_2 \end{array}\right) \left(\begin{array} {cc} -\frac{4b}{a} & 4 \\ 4 & -\frac{4a}{b} \end{array} \right) \left(\begin{array} {c} h_1 \\ h_2 \end{array} \right) \\ &= -4\left(\frac{b}{a}h_1^{2} - 2h_1 h_2 + \frac{a}{b} h_2^{2} \right) \end{align}\]
and restrict to the vectors \(h = (h_1, h_2) \in \mathbb{R}^2 \) that satisfy
\[dg \left(\frac{a}{\sqrt{2}}, \frac{b}{\sqrt{2}}\right) (h_1, h_2) = \frac{2a}{a^2 \sqrt{2}} h_1 + \frac{2b}{b^2 \sqrt{2}} h_2 = 0,\]
i.e.
\[\frac{h_1}{a} + \frac{h_2}{b} = 0.\]
For these vectors \(h \neq 0,\) with \(h_2 = -\frac{b}{a}h_1,\) we obtain
\[d^2 L\left(\frac{a}{\sqrt{2}}, \frac{b}{\sqrt{2}}\right) (h,h) = -16\frac{b}{a}h_1^{2} < 0.\]
Therefore, the solution obtained is a maximum \(\Big(\)i.e. the maximum area of a rectangle inscribed in the ellipse \(\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1\) is \(2ab\Big).\) In particular, in a circle of radius 1 the rectangle of maximum area is a square, with area 2. \(_\square\)
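The whole computation, stationary point, multiplier, and the sign of the second differential restricted to the kernel of \(dg\), can be reproduced with a short sympy sketch (assumed dependency):

```python
# Sketch (sympy assumed): the stationary point, the multiplier, and the sign
# of the second differential restricted to the kernel of dg.
import sympy as sp

a, b = sp.symbols('a b', positive=True)
x, y = sp.symbols('x y', positive=True)
lam, h1 = sp.symbols('lambda h1', real=True)

L = 4*x*y + lam*(x**2/a**2 + y**2/b**2 - 1)

sol = sp.solve([sp.diff(L, x), sp.diff(L, y), x**2/a**2 + y**2/b**2 - 1],
               [x, y, lam], dict=True)[0]
print(sol)  # expected: {x: sqrt(2)*a/2, y: sqrt(2)*b/2, lambda: -2*a*b}

# Restrict d^2 L to vectors h with h1/a + h2/b = 0, i.e. h2 = -(b/a)*h1.
H = sp.hessian(L, (x, y)).subs(sol)
h = sp.Matrix([h1, -b*h1/a])
print(sp.simplify((h.T * H * h)[0]))  # expected: -16*b*h1**2/a (< 0 => maximum)
```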
Exercise: Find the parallelepiped inscribed in an ellipsoid \(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1\) with maximum volume.
Answer: The maximum volume of the parallelepiped is \(\frac{8}{3\sqrt{3}} abc.\)
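A sketch of the corresponding computation (sympy assumed) that leads to this answer:

```python
# Sketch (sympy assumed) for the ellipsoid exercise: maximize V = 8xyz
# subject to x**2/a**2 + y**2/b**2 + z**2/c**2 = 1 with x, y, z > 0.
import sympy as sp

a, b, c, x, y, z = sp.symbols('a b c x y z', positive=True)
lam = sp.symbols('lambda', real=True)

V = 8*x*y*z
g = x**2/a**2 + y**2/b**2 + z**2/c**2 - 1

eqs = [sp.Eq(sp.diff(V, v), lam*sp.diff(g, v)) for v in (x, y, z)]
eqs.append(sp.Eq(g, 0))

sol = sp.solve(eqs, [x, y, z, lam], dict=True)[0]
print(sol)                       # expected: x = a/sqrt(3), y = b/sqrt(3), z = c/sqrt(3)
print(sp.simplify(V.subs(sol)))  # expected: 8*sqrt(3)*a*b*c/9, i.e. 8abc/(3*sqrt(3))
```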
Example with Multiple Constraints:
Determine the points that are on the cylinder with equation \(x^2 + y^2 = 1\) and the plane with equation \(x + y + z = 1\) whose distance from the origin of coordinates is maximum or minimum.
Working with the squared distance \(f(x,y,z) = x^2 + y^2 + z^2\) (which has the same extrema as the distance itself), let \(L(x,y,z) = x^2 + y^2 + z^2 + \lambda(x^2 + y^2 -1) + \mu(x + y +z -1)\). Then \(dL(x,y,z) = 0\) gives

\[\begin{align} \frac{\partial L}{\partial x} (x,y,z) &= 2x +2x\lambda + \mu = 0\\ \frac{\partial L}{\partial y} (x,y,z) &= 2y + 2y\lambda + \mu = 0\\ \frac{\partial L}{\partial z} (x,y,z) &= 2z + \mu = 0, \end{align}\]

which, along with the ligature conditions

\[\begin{align} x^2 + y^2 - 1 &= 0\\ x + y + z -1 &= 0, \end{align}\]
form a system of equations that the points \((x, y, z)\) we seek must satisfy. If \(1 + \lambda \neq 0\) (equivalently, \(z \neq 0\)), this system has two solutions (see the sketch after this list):
- \(\space \mu_1 = - 2z_1, \quad 1 + \lambda_1 = \frac{z_1}{x_1} < 0, \quad x_1 = \frac{\sqrt{2}}{2}, \quad y_1 = \frac{\sqrt{2}}{2}, \quad z_1 = 1 - \sqrt{2}\)
- \(\space \mu_2 = - 2z_2, \quad 1 + \lambda_2 = \frac{z_2}{x_2} < 0, \quad x_2 = - \frac{\sqrt{2}}{2}, \quad y_2 = -\frac{\sqrt{2}}{2}, \quad z_2 = 1 + \sqrt{2}.\)
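These stationary points, together with those of the \(1 + \lambda = 0\) case treated below, can be recovered by solving the full system symbolically; a minimal sympy sketch (assumed dependency):

```python
# Sketch (sympy assumed): all stationary points of the two-constraint problem.
import sympy as sp

x, y, z, lam, mu = sp.symbols('x y z lambda mu', real=True)

L = x**2 + y**2 + z**2 + lam*(x**2 + y**2 - 1) + mu*(x + y + z - 1)

eqs = [sp.diff(L, v) for v in (x, y, z)]  # first-order conditions
eqs += [x**2 + y**2 - 1, x + y + z - 1]   # ligature conditions

for sol in sp.solve(eqs, [x, y, z, lam, mu], dict=True):
    print(sol)
# expected: (sqrt(2)/2, sqrt(2)/2, 1 - sqrt(2)) and
# (-sqrt(2)/2, -sqrt(2)/2, 1 + sqrt(2)), plus (1, 0, 0) and (0, 1, 0)
# from the case 1 + lambda = 0 treated below.
```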
At this point we could stop: we have analyzed the (necessary) first-order condition that a function must satisfy to have a relative maximum or minimum, and studied these two examples without invoking the sufficient condition. Nevertheless, let us verify the sufficient condition. Let \(g_1(x,y,z) = x^2 + y^2 -1\) and \(g_2(x,y,z) = x + y + z -1\). Then
\[\begin{align} dg_1 (x,y,z) (h,k,l) &= 2xh + 2yk = 0\\ dg_2 (x,y,z) (h,k,l) &= h + k + l = 0. \end{align}\]
Evaluating at \((x_1, y_1,z_1),\) we get
\[\begin{align} \sqrt{2}h + \sqrt{2}k &= 0\\ h + k + l &= 0, \end{align} \]
whose solutions are \( h = -k\) and \( l = 0.\) Evaluating at \((x_2, y_2,z_2)\) gives the same solutions. Let's see what happens at \((x_1,y_1,z_1):\)
\[d^2 L(x_1,y_1,z_1) \big((h,k,l)^{2}\big) = 2(1 + \lambda_1)\big(h^2 + k^2\big) < 0\ \text{ if } h \neq 0,\]
so at \((x_1,y_1,z_1)\) we have a conditioned relative maximum, and
\[d^2 L(x_2,y_2,z_2) \big((h,k,l)^{2}\big) = 2(1 + \lambda_2)\big(h^2 + k^2\big) < 0\ \text{ if } h \neq 0,\]
so at \((x_2,y_2,z_2)\) we have another conditioned relative maximum.
It only remains to examine what happens when \(1 + \lambda = 0\). In this case, the solutions obtained are \((1,0,0)\) and \((0,1,0)\), and they correspond to two conditioned relative minima. \(_\square\)
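A quick numeric comparison of the squared distances at the four stationary points (sympy assumed) agrees with this classification:

```python
# Numeric check (sympy assumed): squared distance from the origin at the
# four stationary points on the cylinder-plane intersection.
import sympy as sp

r2 = sp.sqrt(2)
points = [
    (r2/2, r2/2, 1 - r2),
    (-r2/2, -r2/2, 1 + r2),
    (1, 0, 0),
    (0, 1, 0),
]
for p in points:
    d2 = sum(c**2 for c in p)
    print([sp.N(c, 4) for c in p], '-> squared distance', sp.N(d2, 4))
# expected: 4 - 2*sqrt(2) ≈ 1.172 and 4 + 2*sqrt(2) ≈ 6.828 at the two
# conditioned maxima, and 1 at (1,0,0) and (0,1,0), the conditioned minima.
```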
Proof (Lagrange Multipliers)
Proof of Lagrange Multipliers Theorem:
(Necessary Condition)
Let \( g = (g_1,\ldots,g_p) : A \longrightarrow \mathbb{R}^p;\) then \(g \in \mathcal{C}^1 (A)\) because \(g_1,\ldots, g_p \in \mathcal{C}^1 (A)\). Since the rank of the Jacobian matrix at \(a\) is \(p\), there is a square submatrix of it of order \(p\) whose determinant is nonzero; let's suppose this submatrix is formed by the first \(p\) columns, i.e.
\[\left| \frac{\partial (g_1,...,g_p)}{\partial (x_1,...,x_p)} (a) \right| \neq 0.\]
Under these conditions, applying the implicit function theorem, there exist an open set \(A'' \subset \mathbb{R}^q\) and a function \(\psi: A'' \longrightarrow \mathbb{R}^p\) with \(\psi \in \mathcal{C}^1 (A'')\) such that, writing \(x = (x',x'')\) with \(x' = (x_1,\ldots,x_p)\) and \(x'' = (x_{p + 1},\ldots,x_{p + q}),\) we have for all \(x'' \in A''\)
\[\color{red}{(*)} \space a'' \in A'', \space \psi(a'') = a', \space \psi(A'') \times A'' \subset A, \space g(\psi(x''),x'') = 0.\]
Now let \(\phi: A'' \longrightarrow \mathbb{R}^{p + q}\) be defined by \(\phi(x'') = \big(\psi(x''),x''\big).\) This function is injective (one to one), \(\phi \in \mathcal{C}^1 (A''),\) and \(\phi(A'') \subset S.\) Consider \(F = f \circ \phi \in \mathcal{C}^1 (A'').\) If \(f\) has at \(a\) a conditioned relative extremum, then \(F\) has at \(a''\) an ordinary relative extremum of the same nature. Indeed, by hypothesis there exists a neighborhood \(U\) of \(a\), \(U \subset \mathbb{R}^{p + q}\) \((\)we may suppose that \(U \subset A),\) such that \(f(x) - f(a)\) has constant sign for all \(x \in S \cap U\). Since \(\phi\) is continuous, \(U'' = \phi^{-1}(U)\) is a neighborhood of \(a''\) in \(\mathbb{R}^{q}\) with \(U'' \subset A'',\) and by \(\color{red}{(*)}\)

\[\phi(U'') \subset S \cap U\implies \forall x'' \in U'', \quad F(x'') - F(a'') = f\big(\phi (x'')\big) - f\big(\phi (a'')\big)\]

has the same constant sign as \( f(x) - f(a).\) Hence \(F\) has an ordinary relative extremum at \(a''\) of the same nature as the conditioned relative extremum of \(f\) at \(a,\) so \(dF(a'') = 0\), i.e. \(\frac{\partial F}{\partial x_{p + j}} (a'') = 0\) for \(j = 1,\ldots,q:\)
\[\displaystyle \color{purple}{(*)} \space \frac{\partial F}{\partial x_{p + j}} (a'') = \frac{\partial f}{\partial x_{p + j}} (a) + \sum_{k = 1}^p \frac{\partial f}{\partial x_{k}} (a) \cdot \frac{\partial \psi_{k}}{\partial x_{p + j}} (a '') = 0, \space j = 1,...,q.\]
The last condition in \(\color{red}{(*)}\) breaks down into \(g_i \big(\psi (x''), x''\big) = 0\) for all \(x'' \in A''\) and \(i = 1,\ldots,p.\) Differentiating with respect to \(x_{p + j}\) gives
\[\displaystyle \color{brown}{(*)} \space \frac{\partial g_i}{\partial x_{p + j}} (a) + \sum_{k = 1}^p \frac{\partial g_i}{\partial x_{k}} (a) \cdot \frac{\partial \psi_{k}}{\partial x_{p + j}} (a'') = 0, \space j = 1,...,q; \space i =1,...,p.\]
Since our aim is to determine numbers \(\lambda_1,\ldots,\lambda_p\) such that \(L\) has a stationary point at \(a,\) we write \(\frac{\partial L}{\partial x_r} (a) = 0,\ r =1,\ldots,p + q,\) i.e.
\[\displaystyle \frac{\partial f}{\partial x_r} (a) + \sum_{i = 1}^p \lambda_i \cdot \frac{\partial g_i}{\partial x_r} (a) = 0, \space r = 1,...,p + q.\]
This is a system of \(p + q\) equations in the \(p\) unknowns \(\lambda_i\). From its first \(p\) equations we can solve for the \(\lambda_i,\) since the determinant of that subsystem, \(\left|\frac{\partial (g_1,\ldots,g_p)}{\partial (x_1,\ldots,x_p)} (a)\right|,\) is nonzero. This part of the theorem will be proved once we show that these \(\lambda_i\) also satisfy the remaining \(q\) equations of the system. Indeed, for \(r = p + j,\) taking into account \(\color{purple}{(*)}\) and \(\color{brown}{(*)},\) we have
\[\begin{align} \displaystyle \frac{\partial f}{\partial x_{p + j}} (a) + \sum_{ i = 1}^p \lambda_i \frac{\partial g_i}{\partial x_{p + j}} (a ) &= \displaystyle - \sum_{k = 1}^p \frac{\partial f}{\partial x_k} (a) \cdot \frac{\partial \psi_k}{\partial x_{p + j}} (a'') - \sum_{i = 1}^p \lambda_i \cdot \sum_{k = 1}^p \frac{\partial g_i}{\partial x_k} (a) \cdot \frac{\partial \psi_k}{\partial x_{p +j}} (a'') \\ &= \displaystyle \sum_{k = 1}^p \left(\frac{\partial f}{\partial x_k} (a) + \sum_{i = 1}^p \lambda_i \cdot \frac{\partial g_i}{\partial x_k} (a)\right )\cdot \left( - \frac{\partial \psi_k}{\partial x_{p + j}} (a'')\right) \\ &= 0.\ _\square \end{align}\]
(Sufficient Condition)
Reductio ad absurdum.
If the function \(f\) did not have at \(a\) a conditioned relative minimum, then for each \(n \in \mathbb{N}\) there would exist a point \(a_n \in U \cap S\), \(a_n \neq a,\) such that \(\|a_n - a\| < \frac{1}{n}\) and \(f(a_n) < f(a);\) in particular \(\displaystyle \lim_{n\to \infty} a_n = a\). Take \(b_n = \frac{a_n - a}{\|a_n - a\|}\) and \(\alpha_n = \|a_n - a\|,\) so that \(a_n = a + \alpha_n b_n\) and \(\|b_n\| = 1\). By the compactness of the sphere of center \(0\) and radius \(1\) in \(\mathbb{R}^{p +q},\) we can extract from \((b_n)\) a subsequence converging to a point \(b\) of the same sphere \(\big(\)to keep the notation simple, suppose the sequence \((b_n)\) itself converges to \(b\big).\) Since \(a_n \in S\) and \(a \in S,\) we have \(g_j(a_n) = 0\) and \(g_j(a) = 0\) for \(j =1,\ldots,p,\) hence \(f(a_n) - f(a) = L(a_n) - L(a)\). Applying Taylor's theorem,
\[f(a_n) - f(a) = dL(a) (a_n - a) + \frac{1}{2} d^2 L(c_n)(a_n -a, a_n - a),\]
where \(c_n\) is in the segment determined for \(a\) and \(a_n\) \(\big(dL(a) = 0\) by hypothesis and \(a_n - a = \alpha_n b_n\big),\) so
\[f(a_n) - f(a) = \frac{\alpha_n^{2}}{2} \cdot d^2 L(c_n)(b_n,b_n).\]
Since \(f, g_i \in \mathcal{C}^2 (A)\) \(\big(\)so \(L \in \mathcal{C}^2(A)\big)\) and \(c_n \to a,\)
\[\displaystyle \lim_{n \to \infty} d^2 L(c_n) (b_n,b_n) = d^2 L(a) (b,b).\]
If we can show that \(d^2L (a) (b, b)> 0\), it follows that \(d^2L (c_n) (b_n, b_n)> 0\) from a certain value of \(n\) onwards, whence \(f(a_n) - f (a)> 0,\) contradicting the condition \(f(a_n) < f (a)\) obtained by assuming that \(f\) did not have a relative minimum at \(a\). The proof will therefore be complete once we establish \(d^2L (a) (b, b) > 0\). By the hypothesis of the theorem, it suffices to prove that \(dg_j (a) (b) = 0\), since \(b \neq 0\) \((\)because \(\|b\| = 1).\) But \(g_j (a_n) = g_j (a) = 0,\) and applying the mean value theorem of differential calculus in several variables, we get
\[0 = dg_j (c'_n) (a_n - a) = \alpha_n \cdot dg_j (c'_n)(b_n).\]
Hence \(dg_j (c'_n) (b_n) = 0,\) and taking limits we obtain \(dg_j (a) (b) = 0\). \(_\square\)