Spaces of differentiable functions on an interval \(I=[a,b]\subset\Real\):
\[
C(I,\Real), C^1(I,\Real),\ldots, C^n(I,\Real),\ldots
\]
For a function in \(C^1(I,\Real)\),
\[
f(y)=f(x)+\int_x^yf'(s)ds.
\]
Iterating (for functions in \(C^n(I,\Real)\)), we obtain
\[
f(y)=f(x)+f'(x)(y-x)+\frac{f''(x)}{2!}(y-x)^2+\ldots+\frac{f^{(n-1)}(x)}{(n-1)!}(y-x)^{n-1}+
\int_{x<s_1<s_2<\ldots<s_n<y}f^{(n)}(s_1)\,ds_1\cdots ds_n,
\]
where the remainder, integrated over the simplex \(x<s_1<\ldots<s_n<y\), equals
\[
\int_x^y f^{(n)}(s_1)\frac{(y-s_1)^{n-1}}{(n-1)!}\,ds_1.
\]
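As a quick sanity check (a sketch in sympy, using \(f=\exp\) on \([0,1]\) with \(n=3\) as an arbitrary example), the Taylor polynomial plus the integral remainder reproduces \(f(y)\) exactly:

```python
# Check Taylor's formula with integral remainder for f = exp, [x, y] = [0, 1], n = 3.
import sympy as sp

s = sp.symbols('s')
x, y, n = 0, 1, 3
f = sp.exp(s)  # the function, as an expression in s

# Taylor polynomial: sum of f^(j)(x) (y-x)^j / j! for j = 0..n-1
poly = sum(sp.diff(f, s, j).subs(s, x) * (y - x)**j / sp.factorial(j)
           for j in range(n))

# Integral remainder: int_x^y f^(n)(s) (y-s)^(n-1)/(n-1)! ds
remainder = sp.integrate(sp.diff(f, s, n) * (y - s)**(n - 1)
                         / sp.factorial(n - 1), (s, x, y))

assert sp.simplify(poly + remainder - f.subs(s, y)) == 0
```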
Implications:
Domain: \(U\subset\Real^n\), an open set.
Reminder: open, closed, bounded, compact sets.
Exercise: is the set \(\{|x|\leq 1, |y| \lt 1\}\) open? closed? bounded?
Partial and directional derivatives; \(\nabla f\).
Remark: the value of a partial derivative depends on which other coordinates are held constant.
Gateaux and Frechet derivatives
The Gateaux derivative is in essence a directional one. The Frechet derivative asks the function to be locally close to an affine one, up to higher-order terms:
\[
f(x+v)=f(x)+(l,v)+o(|v|).
\]
Reminder: \(o,O,\sim\) notation.
Theorem: Frechet differentiability implies Gateaux differentiability.
Not vice versa:
Example: \[f(x,y)=\frac{x^3y}{x^4+y^2},\qquad f(0,0)=0.\]
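A numeric sketch of why this example works (taking \(f(0,0)=0\); the directions and step sizes below are ad hoc): every directional derivative at the origin vanishes, yet along the parabola \(y=x^2\) the ratio \(f/|v|\) stays near \(1/2\), so \(f\) is not Frechet differentiable at \(0\):

```python
# f is Gateaux (all directional derivatives at 0 vanish) but not Frechet differentiable.
import math

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**3 * y / (x**4 + y**2)

# Gateaux: along any direction (a, b), f(t a, t b)/t -> 0 as t -> 0.
t = 1e-4
for a, b in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)]:
    assert abs(f(t * a, t * b) / t) < 1e-3

# Not Frechet: along y = x^2 one has f(x, x^2) = x/2, so f/|v| -> 1/2, not 0.
x = 1e-4
v = math.hypot(x, x**2)
assert abs(f(x, x**2) / v - 0.5) < 1e-3
```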
Multivariate mean value:
If \(f\) is Gateaux differentiable,
\[
f(x+v)=f(x)+(f'(z),v)
\]
for some \(z\in(x,x+v)\)…
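A toy illustration (for \(f(x,y)=x^2+y^2\), \(x=(0,0)\), \(v=(1,1)\), chosen only for concreteness): scanning the segment locates the mean value point, here \(z=x+v/2\):

```python
# Locate z on the segment (x, x+v) with f(x+v) - f(x) = <grad f(z), v>.
def f(p):
    return p[0]**2 + p[1]**2

def grad_f(p):
    return (2 * p[0], 2 * p[1])

x, v = (0.0, 0.0), (1.0, 1.0)
delta = f((x[0] + v[0], x[1] + v[1])) - f(x)   # = 2

# Scan z = x + t v, t in (0, 1), for the best match of <grad f(z), v> with delta.
best_t = min((i / 1000 for i in range(1, 1000)),
             key=lambda t: abs(sum(g * vi for g, vi in
                                   zip(grad_f((x[0] + t * v[0], x[1] + t * v[1])), v))
                               - delta))
assert abs(best_t - 0.5) < 1e-2   # here z = x + v/2
```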
Gap between Gateaux and Frechet differentiability:
Theorem: if a Gateaux differentiable function has a continuous gradient, then it is Frechet differentiable.
Differentiability for
\[
F:\Real^n\to\Real^m
\]
carries over from the scalar (\(m=1\)) case, component-wise. The derivative is now a linear map \(\Real^n\to\Real^m\), given (once we fix coordinates) by a matrix.
Jacobian \(DF(x)\), an \(m\times n\) matrix.
Chain rule for (Frechet) differentiable functions:
If
\[
H=G\circ F\qquad (F:\Real^n\to\Real^m;\quad G:\Real^m\to\Real^l;\quad H:\Real^n\to\Real^l)
\]
then
\[
DH(x)=DG(F(x))\circ DF(x).
\]
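A finite-difference sketch of the chain rule (the maps \(F\) and \(G\) below are arbitrary smooth examples, not taken from the text):

```python
# Compare the numerical Jacobian of H = G o F with the product DG(F(x)) DF(x).
import numpy as np

F = lambda x: np.array([x[0] * x[1], np.sin(x[0])])   # F: R^2 -> R^2
G = lambda y: np.array([y[0]**2 + y[1]])              # G: R^2 -> R
H = lambda x: G(F(x))                                 # H: R^2 -> R

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian of f at x by central differences."""
    x = np.asarray(x, dtype=float)
    cols = [(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])
lhs = jacobian(H, x0)                        # DH(x0)
rhs = jacobian(G, F(x0)) @ jacobian(F, x0)   # DG(F(x0)) DF(x0)
assert np.allclose(lhs, rhs, atol=1e-5)
```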
Remark: Gateaux differentiability is not enough.
Mean value theorem also fails (example?). But one has inequalities:
\[
\| F(x+v)-F(x)- DF(x)\,v\|\leq L \| v\|,
\]
where \(L\) is a bound on the operator norm of \(DF(z)-DF(x)\) over the segment \((x,x+v)\).
Iterating, one obtains higher derivatives of multivariate functions. They are essentially symmetric multilinear forms.
They are needed to formulate a multidimensional Taylor formula.
Taylor approximation: polynomial approximating the function to higher order,
\[
f(x+v)=f(x)+\sum_{1\leq|k|\leq K} \frac{1}{k_1!\cdots k_n!}\,\frac{\partial^{|k|} f}{\partial x_1^{k_1}\cdots\partial x_n^{k_n}}(x)\, v_1^{k_1}\cdots v_n^{k_n}+o(|v|^K),
\]
where \(k=(k_1,\ldots,k_n)\) is a multi-index and \(|k|=k_1+\ldots+k_n\).
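One can test this formula symbolically (a sketch; the example \(f=e^{x_1+2x_2}\), the order \(K=3\), and the restriction to a line through \(x=0\) are all ad hoc choices):

```python
# Build the multivariate Taylor polynomial from partial derivatives and compare it,
# restricted to the line t -> t v, with sympy's one-variable expansion in t.
import sympy as sp

x1, x2, v1, v2, t = sp.symbols('x1 x2 v1 v2 t')
f = sp.exp(x1 + 2 * x2)
K = 3

taylor = sum(sp.diff(f, x1, k1, x2, k2).subs({x1: 0, x2: 0})
             * v1**k1 * v2**k2 / (sp.factorial(k1) * sp.factorial(k2))
             for k1 in range(K + 1) for k2 in range(K + 1) if k1 + k2 <= K)

exact = f.subs({x1: t * v1, x2: t * v2})              # exp(t (v1 + 2 v2))
expansion = sp.series(exact, t, 0, K + 1).removeO()   # expansion in t up to t^K
err = sp.expand(taylor.subs({v1: t * v1, v2: t * v2}) - expansion)
assert err == 0
```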
Examples:
Legendre (sometimes called Legendre-Fenchel) transform \(H\) of a convex function \(L:\Real\to\Real\) is defined as
\[
H(p):=\sup_x\,(px-L(x)).
\]
Both functions, \(L\) and \(H\), can take the value \(+\infty\).
If \(L\) is strictly convex, the gradient mapping
\[
G:x\mapsto \frac{dL}{dx}
\]
is one-to-one, and it is clear that the argmax in the definition of \(H\) solves \(p=G(x)\). Denote the functional inverse to \(G\) as \(F: p=G(x) \Leftrightarrow x=F(p)\). In this case,
\[
H(p)=pF(p)-L(F(p)).
\]
Differentiating, we obtain
\[
\frac{dH}{dp}=F(p)+\left(p-\frac{dL}{dx}(F(p))\right)\frac{dF}{dp}=F(p).
\]
Hence the graph of the derivative of \(L\) with respect to \(x\), in the \((x,p)\) space is the same as the graph of the derivative of \(H\) with respect to \(p\). It follows immediately that the Legendre transform is involutive:
\[
L(x)=\sup_p\,(px-H(p)).
\]
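A crude numeric sketch of the transform and its involutivity on a grid, for the strictly convex example \(L(x)=x^2/2\) (where \(H(p)=p^2/2\)); the grid sizes and tolerances below are ad hoc:

```python
# Discrete Legendre transform: H(p) = max over a grid of (p x - L(x)).
import numpy as np

def legendre(values, grid, duals):
    """Crude grid discretization of the Legendre-Fenchel transform."""
    return np.array([np.max(p * grid - values) for p in duals])

xs = np.linspace(-10, 10, 4001)
L = xs**2 / 2

ps = np.linspace(-3, 3, 61)
H = legendre(L, xs, ps)
assert np.allclose(H, ps**2 / 2, atol=1e-3)   # H(p) = p^2/2

# Involutivity: the double transform recovers L on an inner range.
inner = np.linspace(-2, 2, 41)
LL = legendre(H, ps, inner)
assert np.allclose(LL, inner**2 / 2, atol=1e-2)
```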
If \(L\) is convex, one can turn it into a strictly convex function by adding \(\epsilon |x|^2/2\). The graph of the resulting gradient,
\[
p=G(x)+\epsilon x
\]
converges to the graph of \(p=G(x)\) as \(\epsilon \to 0\). So one can define the graph \(p=G(x)\) for all convex functions.
One can also define the Legendre transform for nonconvex \(L\), using the same formula. The correspondence won't be involutive anymore (as Legendre-Fenchel transforms, as defined above, are always convex). Still, one can reproduce many results; in particular, the derivative of \(H\) is discontinuous at \(p\) iff \(px-L(x)\) has two competing global maxima, see figure below.
These considerations are relevant for Homework 5: in Problem a,
\[
V(x,0)=\inf_X \left(\frac{(x-X)^2}{2T}+\cos(X)\right)=\frac{x^2}{2T}-\frac{1}{T}
\sup_X \left(xX-\left(\frac{X^2}{2}+T\cos(X)\right)\right),
\]
i.e., it is a (somewhat modified) Legendre transform of the function \(X^2/2+T\cos X\).
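The identity above holds pointwise in \(X\), so it can be checked on any grid (a sketch; the ranges and grids below are ad hoc):

```python
# Check inf_X [(x-X)^2/(2T) + cos X] = x^2/(2T) - (1/T) sup_X [xX - (X^2/2 + T cos X)]
# on a common grid of X values.
import numpy as np

Xs = np.linspace(-20, 20, 200001)
for T in (0.5, 1.0, 3.0):
    for x in (-2.0, 0.0, 1.5):
        lhs = np.min((x - Xs)**2 / (2 * T) + np.cos(Xs))
        sup = np.max(x * Xs - (Xs**2 / 2 + T * np.cos(Xs)))
        rhs = x**2 / (2 * T) - sup / T
        assert abs(lhs - rhs) < 1e-6
```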
For more, see chapter on Legendre transform in Arnold’s Mechanics, p 61ff.
Solution. As we discussed, the cost function for the problem above is given by
\[
V(x,0)=\min_X C(x,X,T),\quad\text{where}\quad C(x,X,T)=\frac{(x-X)^2}{2T}+\cos(X)
\]
(here \(X=x(T)\) is the potential position of the end of the trajectory, and \((x-X)^2/2T\) is the minimal cost of reaching it from \(x\)). The cost function loses smoothness when \(C\) acquires a non-unique minimum for some \(x\). This can happen only if \(C\) becomes nonconvex in \(X\), i.e. when the second derivative
\[
\frac{d^2C}{dX^2}=\frac1T-\cos(X)
\]
vanishes somewhere. This happens at \(T=1\).
Solution. We are looking at
\[
V(0,0)=\min_X \frac{X^2}{2T}+\cos(X).
\]
For \(T\leq 1\) this is a convex function (of \(X\)), having vanishing derivative at \(0\), and the corresponding minimum, attained at \(X=0\), is \(1\).
For \(T\gt 1\), there are two competing (global) minima, at points \(\pm X(T)\), where \(X(T)\) solves
\[
X(T)/T=\sin(X(T)),\quad 0\lt X(T)\lt \pi
\]
(there is no closed form expression for it), and the resulting cost, as a function of \(T\), is shown below. Note that it converges to \(-1\) as \(T\to\infty\).
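A sketch of the computation (bisection for the transcendental equation; the tolerances are ad hoc): the equation \(X/T=\sin X\) is solved on \((0,\pi)\), and the resulting cost is checked to lie below \(1\) for \(T>1\) and to approach \(-1\) for large \(T\):

```python
# Solve X/T = sin X on (0, pi) for T > 1 and evaluate the cost X^2/(2T) + cos X.
import math

def X_of_T(T, iters=80):
    """Root of g(X) = sin X - X/T on (0, pi), by bisection (g > 0 near 0, g < 0 near pi)."""
    lo, hi = 1e-9, math.pi - 1e-9
    for _ in range(iters):
        mid = (lo + hi) / 2
        if math.sin(mid) - mid / T > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cost(T):
    X = X_of_T(T)
    return X**2 / (2 * T) + math.cos(X)

assert abs(X_of_T(2.0) / 2.0 - math.sin(X_of_T(2.0))) < 1e-9
assert cost(1.1) < 1.0                # below the T <= 1 value of 1
assert abs(cost(1e6) + 1.0) < 1e-3    # tends to -1 as T -> infinity
```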
Solution:
One constructs, in the usual fashion, the Hamiltonian matrix
\[
H=\left(
\begin{array}{cccc}
-2&0&0&0\\0&1&0&-1\\-1&-1& 2& 0\\-1& -1& 0& -1
\end{array}
\right)
\]
Using computer algebra, one can find \(\exp(-T H)\) and apply it to the \(2n\times n\) matrix obtained by stacking \(E\) on top of \(mE\), corresponding to the terminal values of \(X,Y\). Then one can find \(P=YX^{-1}\), the solution of the RDE at \(t=0\). The result is somewhat long:
To check whether \(P\) is bounded, it is enough to check that \(\det X\) does not vanish.
Again, using computer algebra we get
\[
\det X(t)=e^{(2 - \sqrt{2}) (T-t)} \left(2 + \sqrt{2} - \sqrt{2}\, m +
e^{2 \sqrt{2} (T-t)} \left(2 - \sqrt{2} + \sqrt{2}\, m\right)\right)/4.
\]
We see that for \(m=0\), the determinant is positive for all \(t\leq T\), while for \(m=-6\) it vanishes at
\[
T-t=\frac{1}{2\sqrt{2}}\log{\frac{2+7\sqrt{2}}{7\sqrt{2}-2}}\approx 0.14485\ldots
\]
It is immediate that \(\det X\) vanishes for some \(T-t>0\) iff
\[
m=-\frac{\sqrt{2}+1+e^{2 \sqrt{2}(T-t)}(\sqrt{2}-1)}{e^{2 \sqrt{2}(T-t)}-1}.
\]
As \(e^{2 \sqrt{2}(T-t)}\) varies from \(1\) to \(\infty\), the values of \(m\) that could lead to vanishing of \(\det X\) lie in the interval \((-\infty,-\sqrt{2}+1)\). Thus the smallest \(m\) ensuring regularity of \(P\) is \(-\sqrt{2}+1\).
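One can double-check this algebra with sympy (a sketch; \(E\) below stands for \(e^{2\sqrt{2}(T-t)}\), and only the bracketed factor of \(\det X\) matters for its zeros, since the exponential prefactor never vanishes):

```python
# Solve the bracketed factor of det X for m, locate the m = -6 root,
# and find the range of m over which det X can vanish.
import sympy as sp

m = sp.Symbol('m')
E = sp.Symbol('E', positive=True)   # stands for e^{2 sqrt(2) (T-t)} > 1
bracket = 2 + sp.sqrt(2) - sp.sqrt(2) * m + E * (2 - sp.sqrt(2) + sp.sqrt(2) * m)

# det X vanishes iff the bracket does; solve for m as a function of E:
m_star = sp.solve(bracket, m)[0]
claimed = -(sp.sqrt(2) + 1 + E * (sp.sqrt(2) - 1)) / (E - 1)
assert sp.simplify(m_star - claimed) == 0

# For m = -6, the root is T - t = log((2+7 sqrt 2)/(7 sqrt 2 - 2))/(2 sqrt 2):
E_root = sp.solve(bracket.subs(m, -6), E)[0]
s_root = sp.log(E_root) / (2 * sp.sqrt(2))
assert abs(float(s_root) - 0.14485) < 1e-4

# As E ranges over (1, oo), m_star increases toward 1 - sqrt(2) = -sqrt(2) + 1,
# so det X can vanish only for m below that threshold.
assert sp.limit(m_star, E, sp.oo) == 1 - sp.sqrt(2)
```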