\(\def\Real{\mathbb{R}}\)

- For a smooth function \(f\) on \(\Real^n\), consider the gradient descent algorithm

\[
x_{k+1}=x_k-\eta\nabla f(x_k)
\]

(here \(\eta>0\) is a step size). It is known that for almost all starting positions, the gradient descent iterates converge to a local minimizer of \(f\). To ensure convergence for all initial data, it was suggested to perturb the trajectory once in a while, so as to escape potential traps at saddle points, and it was shown that this procedure works, at least when the critical points are non-degenerate. What happens at degenerate critical points?
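As a concrete illustration (a minimal numerical sketch, not the specific procedure from the literature), take \(f(x,y)=x^2-y^2\), which has a non-degenerate saddle at the origin. Starting on the stable manifold \(y=0\), plain gradient descent stalls at the saddle, while an occasional small random perturbation lets the iterates escape along the unstable \(y\) direction:

```python
import numpy as np

def grad_f(p):
    # f(x, y) = x**2 - y**2 has a non-degenerate saddle at the origin
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def gradient_descent(p0, eta=0.1, steps=200, perturb=False, seed=0):
    """Run x_{k+1} = x_k - eta * grad f(x_k), optionally with rare
    small random perturbations (an illustrative choice of schedule)."""
    p = np.asarray(p0, dtype=float)
    rng = np.random.default_rng(seed)
    for k in range(steps):
        p = p - eta * grad_f(p)
        # perturb the trajectory once in a while to escape saddles
        if perturb and k % 50 == 0:
            p = p + 1e-3 * rng.standard_normal(2)
    return p

# Start on the saddle's stable manifold (y = 0):
plain = gradient_descent([1.0, 0.0])                 # stalls at the saddle
noisy = gradient_descent([1.0, 0.0], perturb=True)   # escapes in y
```

Without noise the \(y\)-coordinate stays exactly zero and the iterates contract to the saddle; with noise, any tiny \(y\)-component is amplified by the factor \(1+2\eta\) at each step, so the trajectory leaves the saddle.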