Acceleration in Fixed Point Iteration and Nesterov;s Gradient Method

Open as TemplateView PDF

Author:

Jinxiong Zhang

Last Updated:

6 years ago

License:

Creative Commons CC BY 4.0

Abstract:

A note on gradient-based optimization.

Tags:

Acceleration in Fixed Point Iteration and Nesterov;s Gradient Method

\begin{now}

Discover why over 20 million people worldwide trust Overleaf with their work.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{url}
\usepackage{hyperref}
\title{Acceleration}
\author{Jinxiong Zhang }
\date{March 2019}

\begin{document}

\maketitle

\section{Introduction}
It is popular to use Nesterov's accelerated gradient method to minimize the cost function $f(x)$ in machine learning specially deep learning, where it is called momentum method. The source of its acceleration is studied from many perspectives such as the blogs at \url{https://blogs.princeton.edu/imabandit/2015/06/30/revisiting-nesterovs-acceleration/}.
If Anderson acceleration for fixed point iteration is applied to the problem
$$x=x-\alpha\nabla f(x)$$
where $\alpha$ is a scalar function, it is still an open problem whether it is equivalent to Nesterov's accelerated gradient method as far as known.

It is worthy of exploring it.
\section{Nesterov's accelerated gradient method}
The general form of Nesterov's accelerated gradient method to minimize the cost function $f(x)$ can be written in the following form:
$$x^{k+1}=x^k-\gamma \nabla f(x^k+\mu(x^k-x^{k-1}))+\mu(x^k-x^{k-1}).$$

The parameters $\gamma$ and $\mu$ are difficult to tune when $f(x)$ is non-convex.
\section{Anderson acceleration}
We apply Anderson acceleration to the problem $x=x-\alpha\nabla f(x)$:
$$x^{k+1}=x^k-\alpha \nabla f(x^k)+(1-\alpha_1)\alpha[\nabla f(x^k)-\nabla f(x^{k-1})]+(1-\alpha_1)(x^{k-1}-x^k)$$
where $\alpha_1=\arg\min_{\alpha_1}{\|\alpha_1\nabla f(x^k)+(1-\alpha_1) \nabla f(x^{k-1})\|}_2^2=-\frac{\|\nabla f(x^k)-\nabla f(x^{k-1})\|_2^2}{\left<\nabla f(x^k)-\nabla f(x^{k-1}), \nabla f(x^{k-1})\right>}$.

There is a difference of previous iteration in Anderson acceleration as well as Nesterov's accelerated gradient method.
It seems that 
$$\gamma \nabla f(x^k+\mu(x^k-x^{k-1}))\approx (1-\alpha_1)\alpha[\nabla f(x^k)-\nabla f(x^{k-1})].$$

And what is the connection of these two methods in mathematics?

\end{document}

Acceleration in Fixed Point Iteration and Nesterov;s Gradient Method

Ta kontakt