Gradient Descent Algorithms
In this note, I introduce the mathematical foundations of gradient descent algorithms, with a focus on analyzing their convergence rates for convex optimization problems.
I will walk you through the relevant definitions, theorems, and proofs step by step, and I hope this note helps you (and me) build a better understanding of these foundations.
The note is organized as follows:
- I. Preliminaries
  - 1.1 Directional Derivative
  - 1.2 Mean Value Theorem
  - 1.3 Taylor's Theorem
  - 1.4 Eigenvalue Properties
  - 1.5 Vector Norm and Matrix Norm
  - 1.6 Loewner Order
- II. Convexity
  - 2.1 Definition
  - 2.2 Strong Convexity
  - 2.3 Properties
- III. L-smoothness
  - 3.1 Definition
  - 3.2 Properties
- IV. Algorithms
  - 4.1 Gradient Descent (see the sketch after this outline)
  - 4.2 Stochastic Gradient Descent
  - 4.3 Mini-batch Stochastic Gradient Descent
  - 4.4 Stochastic Gradient Descent with Momentum
    - 4.4.1 Polyak's Momentum
    - 4.4.2 Nesterov's Momentum
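To make the outline concrete before the formal treatment, here is a minimal sketch of the plain gradient descent update $x_{k+1} = x_k - \eta \nabla f(x_k)$ from Section 4.1. The quadratic objective, the step size choice $\eta = 1/L$, and the function names (`gradient_descent`, `grad`) are illustrative assumptions for this sketch, not definitions taken from the note itself.

```python
import numpy as np

def gradient_descent(grad, x0, eta, num_steps):
    """Iterate x_{k+1} = x_k - eta * grad(x_k) with a fixed step size eta."""
    x = x0
    for _ in range(num_steps):
        x = x - eta * grad(x)
    return x

# Illustrative convex quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
# For an L-smooth objective, a classical step size choice is eta = 1/L;
# here L equals the largest eigenvalue of the symmetric matrix A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(A).max()
x = gradient_descent(grad=lambda x: A @ x,
                     x0=np.array([1.0, 1.0]),
                     eta=1.0 / L,
                     num_steps=100)
print(x)  # approaches the unique minimizer at the origin
```

The convergence-rate results discussed later in the note quantify how quickly such iterates approach the minimizer under the convexity and L-smoothness assumptions of Sections II and III.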
Note to Readers
This note is still under construction, and I will keep updating it. If you find any mistakes or have any suggestions, please feel free to contact me. Thank you!
I hope the math symbols don't scare you away. 😅