Gradient Descent Algorithms
In this note, I introduce the mathematical foundations of gradient descent algorithms, with a focus on analyzing their convergence rates for convex optimization problems.
I will walk you through the relevant definitions, theorems, and proofs step by step, and I hope this note helps you (and me) build a better understanding of these foundations.
The note is organized as follows:
- I. Preliminaries
  - 1.1 Directional Derivative
  - 1.2 Mean Value Theorem
  - 1.3 Taylor's Theorem
  - 1.4 Eigenvalue Properties
  - 1.5 Vector Norm and Matrix Norm
  - 1.6 Loewner Order
- II. Convexity
  - 2.1 Definition
  - 2.2 Strong Convexity
  - 2.3 Properties
- III. L-smoothness
  - 3.1 Definition
  - 3.2 Properties
- IV. Algorithms
  - 4.1 Gradient Descent (see the sketch after this outline)
  - 4.2 Stochastic Gradient Descent
  - 4.3 Mini-batch Stochastic Gradient Descent
  - 4.4 Stochastic Gradient Descent with Momentum
    - 4.4.1 Polyak's Momentum
    - 4.4.2 Nesterov's Momentum
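To make the outline concrete before the formal treatment, here is a minimal sketch of the plain gradient descent update $x_{k+1} = x_k - \eta \nabla f(x_k)$ from Section 4.1. The quadratic objective, the step size choice $\eta = 1/L$, and the function names (`gradient_descent`, `grad`) are illustrative assumptions for this sketch, not definitions taken from the note itself.

```python
import numpy as np

def gradient_descent(grad, x0, eta, num_steps):
    """Iterate x_{k+1} = x_k - eta * grad(x_k) with a fixed step size eta."""
    x = x0
    for _ in range(num_steps):
        x = x - eta * grad(x)
    return x

# Illustrative convex quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
# For an L-smooth objective, a classical step size choice is eta = 1/L;
# here L equals the largest eigenvalue of the symmetric matrix A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(A).max()
x = gradient_descent(grad=lambda x: A @ x,
                     x0=np.array([1.0, 1.0]),
                     eta=1.0 / L,
                     num_steps=100)
print(x)  # approaches the unique minimizer at the origin
```

The convergence-rate results discussed later in the note quantify how quickly such iterates approach the minimizer under the convexity and L-smoothness assumptions of Sections II and III.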
Note to Readers
This note is still under construction, and I will keep updating it. If you find any mistakes or have any suggestions, please feel free to contact me. Thank you!
I hope the math symbols don't scare you away. 😅