What is Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
Quoted from
Take a simple example
Now, we have a one-dimensional function:
$$ f(x)=(x-1)^2-2 $$
Its graph is a parabola that opens upward.
How would we find the minimum of this function in the traditional mathematical way?
Of course, in this example you could also read the minimum directly off the graph, but we won't do that here.
Instead, we take the derivative of the function:
$$ \nabla f(x)= 2(x-1) $$
Then we set the derivative equal to 0:
$$ 0=2(x-1) $$
Solving this gives x = 1. Since the parabola opens upward, this is where the function reaches its minimum, f(1) = -2.
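As a quick check of the calculation above, here is a minimal Python sketch. The names `f` and `df` are chosen only for this illustration and do not come from any library.

```python
def f(x):
    """The example function f(x) = (x - 1)^2 - 2."""
    return (x - 1) ** 2 - 2


def df(x):
    """Its derivative, 2(x - 1)."""
    return 2 * (x - 1)


# The derivative vanishes at x = 1 ...
print(df(1))  # 0
# ... and the function value there is the minimum, -2.
print(f(1))   # -2
# Points on either side of x = 1 give larger values.
print(f(0.5), f(1.5))  # -1.75 -1.75
```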
How does Gradient Descent work
For this one-dimensional function, the gradient descent algorithm repeatedly changes the value of x so that the value of the function keeps decreasing.
Let's run a simulation
We start x at -1:
$$ x_{0}= -1 $$
$$ \nabla f(x_{0})= 2(x_{0}-1) $$
How do we keep changing x so that the value of the function gets closer to the minimum?
To answer that, we focus on the derivative of the function.
We know that where the derivative equals 0, the function may have a minimum or a maximum.
When the derivative is > 0, the function is increasing at that point. When the derivative is < 0, the function is decreasing at that point. Let me give you a simple example: at x = -1 the derivative is 2(-1-1) = -4 < 0, so the function decreases as x increases, and we should move x to the right.
As x gets closer to a minimum or maximum, the absolute value of the derivative gets smaller.
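To see the sign and size of the derivative at a few points, a small Python sketch may help. The sample points and the helper name `df` are picked only for illustration.

```python
def df(x):
    """Derivative of f(x) = (x - 1)^2 - 2."""
    return 2 * (x - 1)


# Sample points on both sides of the minimum at x = 1 (arbitrary choices).
for x in [-1, 0, 1, 2, 3]:
    slope = df(x)
    if slope < 0:
        hint = "f is decreasing here, so move x to the right"
    elif slope > 0:
        hint = "f is increasing here, so move x to the left"
    else:
        hint = "derivative is 0, we are at the minimum"
    print(x, slope, hint)
```

In every case, moving x against the sign of the derivative lowers the function value, and the derivative shrinks in size as x approaches 1.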
So we can simply let the next value of x be:
$$ x_{1}= x_{0}-\gamma*\nabla f(x_{0}) $$
Here $$ \gamma $$ is called the learning rate. It controls how large each step is. We will talk about it later.
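The update rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the learning rate of 0.1, the 50 steps, and the function name `gradient_descent` are all choices made here, not values given in the text.

```python
def df(x):
    """Derivative of f(x) = (x - 1)^2 - 2."""
    return 2 * (x - 1)


def gradient_descent(x0, gamma, steps):
    """Apply x_next = x - gamma * df(x) a fixed number of times."""
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - gamma * df(x)
        history.append(x)
    return history


# gamma = 0.1 and 50 steps are assumed values for this illustration.
history = gradient_descent(x0=-1.0, gamma=0.1, steps=50)
print(history[1])   # first update: -1 - 0.1 * (-4), about -0.6
print(history[-1])  # close to the minimum at x = 1
```

With these values, each step multiplies the distance between x and 1 by 1 - 2 * 0.1 = 0.8, so x moves steadily toward 1 and the steps get smaller along the way.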
How about multidimensional functions? That's a good question. Let's talk about it later.