Gradient Descent
Gradient descent is a very general optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of Gradient Descent is to tweak parameters iteratively in order to minimize a cost function.
It measures the local gradient of the error function with regard to the parameter vector 𝛉, and it steps in the direction of the descending gradient. Once the gradient is zero, you have reached a minimum!
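Written as a formula, one step of this idea might look like the following. The symbols J (the cost function) and η (the step size) are notation assumed here for illustration; they are not defined in the text above.

```latex
\theta^{(\text{next})} = \theta - \eta \, \nabla_{\theta} J(\theta)
```

The minus sign is what makes the step go downhill: the gradient points in the direction of steepest increase, so subtracting it moves 𝛉 toward lower cost. When the gradient is zero, the update leaves 𝛉 unchanged.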
Concretely, you start by filling 𝛉 with random values (this is called random initialization), and then you improve it gradually, taking one baby step at a time. Each step attempts to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum.
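The procedure just described can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the toy data set, the straight-line model y = b + w * x, the function names, and the default learning rate and iteration count are all assumptions made for this example.

```python
import random


def mse_gradient(theta, xs, ys):
    """Gradient of the MSE for the line y = theta[0] + theta[1] * x."""
    b, w = theta
    m = len(xs)
    errors = [b + w * x - y for x, y in zip(xs, ys)]
    grad_b = 2 / m * sum(errors)
    grad_w = 2 / m * sum(e * x for e, x in zip(errors, xs))
    return [grad_b, grad_w]


def gradient_descent(xs, ys, learning_rate=0.1, n_iterations=1000, seed=0):
    rng = random.Random(seed)
    # Random initialization: fill theta with random values.
    theta = [rng.gauss(0, 1), rng.gauss(0, 1)]
    for _ in range(n_iterations):
        grad = mse_gradient(theta, xs, ys)
        # One baby step in the direction of the descending gradient.
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Toy data generated from y = 4 + 3x (hypothetical, noise-free).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [4 + 3 * x for x in xs]

theta = gradient_descent(xs, ys)
print(theta)  # should end up very close to [4, 3]
```

Because the toy data lie exactly on a line, the MSE has a single minimum, and the loop should settle on an intercept near 4 and a slope near 3 regardless of the random starting point.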
An important parameter in Gradient Descent is the size of the steps, determined by
the learning rate hyper-parameter. If the learning rate is too small, then the algorithm
will have to go through many iterations to converge, which will take a long time.
On the other hand, if the learning rate is too high, you might jump across the valley
and end up on the other side, possibly even higher up than you were before. This
might make the algorithm diverge, with larger and larger values, failing to find a good
solution.
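To see the effect of the learning rate, here is a small sketch that minimizes the one-parameter cost θ², whose gradient is 2θ. The cost function, the starting point, and the three learning rates are assumptions picked for this example; they do not come from any particular model.

```python
def run_gradient_descent(learning_rate, theta=5.0, n_iterations=50):
    """Minimize the cost theta**2 and return theta after n_iterations steps."""
    for _ in range(n_iterations):
        gradient = 2 * theta  # derivative of theta**2
        theta = theta - learning_rate * gradient
    return theta


# Compare a too-small, a reasonable, and a too-large learning rate.
for learning_rate in (0.001, 0.1, 1.1):
    final_theta = run_gradient_descent(learning_rate)
    print(learning_rate, abs(final_theta))
```

With the smallest rate, θ has barely moved from its starting value after 50 steps. With the middle rate, it is very close to the minimum at 0. With the largest rate, every step overshoots the valley and lands farther away than before, so the values keep growing.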