Batch vs Stochastic vs Mini-Batch Gradient Descent

Batch gradient descent

Gradient descent is at the core of machine learning and is mainly used while training a model. In deep learning, when you give an input to a neural network, it tries to predict the output using an equation of the form

y = W1*X1 + W2*X2 + b

where W1 and W2 are the weights (initialized randomly, or set manually), X1 and X2 are the two inputs, and b is the bias.

For example, X1 could be the previous salary, X2 the years of experience, and y the predicted salary.
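
A minimal sketch of this forward pass in Python (the names and numbers below are purely illustrative, not taken from the post):

    # Forward pass of the single-neuron model above: weighted sum of the
    # two inputs plus the bias. All names and numbers here are illustrative.
    def predict(x1, x2, w1, w2, b):
        return w1 * x1 + w2 * x2 + b

    # e.g. previous salary and years of experience as the two inputs
    previous_salary, years_experience = 50_000.0, 3.0
    y_pred = predict(previous_salary, years_experience, w1=1.05, w2=2_000.0, b=500.0)
    print(y_pred)  # predicted salary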

Neural Network (Image Courtesy)

In batch gradient descent we pass every sample in the dataset through the network while training, compute the error for each sample (here we use log loss), and sum up all of those errors. One full pass over the data is called an epoch. After the epoch, we update the weights so that the error becomes as small as possible.

The weights are updated using the rule

w_new = w_old - learning_rate * d(loss)/dw

where the learning rate is a very small value that scales the derivative of the loss function, so each update only moves the weights a small step.

We keep updating the weights until we reach the global minimum, as shown below.

Gradient descent (Image Courtesy)
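
To make this concrete, here is a rough Python sketch of the batch gradient descent loop described above, assuming a single-neuron model with a sigmoid output and log loss (the post names the loss but not the model details, so those parts are assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One weight update per epoch: the loss and its gradient are computed
    # over ALL samples before the weights change.
    def batch_gradient_descent(X, y, epochs=1000, learning_rate=0.1):
        n_samples, n_features = X.shape
        w = np.zeros(n_features)   # weights (could also start random)
        b = 0.0                    # bias
        for _ in range(epochs):
            y_pred = sigmoid(X @ w + b)              # forward pass on every sample
            # log loss averaged over the whole dataset
            loss = -np.mean(y * np.log(y_pred + 1e-9)
                            + (1 - y) * np.log(1 - y_pred + 1e-9))
            dw = X.T @ (y_pred - y) / n_samples      # gradient w.r.t. weights
            db = np.mean(y_pred - y)                 # gradient w.r.t. bias
            w -= learning_rate * dw                  # update AFTER the full pass
            b -= learning_rate * db
        return w, b, loss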

But when there is a huge amount of data, batch gradient descent becomes computationally costly.
This is where stochastic gradient descent comes to the rescue.

Stochastic gradient descent

In stochastic gradient descent, one sample is randomly picked from the data and given to the neural network, the output is predicted, the loss is calculated by comparing the actual and predicted outputs, and the weights are updated using the same update rule as in batch gradient descent.

So, the major difference between batch gradient descent and stochastic gradient descent is that in batch gradient descent the weights are updated after all the samples have been passed through the network, whereas in stochastic gradient descent the weights are updated after every single sample. As a result, it usually makes progress towards the global minimum faster than batch gradient descent. Batch gradient descent is useful when there is a small amount of data; stochastic gradient descent can be used when there is a huge amount of data. A sketch of this per-sample update follows.
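
A rough Python sketch of stochastic gradient descent, reusing the sigmoid and log-loss setup assumed in the batch example above:

    import numpy as np

    # One randomly picked sample drives each weight update.
    def stochastic_gradient_descent(X, y, epochs=1000, learning_rate=0.1):
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        rng = np.random.default_rng()
        for _ in range(epochs):
            i = rng.integers(n_samples)                     # pick one sample at random
            y_pred = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predict for that sample
            dw = (y_pred - y[i]) * X[i]                     # log-loss gradient, single sample
            db = y_pred - y[i]
            w -= learning_rate * dw                         # weights updated after EVERY sample
            b -= learning_rate * db
        return w, b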

Mini-batch gradient descent

In mini-batch gradient descent, training is done in batches: k samples are randomly picked from the data and given to the neural network, the output is predicted, the loss is calculated, and the weights are updated in the same way as in batch and stochastic gradient descent, as sketched below.
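
A rough Python sketch of mini-batch gradient descent with the same assumed sigmoid and log-loss setup; here batch_size plays the role of k:

    import numpy as np

    # k randomly picked samples drive each weight update.
    def mini_batch_gradient_descent(X, y, epochs=1000, batch_size=32, learning_rate=0.1):
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        rng = np.random.default_rng()
        for _ in range(epochs):
            idx = rng.choice(n_samples, size=min(batch_size, n_samples), replace=False)
            Xb, yb = X[idx], y[idx]                     # the k randomly picked samples
            y_pred = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
            dw = Xb.T @ (y_pred - yb) / len(idx)        # gradient averaged over the mini-batch
            db = np.mean(y_pred - yb)
            w -= learning_rate * dw
            b -= learning_rate * db
        return w, b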

Batch gradient descent follows a relatively smooth path towards the global minimum, whereas stochastic gradient descent and mini-batch gradient descent follow a zigzag path, as shown below.

Batch gradient descent vs Stochastic gradient descent vs Mini-batch gradient descent (Image Courtesy)

The END