Getting Started With Deep Learning: Working With Data: Gradient Descent


• Difficulty: Beginner
• Duration: 1h 45m
• Students: 205
• Rating: 5/5

### Description

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

• Understand the importance of gradient descent and backpropagation
• Be able to build your own neural network by the end of the course

Prerequisites

### Transcript

Hello, and welcome to this video on gradient descent. In this video, we will learn about the different types of gradient descent and about the concept of a batch, in particular the fact that we need to choose a batch size. How do backpropagation and gradient descent work in practice in deep learning? As we've seen, the gradient is calculated from the cost function evaluated on the training data. Here, x and y indicate a pair of training features and labels. In principle, we could feed the training data to the cost function one point at a time and, for each pair of features and labels, calculate the cost and the gradient and update the weights accordingly. So, one point goes in, we do forward propagation and backpropagation, and we update the weights. This is called stochastic gradient descent.
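The one-point-at-a-time update described above can be sketched in plain Python. This is only an illustration under stated assumptions: the model is a one-feature linear model with a squared-error cost rather than a neural network, and the names `sgd_epoch`, `lr`, and the synthetic data are hypothetical, not taken from the course.

```python
import random

def sgd_epoch(w, b, data, lr):
    """Run one epoch of stochastic gradient descent.

    Assumed model: prediction = w * x + b, cost = (prediction - y) ** 2.
    The weights are updated after every single training pair.
    """
    for x, y in data:
        error = (w * x + b) - y        # forward pass for one point
        grad_w = 2.0 * error * x       # d(cost)/dw for this point only
        grad_b = 2.0 * error           # d(cost)/db for this point only
        w -= lr * grad_w               # immediate weight update
        b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 3x + 1 (an assumption for the demo).
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, 3.0 * x + 1.0) for x in xs]

w, b = 0.0, 0.0
for epoch in range(50):
    random.shuffle(data)               # visit the points in a new order each epoch
    w, b = sgd_epoch(w, b, data, lr=0.1)

print(round(w, 3), round(b, 3))
```

With 100 training pairs, each epoch in this sketch performs 100 separate weight updates.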

Once our model has seen each training data point once, we say that an epoch has completed, and we start again from the first training pair for the following epoch. Stochastic gradient descent gives a very noisy estimate of the gradient, because only a single training data point is used to estimate it. We can improve the estimate by averaging the gradients over the training data before we update the weights. This is how normal gradient descent works. In normal, or batch, gradient descent, we first calculate the gradient for all training pairs, and then we average the gradients to update the weights.
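As a rough sketch of batch gradient descent, the snippet below averages the per-point gradients over the whole training set and then applies a single update per epoch. It assumes the same toy setup as a one-feature linear model with a squared-error cost; `batch_gd_epoch`, the learning rate, and the data are illustrative choices, not part of the course material.

```python
def batch_gd_epoch(w, b, data, lr):
    """Run one epoch of batch gradient descent: one averaged update.

    Assumed model: prediction = w * x + b, cost = (prediction - y) ** 2.
    """
    n = len(data)
    # Average the per-point gradients over ALL training pairs first...
    grad_w = sum(2.0 * ((w * x + b) - y) * x for x, y in data) / n
    grad_b = sum(2.0 * ((w * x + b) - y) for x, y in data) / n
    # ...then update the weights exactly once.
    return w - lr * grad_w, b - lr * grad_b

# Evenly spaced inputs in [-1, 1] with labels from y = 3x + 1 (demo assumption).
train = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

w_batch, b_batch = 0.0, 0.0
n_epochs = 200
for epoch in range(n_epochs):
    w_batch, b_batch = batch_gd_epoch(w_batch, b_batch, train, lr=0.1)

# 200 epochs means exactly 200 weight updates, however large the dataset is.
print(round(w_batch, 3), round(b_batch, 3))
```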

While more accurate, this method is also not optimal, since a single update requires calculating the gradients for all the training data, so we end up doing only one weight update per epoch. A compromise solution is called mini-batch gradient descent. In this case, we still average the gradients, but only over a small number of points sampled from the training set.
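A minimal sketch of the mini-batch compromise follows, again assuming a one-feature linear model with a squared-error cost. The function name `minibatch_epoch`, the batch size of 16, and the synthetic data are hypothetical choices made for illustration.

```python
import random

def minibatch_epoch(w, b, data, batch_size, lr):
    """Run one epoch of mini-batch gradient descent.

    Assumed model: prediction = w * x + b, cost = (prediction - y) ** 2.
    Returns the new weights and the number of updates performed.
    """
    shuffled = random.sample(data, len(data))   # shuffled copy; caller's list is untouched
    updates = 0
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        m = len(batch)                          # the last batch may be smaller
        # Average the gradients over this mini-batch only.
        grad_w = sum(2.0 * ((w * x + b) - y) * x for x, y in batch) / m
        grad_b = sum(2.0 * ((w * x + b) - y) for x, y in batch) / m
        w -= lr * grad_w                        # one update per mini-batch
        b -= lr * grad_b
        updates += 1
    return w, b, updates

random.seed(1)
points = [random.uniform(-1.0, 1.0) for _ in range(128)]
samples = [(x, 3.0 * x + 1.0) for x in points]  # labels from y = 3x + 1 (demo assumption)

w_mb, b_mb = 0.0, 0.0
for epoch in range(100):
    w_mb, b_mb, updates_in_epoch = minibatch_epoch(w_mb, b_mb, samples, batch_size=16, lr=0.1)

print(updates_in_epoch, round(w_mb, 3), round(b_mb, 3))
```

With 128 training pairs and a batch size of 16, each epoch here performs 8 updates, each based on an average over 16 points.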

It's common to take a power of two for the batch size, so, for example, you could take 16 points, 32 points, 64 points, et cetera. This method gives us the best of both approaches. By averaging over a few points, we get a better, less noisy estimate of the gradient, but we also do many updates per epoch, which speeds up training. In conclusion, in this video, we've seen that there are three ways of doing gradient descent: normal gradient descent, stochastic gradient descent, and mini-batch gradient descent. For mini-batch, we've learned that there is a choice to be made: the size of the batch we use for each weight update. Thank you for watching, and see you in the next video.
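To make the batch-size trade-off concrete, the short sketch below counts how many weight updates one epoch contains for a few batch sizes. The dataset size of 1024 and the helper name `updates_per_epoch` are assumptions chosen for the example; a batch size of 1 corresponds to stochastic gradient descent and a batch size equal to the dataset size corresponds to normal (batch) gradient descent.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_samples = 1024  # hypothetical training-set size
for batch_size in (1, 16, 32, 64, n_samples):
    print(batch_size, updates_per_epoch(n_samples, batch_size))
# prints:
# 1 1024
# 16 64
# 32 32
# 64 16
# 1024 1
```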