Getting Started With Deep Learning: Working With Data: Gradient Descent


Duration: 1h 45m


Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hey guys, welcome back! In this video we're going to quickly explore the effect of weight initialization. When you train a neural network, at the beginning you need to assign values to the weights of your model, and that's called initialization. You could do many things. You could set them all to zero. You could draw random values from a uniform distribution, which means random numbers between zero and one. Or you could do more complex things, like drawing random weights from a Gaussian distribution, or from a Gaussian distribution rescaled with a standard deviation that depends on the number of nodes you have, or the number of input units in your tensor. And the same goes for the uniform: you could rescale it in smart ways instead of keeping it between zero and one.
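To make those options concrete, here is a minimal NumPy sketch of the schemes just listed. The function names are made up for illustration, and the two rescaled variants assume one common choice of scaling (a Gaussian with standard deviation sqrt(2 / inputs) and a uniform with limit sqrt(3 / inputs)); the video does not say which exact formulas it uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 1  # four inputs, one output, like the model in this video


def zeros(shape):
    # Every weight starts at exactly zero.
    return np.zeros(shape)


def uniform01(shape):
    # Random numbers between zero and one.
    return rng.uniform(0.0, 1.0, size=shape)


def gaussian(shape):
    # Standard Gaussian: mean 0, standard deviation 1.
    return rng.normal(0.0, 1.0, size=shape)


def scaled_gaussian(shape):
    # Assumed scaling: standard deviation shrinks with the number of input units.
    fan_in = shape[0]
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)


def scaled_uniform(shape):
    # Assumed scaling: symmetric range whose width depends on the input units.
    fan_in = shape[0]
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=shape)


schemes = (zeros, uniform01, gaussian, scaled_gaussian, scaled_uniform)
weights = {f.__name__: f((n_in, n_out)) for f in schemes}

for name, w in weights.items():
    print(f"{name:16s}", np.round(w.ravel(), 3))
```

Printing the weights side by side makes it easy to see how differently each scheme places the model at the start of training.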

So, we're going to experiment with all of these. The way you do it is that in your dense layer there's a parameter called kernel_initializer, and we're going to set it to init, where init is one of the initializers we have here. All the rest is the same: four inputs, one output, sigmoid activation. We fixed the optimizer, the batch size, and the number of epochs. Everything else stays the same. We do the multi-index and then check what the results are. In terms of loss, the curves all seem to be the same, but some of them have randomly gotten to a better accuracy. The zeros initialization is performing better this time, and the ones you would have expected to perform better are actually performing the worst.
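The setup described here might look roughly like the following Keras sketch. The notebook itself is not shown in the transcript, so the list of initializer names, the "sgd" optimizer, the loss, and the default epochs and batch size are all assumptions; only the four inputs, single sigmoid output, and the kernel_initializer parameter come from the video. The helper names build_model and compare_initializers are hypothetical.

```python
# Assumed set of initializers to compare; each is a built-in Keras string name.
INITIALIZERS = ["zeros", "random_uniform", "random_normal", "he_normal", "lecun_uniform"]


def build_model(init):
    """Four inputs, one sigmoid output; only the initializer changes."""
    # Imported here so the list above can be read without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(1, activation="sigmoid", kernel_initializer=init),
    ])
    # Optimizer and loss are assumptions; the video only says they are held fixed.
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def compare_initializers(X, y, epochs=5, batch_size=16):
    """Train one model per initializer with everything else held constant."""
    histories = {}
    for init in INITIALIZERS:
        model = build_model(init)
        result = model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=0)
        histories[init] = result.history  # per-epoch "loss" and "accuracy" lists
    return histories
```

Calling compare_initializers(X, y) on a feature matrix with four columns and a binary target returns one training history per initializer, which can then be plotted as loss and accuracy curves.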

So, let's see what happens if I just rerun the same cells. You'll see that, very likely, the results are going to change. Yes. You see, now everybody's performing better except the yellow line. And this shows how important the initialization of the weights actually is. If I run it again, I'm going to obtain a different result again. See, this time the yellow line is the one performing the best. This means we are very sensitive to the way we initialize our weights, especially if we have a small number of features, and especially if our dataset is not that big compared to the complexity of our network. So if your model seems stuck and does not improve, it may be a good idea to restart the training with a different initialization and see if, by chance, you had just fallen into a local minimum trap. Just by reinitializing the model, you may end up doing much better.

So yeah, initialization is one of those things we never think about, because we tend to think of these models as deterministic. But it turns out it actually plays a big role in whether or not the model is going to converge and how fast it's going to converge. So just try it out. Try different initializations, and repeat with the same initialization a couple of times to see how consistent your results are. Hope you had fun in this video. Thank you for watching, and see you in the next video.
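One way to check consistency is to repeat the same short training run with several random seeds and look at the spread. The sketch below does this in plain NumPy with a single sigmoid unit on synthetic data; the data, learning rate, epochs, and helper names are all invented for illustration and are not the dataset or settings from the video.

```python
import numpy as np


def make_data(n=200, seed=0):
    # Synthetic stand-in data: four features and a noisy linear decision rule.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    true_w = np.array([1.5, -2.0, 0.5, 1.0])
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y


def train_once(X, y, seed, epochs=5, batch_size=16, lr=0.1):
    # The seed controls both the Gaussian initial weights and the batch order.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))
            grad = p - y[idx]  # gradient of the log loss w.r.t. the pre-activation
            w -= lr * X[idx].T @ grad / len(idx)
            b -= lr * grad.mean()
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Accuracy on the training data itself (no held-out split in this sketch).
    return float(((p > 0.5) == (y == 1)).mean())


X, y = make_data()
accuracies = [train_once(X, y, seed) for seed in range(5)]
print("accuracy per seed:", np.round(accuracies, 3))
print("mean:", round(float(np.mean(accuracies)), 3),
      "std:", round(float(np.std(accuracies)), 3))
```

A large standard deviation across seeds suggests the result depends heavily on the starting weights, so a single run should not be trusted on its own.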

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.