Optimizers Continued
1h 45m

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hey guys, welcome back. In this video we're going to explore how to implement the different optimizers in Keras. Keras offers objects for pretty much all the standard optimizers; we just need to load them and instantiate them. So we're going to load the optimizers we've seen in class and loop through them. These are the six things we're going to try. Plain stochastic gradient descent.

Stochastic gradient descent with a momentum of 0.3, and the same thing but with Nesterov momentum, so with the Nesterov correction. And then we're going to try Adam, Adagrad and RMSprop. Notice that we've fixed the learning rate for all of them to be exactly the same. So we have these six optimizers and we're going to loop over them, and each time we instantiate the model and evaluate the string here to actually create the optimizer. The rest is the same: it's the same logistic regression model with four inputs and one output. We've set a batch size of 16 and we only run it for five epochs.
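A minimal sketch of the loop described above, under several assumptions: it uses the Keras bundled with TensorFlow 2 (where the argument is called `learning_rate`), the value 0.01 is a placeholder because the video only says the learning rate is the same for all six, and `X` and `y` stand for whatever four-feature dataset is being used. Instead of evaluating strings as in the video, this version keeps a table of class names and keyword arguments, which gives the same result without `eval`. The names `OPTIMIZER_SPECS` and `compare_optimizers` are made up for this illustration.

```python
# Assumed value: the video only states that all six optimizers share one rate.
LEARNING_RATE = 0.01

# label -> (class name in keras.optimizers, keyword arguments)
OPTIMIZER_SPECS = {
    "SGD": ("SGD", {"learning_rate": LEARNING_RATE}),
    "SGD + momentum 0.3": ("SGD", {"learning_rate": LEARNING_RATE,
                                   "momentum": 0.3}),
    "SGD + Nesterov 0.3": ("SGD", {"learning_rate": LEARNING_RATE,
                                   "momentum": 0.3, "nesterov": True}),
    "Adam": ("Adam", {"learning_rate": LEARNING_RATE}),
    "Adagrad": ("Adagrad", {"learning_rate": LEARNING_RATE}),
    "RMSprop": ("RMSprop", {"learning_rate": LEARNING_RATE}),
}


def compare_optimizers(X, y, specs=OPTIMIZER_SPECS, batch_size=16, epochs=5):
    """Train one fresh logistic regression per optimizer; return the histories."""
    # Imported here so the spec table can be inspected without TensorFlow.
    from tensorflow import keras

    histories = {}
    for label, (class_name, kwargs) in specs.items():
        # Build the optimizer object from its class name and arguments.
        optimizer = getattr(keras.optimizers, class_name)(**kwargs)

        # A new model every time, so each optimizer starts from scratch:
        # four inputs, one sigmoid output (logistic regression).
        model = keras.Sequential([
            keras.Input(shape=(4,)),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(loss="binary_crossentropy",
                      optimizer=optimizer,
                      metrics=["accuracy"])

        history = model.fit(X, y, batch_size=batch_size,
                            epochs=epochs, verbose=0)
        histories[label] = history.history  # dict of per-epoch metrics
    return histories
```

Calling `compare_optimizers(X, y)` with a feature array of shape `(n_samples, 4)` and binary labels would return one dictionary of per-epoch loss and accuracy values for each optimizer label.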

We append all the results to the list, then we concatenate the list, change the index like before (this is nothing new), and plot the results. So what do we get? OK, so Adam seems to be doing better than everybody, and RMSprop, which is also an adaptive algorithm, converges much faster. SGD, plain SGD, is the worst one; you see it's kind of converging, but very slowly. And then, you know, adding momentum and Nesterov momentum seems to be helping convergence a little bit, and Adagrad seems to be at the same level of loss. OK, it's performing a bit better, but in general what we see is that the three adaptive algorithms get to a much better accuracy in the end. So, just to make sure, I'm going to rerun the cells and see if it's affected by initialization. It shouldn't be. Let's see what we get. Yeah, SGD remains the worst, and RMSprop and Adam keep being the best.
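To get a feel for why the adaptive methods can move faster at the same learning rate, here is a rough, standard-library-only sketch of the six update rules on a toy one-parameter problem. Everything in it is an assumption made for illustration: the loss is a deliberately flat quadratic rather than the model from the video, the learning rate of 0.01 is a placeholder, the hyperparameter defaults (`rho`, `beta1`, `beta2`, `eps`) are common textbook choices, and the rules are written in their textbook form, which may differ in small details from what Keras does internally.

```python
import math


def sgd(lr, momentum=0.0, nesterov=False):
    velocity = 0.0
    def step(w, g):
        nonlocal velocity
        velocity = momentum * velocity - lr * g
        if nesterov:
            # Nesterov correction: apply the momentum once more, looking ahead.
            return w + momentum * velocity - lr * g
        return w + velocity
    return step


def adagrad(lr, eps=1e-7):
    accum = 0.0
    def step(w, g):
        nonlocal accum
        accum += g * g  # running sum of squared gradients
        return w - lr * g / (math.sqrt(accum) + eps)
    return step


def rmsprop(lr, rho=0.9, eps=1e-7):
    avg = 0.0
    def step(w, g):
        nonlocal avg
        avg = rho * avg + (1 - rho) * g * g  # moving average of squared gradients
        return w - lr * g / (math.sqrt(avg) + eps)
    return step


def adam(lr, beta1=0.9, beta2=0.999, eps=1e-7):
    m = v = 0.0
    t = 0
    def step(w, g):
        nonlocal m, v, t
        t += 1
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        return w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return step


def loss(w):
    return 0.01 * (w - 3.0) ** 2   # flat toy loss with its minimum at w = 3


def grad(w):
    return 0.02 * (w - 3.0)


def run(step, w=0.0, n_steps=100):
    for _ in range(n_steps):
        w = step(w, grad(w))
    return loss(w)


LR = 0.01  # assumed; the same rate for every optimizer, as in the video
final_losses = {
    "SGD": run(sgd(LR)),
    "SGD + momentum 0.3": run(sgd(LR, momentum=0.3)),
    "SGD + Nesterov 0.3": run(sgd(LR, momentum=0.3, nesterov=True)),
    "Adam": run(adam(LR)),
    "Adagrad": run(adagrad(LR)),
    "RMSprop": run(rmsprop(LR)),
}
for name, value in final_losses.items():
    print(f"{name:>20}: {value:.5f}")
```

Because RMSprop and Adam divide the gradient by a running estimate of its size, their steps stay close to the learning rate even when the gradient is tiny, while plain SGD's steps shrink with the gradient. On a steeper loss the ranking can look quite different, so this is only an intuition aid, not a reproduction of the plot in the video.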

This time SGD with momentum got a bit better, and Adagrad and momentum stayed a bit lower, but what's consistent is that RMSprop and Adam are performing really well and SGD, plain SGD, is the slowest to improve. So keep that in mind when you choose an optimization algorithm: choosing Adam or RMSprop could greatly improve your chances of quickly converging to a solution, especially if your network is complex, like a convolutional neural network or a recurrent neural network. You'll still have to choose the learning rate wisely, but yeah, with the same learning rate, these two algorithms seem to perform much better than all the others. So keep that in mind, do your experiments, tell us what you've found and, yeah, keep improving with this learning. Thank you for watching and see you in the next video.

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.