Gradient Descent Continued
Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.
From the internals of a neural network to using neural networks to solve real problems, this course covers the essentials needed to succeed in machine learning.
- Understand the importance of gradient descent and backpropagation
- Be able to build your own neural network by the end of the course
It is recommended to complete the Introduction to Data and Machine Learning course before starting.
Hey guys, welcome back. In this video, we're going to explore what happens when you change the batch size of your training data. I've pretty much copied the same code I had above for the learning rate; the only thing I've changed is that the loop is no longer over learning rates. The learning rate is kept fixed, and instead I'm looping over different batch sizes. So I have four different batch sizes, and when I call model fit here, I set batch_size equal to batch_size. Everything else is kept fixed, including the SGD optimizer.
The learning rate is the default, and the model has not changed. Notice that I'm clearing the session at each iteration, so a new model is created each time. I initialize the model, run it, and then it's the same drill as before: create a history data frame and plot what happens for different batch sizes. And notice what's going on: with a huge batch size, convergence is slow; with a small batch size, convergence is fast; an intermediate batch size kind of started well but then didn't move that much.
Okay, since this may have been a random effect of how we started, I'm going to re-run it a second time and see what we get this time. We see some consistency: a big batch size converges slowly, a small batch size converges faster, and intermediate batch sizes start well and then stay there. So, long story short, a small batch size is probably going to help you make faster moves and converge faster.
Too small a batch size, though, is probably going to be too noisy. So my advice would be to keep the batch size around 16 or 32; those are good numbers, and in fact the default batch size in the fit function is 32. So don't use huge batch sizes; they're probably not the best way to converge fast. This will depend on many things, including how big your data set is, so you will have to try different batch sizes and see which one works better for your data. The good news is that you should see this effect in the first few epochs, so try different values, and when you find one that seems to be moving fast, stick with it and let it run for longer. Thank you for watching, and see you in the next video.
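The experiment described above can be sketched outside Keras as well. The snippet below is a minimal illustration, not the instructor's actual notebook: it uses plain NumPy to fit a one-variable linear model with mini-batch SGD, keeping the learning rate and epoch count fixed while varying only the batch size, just as the transcript's loop does. The function name `train` and the synthetic data are made up for this example.

```python
import numpy as np

def train(batch_size, lr=0.05, epochs=30, seed=0):
    """Fit y = 3x + 1 with mini-batch SGD and return the final MSE."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=512)
    y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=512)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))          # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            err = w * X[batch] + b - y[batch]      # prediction error on the mini-batch
            w -= lr * 2 * np.mean(err * X[batch])  # gradient of MSE w.r.t. w
            b -= lr * 2 * np.mean(err)             # gradient of MSE w.r.t. b
    return np.mean((w * X + b - y) ** 2)

# Same learning rate and number of epochs, four different batch sizes.
# Smaller batches mean more parameter updates per epoch, so with the
# epoch budget fixed, the full-batch run lags behind.
for bs in [512, 128, 32, 8]:
    print(f"batch_size={bs:>3}  final MSE={train(bs):.4f}")
```

This mirrors the effect in the video: after the same number of epochs, the full-batch run (512) has taken only 30 gradient steps and is still far from the optimum, while the smaller batch sizes have taken many more steps and sit near the noise floor.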
About the Author
I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.