Batch Normalization Continued
Move on from what you learned from studying the principles of recurrent neural networks, and how they can solve problems involving sequencing, with this cohesive course on Improving Performance. Learn to improve the performance of your neural networks by starting with learning curves that allow you to answer the right questions. The answer could be that you need more data, or even a better model, to improve your performance.
Further into the course, you will explore the fundamentals of batch normalization, dropout, and regularization.
This course also touches on data augmentation and its ability to let you build new data from your starting training data, culminating in hyperparameter optimization, a tool that helps you decide how to tune the external parameters of your network.
This course is made up of 13 lectures and three accompanying exercises. This Cloud Academy course is in collaboration with Catalit.
- Learn how to improve the performance of your neural networks.
- Learn the skills necessary to make executive decisions when working with neural networks.
- It is recommended to complete the Introduction to Data and Machine Learning course before starting.
Hello and welcome back. In this video we're going to perform batch normalization on our model. So the first thing we're going to do is import the BatchNormalization class from keras.layers. Keras implements batch normalization, so all we need to do is insert it in the appropriate places. Then what we're going to do is define a function, and this function repeats the training. So let's see. It takes as arguments the x_train, y_train, x_test, y_test, the number of units we want to have in our layers, the type of activation, the optimizer we want, and a flag for whether to do batch normalization, which is a boolean, yes or no, along with the number of epochs and the number of repeats. And then what it does is it repeats the creation and training of a model for the number of repeats. We're gonna do this because this is kind of an in-house cross validation.
So for the default three repeats, what will happen? We will clear the session, which deletes the current TensorFlow graph, create a new Sequential model, and add a first fully connected layer with the input shape of the training data. Then, if do batch normalization is true, we also add a batch normalization layer. Then we have a second layer, and if do batch normalization is true, we add batch normalization after this layer, and then a third layer.
All these three layers have 512 units each. They are initialized with random weights drawn from a normal distribution and use the activation passed in as input. So after the third layer we also add a batch normalization if the condition is true, and finally we have our output layer with 10 nodes. We compile the model and we fit it on x_train, y_train, with validation data x_test, y_test, for the number of epochs given as input and with verbose equal to zero. Then we take the history of training and we append the accuracy and the validation accuracy to our list of histories, one entry per repeat. Finally, after we've finished the loop, we convert the histories to an np.array and we calculate the mean and the standard deviation of the history across repeats. We return the mean accuracy, the standard deviation of the accuracy, the mean accuracy on the validation set, and the standard deviation of the accuracy on the validation set. So, why did we define this long function? The reason we did it is that now we can compare the training with and without batch normalization in a very convenient way.
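The function described above might look roughly like the following. This is a minimal sketch rather than the code shown in the video: the function name `repeated_training`, the argument names, the default activation and optimizer, the softmax output, and the categorical cross-entropy loss are all assumptions made for illustration. It also assumes the labels are one-hot encoded over 10 classes and that TensorFlow's bundled Keras is installed.

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import BatchNormalization, Dense, Input
from tensorflow.keras.models import Sequential


def repeated_training(X_train, y_train, X_test, y_test,
                      units=512, activation="sigmoid", optimizer="sgd",
                      do_bn=False, epochs=10, repeats=3):
    """Train `repeats` fresh models; return per-epoch mean/std of accuracy.

    Name and defaults (activation, optimizer) are hypothetical.
    """
    histories = []
    for _ in range(repeats):
        # Start every repeat from a clean slate.
        K.clear_session()

        model = Sequential()
        model.add(Input(shape=X_train.shape[1:]))
        # Three hidden layers, each optionally followed by batch normalization.
        for _ in range(3):
            model.add(Dense(units,
                            kernel_initializer="random_normal",
                            activation=activation))
            if do_bn:
                model.add(BatchNormalization())
        # Output layer with 10 nodes; softmax is an assumption.
        model.add(Dense(10, activation="softmax"))

        # The loss is an assumption; it expects one-hot encoded labels.
        model.compile(optimizer=optimizer,
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        h = model.fit(X_train, y_train,
                      validation_data=(X_test, y_test),
                      epochs=epochs, verbose=0)
        histories.append([h.history["accuracy"],
                          h.history["val_accuracy"]])

    histories = np.array(histories)  # shape: (repeats, 2, epochs)
    mean = histories.mean(axis=0)    # average over the repeats
    std = histories.std(axis=0)
    # mean acc, std acc, mean val acc, std val acc (one value per epoch)
    return mean[0], std[0], mean[1], std[1]
```

Calling it once with `do_bn=False` and once with `do_bn=True` on the same data gives two sets of four curves that can be compared directly. Older Keras versions report the metrics under the keys `acc` and `val_acc` instead of `accuracy` and `val_accuracy`.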
We can use this function on x_train, y_train, x_test, y_test with do batch normalization equal to false, and we can use the same function with do batch normalization equal to true. These calls will take a little time to execute because they have to repeat the training three times, so let's see what happens. Okay, training is finished. We've run ten epochs three times on the x_train, y_train, x_test, y_test, and now we can define a little helper function that plots the mean and the standard deviation. It will plot the mean and then draw a little shaded area around it, between the mean plus one standard deviation and the mean minus one standard deviation. So, let's see what these curves look like. We have here the comparison of training and test score with and without batch normalization.
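A plotting helper of the kind just described could be sketched as follows. The name `plot_mean_std`, its arguments, and the shading transparency are assumptions for illustration, not the code from the video. It takes a per-epoch mean and standard deviation and uses Matplotlib's `fill_between` for the shaded band.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_std(m, s, label=None, ax=None):
    """Plot a mean curve with a band of one standard deviation on each side.

    Hypothetical helper: `m` and `s` hold one value per epoch.
    """
    m, s = np.asarray(m), np.asarray(s)
    ax = ax if ax is not None else plt.gca()
    epochs = np.arange(len(m))
    ax.plot(epochs, m, label=label)
    # Shade the region between mean - std and mean + std.
    ax.fill_between(epochs, m - s, m + s, alpha=0.1)
    return ax
```

Calling the helper once per curve (train and test, with and without batch normalization) and then `plt.legend()` puts all four curves on the same axes for comparison.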
And I think the comparison is pretty striking. When we train with batch normalization, both our training score and our test score are up to almost 100 percent within five epochs, whereas when we don't do batch normalization, you see both our training and test scores are still very low. So this is how powerful batch normalization is. You can see it speeds up the training and it also makes it more robust. So, I strongly encourage you to include batch normalization. It's a recent trick, published only in the last few years, so let's take advantage of it in our models. Thank you for watching and see you in the next video.
About the Author
I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.