Learning Rate Continued
1h 45m

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks to understanding how they work internally, this course expertly covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hey guys, welcome back. In this video, we're going to build a model to distinguish between fake and valid bank notes. It's an amazing data set, really interesting, so I'm really excited to be doing this. See, this is a true bank note and this is a fake bank note. We have a data set of bank notes, and we're going to build a classifier. So we load the data set and check it, and we have four numerical variables that are essentially statistics of some higher-level features extracted from transforms of the bank note scans. Don't worry too much about what these numbers mean. All we know is that we have four numerical features and a class.

And we have 762 valid bank notes and 610 fake bank notes. Let's display the data set using a pairplot in Seaborn. We are going to key the color to the class variable and look at these four variables. We can see that the two populations of bank notes are actually pretty different in most of the features, so we should be able to separate them to a pretty good extent. So let's build a baseline model. We are going to build our baseline model with a RandomForestClassifier.

We are going to scale all the features to a mean of zero and a standard deviation of one. We can do that with the preprocessing scale function. This is like the StandardScaler, and it's in the same package, but it's a plain function rather than a fitted scaler object; it just scales the data. So we build a RandomForestClassifier, and we see that it's 99.3% accurate on a three-fold cross-validation. Cool, so the data is separable. Let's see what logistic regression does on it. First we do a train/test split. By now we should be very familiar with this. And we import Keras. Notice that this time we are importing the backend too, and we do that because we are going to be building several models, and I prefer to clear the models from memory between them. This first clear is not really necessary, so we could skip it. Basically, we build a model.
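To make the scaling step concrete, here is a minimal NumPy sketch of what scaling to zero mean and unit standard deviation does to each column. The `standardize` helper and the random four-column array are assumptions for illustration only; they are not the function or the bank note data used in the video.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit standard deviation.

    Hypothetical helper for illustration; it uses the population
    standard deviation (NumPy's default, ddof=0).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Made-up data: 200 rows, 4 features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, -2.0, 0.5, 10.0],
               scale=[3.0, 0.1, 7.0, 2.0],
               size=(200, 4))

X_scaled = standardize(X)

# After scaling, every column has mean ~0 and standard deviation ~1.
print(np.allclose(X_scaled.mean(axis=0), 0.0))
print(np.allclose(X_scaled.std(axis=0), 1.0))
```

Putting all four features on the same scale matters for gradient-based models such as the logistic regression trained next, because a single learning rate is applied to every weight.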

This is a logistic regression, so it's not a deep model: sigmoid activation, one output node, four inputs, binary crossentropy loss. We train the model for 10 epochs; loss goes down, accuracy goes up. One thing that I've done differently from before is that I've saved the result of the fit in a variable called history. The reason I did that is that I want to display the history and see how my model is doing, that is, the accuracy and the loss over the epochs. Basically, I put the history data in a data frame with the epochs as the index, and then plot that data frame, setting ylim.
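For reference, the model described above (a single sigmoid unit trained with binary crossentropy) can be written out as follows. This is a sketch in standard notation; the symbols $w$, $b$, $x_i$, $y_i$ and $N$ are assumptions introduced here and do not appear in the video.

```latex
\hat{y}_i = \sigma\left(w^\top x_i + b\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

L(w, b) = -\frac{1}{N} \sum_{i=1}^{N}
  \left[\, y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
```

Here $x_i$ is the vector of four features for bank note $i$, $y_i \in \{0, 1\}$ is its class, and $\hat{y}_i$ is the predicted probability of class 1.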

So the title shows the test accuracy: 57.9%. Loss goes down to about 0.5 and accuracy goes up to 0.60-something. So, results are not great in 10 epochs. Probably if I train it more, it's going to get better, but in this video we are going to explore the effect of things like changing the learning rate and changing the batch size. So let's go with the learning rate first. While I talk and explain, I'll let it run. I'm going to create an empty list and test a few different values for the learning rate: 0.01, 0.05, 0.1, and 0.5. Then, for each learning rate, I clear the backend session and build a new model.

So, the model is exactly the same: it's logistic regression. Then I initialize the optimizer, the Stochastic Gradient Descent optimizer, with a learning rate equal to one of the values in this list. Alright, so I train the model with a fixed batch size of 16, verbose equal to zero, and epochs at the default value of 10. Then I append the data frame of the history to the empty dflist that I've created. So, once I've gone through my list, dflist should contain four data frames with the histories of my training runs. Then I concatenate these four data frames into one called historydf, which looks like this. So these are the four data frames. Each has two columns, accuracy and loss, and you can already see that, depending on the learning rate, in some cases the accuracy is converging way faster.
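The loop described above can be sketched without Keras. The following is a NumPy stand-in, not the code from the video: `make_toy_data` and `train_logistic_sgd` are hypothetical helpers, the data is synthetic rather than the bank note set, and the hand-written mini-batch SGD only approximates what the Keras optimizer does. The list of learning rates, the batch size of 16, the 10 epochs, and the `dflist` and `historydf` names follow the transcript.

```python
import numpy as np
import pandas as pd


def make_toy_data(n=400, seed=0):
    """Synthetic two-class data with 4 features (assumption, not the real set)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(-1.0, 1.0, size=(n // 2, 4))
    X1 = rng.normal(+1.0, 1.0, size=(n // 2, 4))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    return X, y


def train_logistic_sgd(X, y, lr, epochs=10, batch_size=16, seed=0):
    """Mini-batch SGD on a single sigmoid unit; returns per-epoch acc and loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    history = {"acc": [], "loss": []}
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))
            err = p - y[idx]                      # gradient of the loss w.r.t. the logit
            w -= lr * X[idx].T @ err / len(idx)   # the learning rate scales every step
            b -= lr * err.mean()
        # Evaluate on the full training set at the end of each epoch.
        p = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), 1e-12, 1 - 1e-12)
        history["loss"].append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
        history["acc"].append(float(np.mean((p > 0.5) == y)))
    return history


X, y = make_toy_data()
learning_rates = [0.01, 0.05, 0.1, 0.5]

dflist = []
for lr in learning_rates:
    h = train_logistic_sgd(X, y, lr=lr, epochs=10, batch_size=16)
    dflist.append(pd.DataFrame(h, index=range(10)))

# Side-by-side histories: two columns (acc, loss) per learning rate.
historydf = pd.concat(dflist, axis=1)
print(historydf.round(3))
```

Each training run starts from fresh weights, which plays the same role as clearing the backend session and rebuilding the model in the Keras version.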

So, we are going to display this data, and to do that, we are just going to change the column names of our history data frame to be a multi-index. I'll show you in a second what this means. All I've done here is rename the columns so that they also carry the learning rate. This is called a multi-index because I have a major and a minor index in the columns, and it's built with Pandas MultiIndex.from_product. Now I can generate these plots. Subplot number one is going to be the loss for all the different learning rates, and subplot number two is going to be the accuracy for the different learning rates. So we generate this, and we have these two very nice plots, where we see that, essentially, increasing the learning rate makes us converge faster. Notice that the loss for the lower learning rates is decreasing, but slowly, and the more we increase the learning rate, the faster we converge to small loss and high accuracy. We had seen this in the TensorFlow playground.
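The column renaming described above can be sketched with pandas alone. The values in `historydf` below are placeholders (simple counters), not training results, and the level names `learning_rate` and `metric` are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

learning_rates = [0.01, 0.05, 0.1, 0.5]
metrics_reported = ["acc", "loss"]
epochs = 10

# Placeholder frame shaped like four concatenated histories:
# columns are acc, loss, acc, loss, ... (one pair per learning rate).
values = np.arange(epochs * 8, dtype=float).reshape(epochs, 8)
historydf = pd.DataFrame(values, columns=metrics_reported * len(learning_rates))

# Major level: learning rate. Minor level: metric.
historydf.columns = pd.MultiIndex.from_product(
    [learning_rates, metrics_reported],
    names=["learning_rate", "metric"],
)

# One frame per metric, with one column per learning rate.
loss_by_lr = historydf.xs("loss", axis=1, level="metric")
acc_by_lr = historydf.xs("acc", axis=1, level="metric")

print(historydf.columns.tolist())
print(loss_by_lr.head())
```

With the columns organized this way, each of the two frames can be drawn on its own subplot, for example by calling `.plot()` on `loss_by_lr` and on `acc_by_lr`.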

I just wanted to show you that you can totally do that in Keras: play around with the learning rate of your optimization algorithm. The learning rate is an interesting parameter, because tuning it is pretty much trial and error. You don't want it too high; that would make your model unstable. You don't want it too low, because that would make your convergence too slow. That's why, as we will see in the next lectures, there are some algorithms that are adaptive and can adapt their learning rate. So we'll learn those. Hope you had fun in this exercise, and I'll see you in the next video.
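The trade-off between a learning rate that is too low and one that is too high can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose minimum is at w = 0. The function, the starting point, and the three learning rates are assumptions picked for illustration; they are not taken from the video.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # each step multiplies w by (1 - 2 * lr)
    return w


final = {lr: gradient_descent(lr) for lr in (0.01, 0.4, 1.1)}

for lr, w in final.items():
    print(f"lr={lr}: |w| after 20 steps = {abs(w):.3g}")
```

With the smallest rate, w shrinks by only 2% per step and is still far from zero after 20 steps. With the middle rate it is essentially at the minimum. With the largest rate each step overshoots and the magnitude of w grows, which is the instability mentioned above.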

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.