Convolutional Neural Networks
Improving a Model
Once you know how to build and train neural networks using TensorFlow and Google Cloud Machine Learning Engine, what’s next? Before long, you’ll discover that prebuilt estimators and default configurations will only get you so far. To optimize your models, you may need to create your own estimators, try different techniques to reduce overfitting, and use custom clusters to train your models.
Convolutional Neural Networks (CNNs) are very good at certain tasks, especially recognizing objects in pictures and videos. In fact, they’re one of the technologies powering self-driving cars. In this course, you’ll follow hands-on examples to build a CNN, train it using a custom scale tier on Machine Learning Engine, and visualize its performance. You’ll also learn how to recognize overfitting and apply different methods to avoid it.
Learning Objectives
- Build a Convolutional Neural Network in TensorFlow
- Analyze a model’s training performance using TensorBoard
- Identify cases of overfitting and apply techniques to prevent it
- Scale a Cloud ML Engine job using a custom configuration
Intended Audience
- Data professionals
- People studying for the Google Certified Professional Data Engineer exam
Prerequisites
- Introduction to Google Cloud Machine Learning Engine course
- Google Cloud Platform account recommended (sign up for free trial at https://cloud.google.com/free if you don’t have an account)
This Course Includes
- Many hands-on demos
The GitHub repository for this course is at https://github.com/cloudacademy/ml-engine-doing-more.
As you’ve seen, there’s quite a bit of information in the logs (and you can even get your jobs to print more), but it’s not easy to visualize what’s happening with your jobs. If you want to tweak your model to improve its performance, you need to have a better way of seeing how it’s doing. That’s where TensorBoard comes in.
TensorBoard graphically displays various aspects of your training runs. You can use it both when a job is running and after it’s done. Let’s have a look at the training run we did locally.
Starting it is very simple. Type “tensorboard --logdir=” and the path to the model directory. For the local script we ran, the model directory was /tmp/mnist_convnet_model. Also add an ampersand to put this process in the background. You don’t have to do that, but we’re going to run another command from this shell in a few minutes, so that’ll make it easier.
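If you’d rather start TensorBoard from a Python script than type the command by hand, here’s a minimal sketch of the same idea. The helper names build_tensorboard_command and launch_in_background are made up for this example, and the optional port argument (TensorBoard’s --port flag) is only needed if more than one instance runs on the same machine. Starting the process without waiting for it plays the role of the ampersand in the shell.

```python
import subprocess


def build_tensorboard_command(logdir, port=None):
    """Return the argument list for starting TensorBoard on a model directory."""
    command = ["tensorboard", "--logdir=" + logdir]
    if port is not None:
        # Only needed when another TensorBoard instance already uses the default port
        command.append("--port=" + str(port))
    return command


def launch_in_background(command):
    """Start the command without waiting for it to finish (like a trailing '&')."""
    return subprocess.Popen(command)


# Show the command for the local run; nothing is launched here
print(" ".join(build_tensorboard_command("/tmp/mnist_convnet_model")))
```

To actually start it, you’d pass the list to launch_in_background, which assumes the tensorboard executable is on your PATH.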
It takes a few seconds to start and then it gives you a URL to use. Paste that into your browser. If you click on accuracy, it doesn’t show much. There’s just one data point showing the accuracy rate at the end of the training run, which we already knew. It only shows that one data point because that’s the only time we measured the accuracy in the script. It’s possible to modify the code so it’ll record the accuracy along the way as well, but instead, we’ll look at the loss.
You can make it bigger by clicking here. Notice that there are two curves. By default, TensorBoard applies a smoothing function to the curves in its graphs. If a curve has lots of data points, then the smoothing works fine, but if you only have a few data points, then it can produce misleading curves. If you turn off smoothing, then you’ll only see the real curve.
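To see why smoothing can mislead when there are only a few data points, here’s a rough sketch using a plain exponential moving average. This is an assumption for illustration, similar in spirit to TensorBoard’s smoothing but not necessarily its exact formula, and the weight of 0.6 and the two loss values are made-up numbers.

```python
def smooth(values, weight=0.6):
    """Exponential moving average, seeded with the first value in the series."""
    smoothed = []
    last = values[0]
    for value in values:
        # Each smoothed point leans heavily on the previous smoothed point
        last = last * weight + (1 - weight) * value
        smoothed.append(last)
    return smoothed


# Hypothetical series with only two points: a high first loss and a low final loss
few_points = [2.3, 0.1]
smoothed_points = smooth(few_points)
print(smoothed_points)
```

With only two points, the smoothed final value lands at about 1.42 even though the real value is 0.1, because the average hasn’t had enough points to catch up.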
For this run, the loss started off at about 2.3 and then gradually dropped to 2.051 at step 900. The blue dot shows the result from the evaluation at the end of the run, that is, at step 1000. It’s often either above or below the curve because its loss value is calculated against the evaluation data rather than the training data. If it’s significantly above the curve, then you could have an overfitting problem, which I’ll go over later. In this run, it’s below the curve, so the model’s probably okay.
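The starting value of about 2.3 isn’t arbitrary. Assuming the script uses a softmax cross-entropy loss (the lesson doesn’t name the loss function, so treat that as an assumption), an untrained model that spreads its probability evenly across the ten digit classes would score the negative log of one tenth. Here’s the arithmetic as a quick sketch.

```python
import math

# Ten digit classes (0 through 9) in MNIST
num_classes = 10

# An untrained model that guesses evenly gives each class a probability of 0.1
uniform_probability = 1.0 / num_classes

# Cross-entropy loss for the correct class under that even guess (assumed loss function)
initial_loss = -math.log(uniform_probability)
print(round(initial_loss, 3))  # 2.303
```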
This run only had 1,000 steps, so let’s see what happened when we ran it with 20,000 steps on ML Engine. We can leave this instance of TensorBoard running and start another one, but we’ll have to tell it to use a different port or it’ll get an error. Then we need to give it the model directory for the ML Engine job. I’m assuming that you still have these environment variables set. If not, then you’ll have to type the Cloud Storage path manually.
The loss dropped for the first 2,000 steps, but then the curve got pretty bumpy. The model was struggling to reduce the loss further. After about step 7,000, it didn’t really get much better, if at all.
You might have noticed that the blue dot, which represents the loss for the evaluation step, is much higher than the orange curve. That suggests a potential problem. However, the light blue line is in a much different spot. The smoothing function strikes again. It works pretty well on the orange curve this time because it has lots of data points, but since the blue curve only has two data points, the smoothing doesn’t work well at all. If we turn it off, the blue dot is actually a bit below the orange curve, so it’s not a concern.
To improve the model, you could take many different approaches. One approach would be to change the number of convolutional and pooling layers, the number of filters, etc. However, there are some simpler changes that can make a big difference.
The learning rate is a good place to start. Here’s where it’s set. So what is the learning rate?
First, you need to understand what an optimizer does. As you know, after a batch of training data goes through the model, the average difference between the predictions and the correct answers is calculated. That’s the loss. Then the program adjusts the weights in an attempt to reduce the loss on the next batch of data. So how does it know which way to adjust the weights?
Here’s a visualization of a loss function with respect to just two weights. In our MNIST model, we have thousands of weights, but that’s pretty hard to visualize, so we’ll stick with this. At the beginning of a training run, the program sets the weights randomly. Then after running a batch of data through the model, the loss function calculates the average loss. That’s represented by a point on this graph. Using a bit of calculus, the program then calculates the gradient. This tells you which direction from that point has the steepest slope. Since we want to reduce the loss, it then adjusts the weights to move down that slope. This method of optimization is called gradient descent.
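Here’s a minimal sketch of gradient descent on a loss function with just two weights. The bowl-shaped loss function, the starting point of (0, 0), and the learning rate of 0.1 are all made up for illustration. A real training run would start from random weights and compute the gradient from batches of data.

```python
def loss(w1, w2):
    """A made-up bowl-shaped loss whose lowest point is at w1 = 3, w2 = -1."""
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2


def gradient(w1, w2):
    """Partial derivatives of the loss with respect to each weight."""
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)


def gradient_descent(w1, w2, learning_rate, steps):
    """Repeatedly move both weights a small way down the slope."""
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        # The gradient points uphill, so subtract it to reduce the loss
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


# Fixed starting point so the result is reproducible
final_w1, final_w2 = gradient_descent(0.0, 0.0, learning_rate=0.1, steps=100)
print(final_w1, final_w2, loss(final_w1, final_w2))
```

After 100 steps, the weights end up very close to the bottom of the bowl at (3, -1), where the loss is nearly zero.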
It’s a brilliant way to make a model’s predictions better, isn’t it? There’s just one catch. You have to tell it how far it should move down the slope on each step. That’s called the learning rate. In the script, the learning rate is set to .001, which means each step moves the weights by one one-thousandth of the gradient. You might be wondering why it’s set to such a small value. After all, if you set it to .01, each step would be 10 times bigger, so it seems like the model would learn 10 times faster. The problem with that is we’re trying to find a local minimum in the function, so if we make the learning rate too big, it’ll likely overshoot that minimum.
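To make the overshooting problem concrete, here’s a small sketch with a single weight and a made-up loss of w squared, whose minimum is at zero. The learning rates of 0.1 and 1.1 are chosen only to show the two behaviors. They aren’t values from the course script.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Gradient descent on loss(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w


# A small learning rate creeps steadily toward the minimum at w = 0
small_step = descend(learning_rate=0.1)

# A learning rate that is too big jumps past the minimum, and each jump is bigger than the last
big_step = descend(learning_rate=1.1)

print(small_step, big_step)
```

With the small learning rate, the weight shrinks toward zero on every step. With the big one, each update lands farther from the minimum than where it started, so the weight grows instead of settling.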
So you can try setting the learning rate to different values to see if it’ll give you a higher accuracy, but there’s a better way. TensorFlow includes a pre-built optimizer called the AdamOptimizer that keeps separate learning rates for each weight, which is pretty mind-boggling. To use it, all you have to do is replace the GradientDescentOptimizer with AdamOptimizer. Note that you still need to set an overall learning rate for it, so you can leave that as is.
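In TensorFlow 1.x, those two classes are tf.train.GradientDescentOptimizer and tf.train.AdamOptimizer, and the swap is a one-line change. To give a feel for what “separate learning rates for each weight” means, here’s a sketch of the standard Adam update written in plain Python. It’s not TensorFlow’s actual implementation, and the two weights and their gradients are made-up numbers.

```python
import math


def sgd_step(weights, grads, learning_rate):
    """Plain gradient descent: every weight uses the same learning rate."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


def adam_step(weights, grads, m, v, t, learning_rate,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update; m and v are running averages kept separately per weight."""
    new_weights, new_m, new_v = [], [], []
    for w, g, m_i, v_i in zip(weights, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # running average of the gradient
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running average of the squared gradient
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v_i / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) scales the step size for this weight individually
        new_weights.append(w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_weights, new_m, new_v


weights = [1.0, 1.0]
grads = [100.0, 0.01]  # hypothetical: one very steep direction, one very flat one

sgd_weights = sgd_step(weights, grads, learning_rate=0.001)
adam_weights, m, v = adam_step(weights, grads, m=[0.0, 0.0], v=[0.0, 0.0],
                               t=1, learning_rate=0.001)
print(sgd_weights)
print(adam_weights)
```

On this first step, plain gradient descent moves the steep weight by 0.1 and barely moves the flat one, while Adam moves both by about 0.001. The overall learning rate still sets the scale, which is why you still need to supply one.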
You can stop the second TensorBoard process by hitting Control-C. The first process will still be running because we put it in the background.
Since the script deletes the model directory every time, the results from our last run will be deleted and replaced by the results from this run. If you want to save them, then just rename the /tmp/mnist_convnet_model directory. This time, let’s watch its progress in TensorBoard as it’s running.
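If you want to keep the earlier results, here’s one way to do the rename from Python. This is only a sketch: the helper name archive_model_dir and the “_sgd” suffix are made up for this example, and it runs against a throwaway temporary directory rather than the real /tmp/mnist_convnet_model.

```python
import os
import tempfile


def archive_model_dir(model_dir, suffix):
    """Rename model_dir by appending suffix; return the new path, or None if it's missing."""
    if not os.path.isdir(model_dir):
        return None
    archived = model_dir.rstrip("/") + suffix
    os.rename(model_dir, archived)
    return archived


# Demonstrate on a scratch directory standing in for the model directory
scratch = tempfile.mkdtemp()
demo_dir = os.path.join(scratch, "mnist_convnet_model")
os.mkdir(demo_dir)

saved_path = archive_model_dir(demo_dir, "_sgd")
print(saved_path)
```

For the real run, you’d call it with “/tmp/mnist_convnet_model” before starting the script again, and then you could point another TensorBoard instance at the renamed directory to compare the two runs.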
You’ll probably have to wait until it’s on at least step 200 before anything will show up in TensorBoard, and then you might also need to refresh the browser, so I’ve fast forwarded.
Check out how quickly the loss dropped. It went from 2.3 down to .12 in only a hundred steps. With GradientDescentOptimizer, it didn’t even come close to dropping that far after 1,000 steps! I’ll fast forward to when it’s done.
OK, the accuracy shot up to almost 99%. That’s a pretty amazing improvement considering we only changed the optimizer and not the model itself. Best of all, the model now trains in a fraction of the time it took before.
And that’s it for this lesson.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).