
Preventing Overfitting



Once you know how to build and train neural networks using TensorFlow and Google Cloud Machine Learning Engine, what’s next? Before long, you’ll discover that prebuilt estimators and default configurations will only get you so far. To optimize your models, you may need to create your own estimators, try different techniques to reduce overfitting, and use custom clusters to train your models.

Convolutional Neural Networks (CNNs) are very good at certain tasks, especially recognizing objects in pictures and videos. In fact, they’re one of the technologies powering self-driving cars. In this course, you’ll follow hands-on examples to build a CNN, train it using a custom scale tier on Machine Learning Engine, and visualize its performance. You’ll also learn how to recognize overfitting and apply different methods to avoid it.

Learning Objectives

  • Build a Convolutional Neural Network in TensorFlow
  • Analyze a model’s training performance using TensorBoard
  • Identify cases of overfitting and apply techniques to prevent it
  • Scale a Cloud ML Engine job using a custom configuration

Intended Audience

  • Data professionals
  • People studying for the Google Certified Professional Data Engineer exam



The GitHub repository for this course is at https://github.com/cloudacademy/ml-engine-doing-more.


A problem you often hear about in machine learning is overfitting. So what is it, exactly? Suppose you have these data points and you want to do a regression on them. If you were to do a linear regression, then it would look something like this. But suppose you tried to get a higher accuracy by doing a nonlinear regression and you ended up with this. This model would perfectly fit the training data, but if you were to run some new data points through the model, it would likely have lower accuracy than the linear model.


This is called overfitting and it’s something you always have to watch out for in machine learning. The easiest way to see if your model is overfitting is to look at the loss on the training data versus the loss on the evaluation data. If you get a really low loss on the training data, but a much higher loss on the evaluation data, then your model is likely overfitting.
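To make the training-versus-evaluation comparison concrete, here is a minimal sketch in plain Python. The data points are made up for illustration (they are not the ones shown in the lesson), and the "nonlinear regression" is assumed to be a polynomial that passes through every training point.

```python
# Illustrative data (not from the lesson): the true relationship is y = x,
# and the training targets carry alternating noise of +0.5 / -0.5.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5]
eval_x = [0.5, 1.5, 2.5, 3.5, 4.5]
eval_y = [0.5, 1.5, 2.5, 3.5, 4.5]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

line_train_loss = mean_squared_error(line, train_x, train_y)
line_eval_loss = mean_squared_error(line, eval_x, eval_y)
poly_train_loss = mean_squared_error(poly, train_x, train_y)
poly_eval_loss = mean_squared_error(poly, eval_x, eval_y)

print("linear:     train", line_train_loss, "eval", line_eval_loss)
print("polynomial: train", poly_train_loss, "eval", poly_eval_loss)
```

The polynomial's training loss is zero because it hits every training point, but its evaluation loss is much higher than the line's. That gap is the signature of overfitting.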


It might not seem like a very big difference between a loss of 0 and a loss of 0.086, but for this particular data, it actually is. For comparison, the linear model would have a loss of 0.018, which is far better. So when you’re building models, you want to strike a balance between overfitting and overgeneralizing.


When you look at the loss in TensorBoard, it’s usually pretty easy to see when you’re overfitting. Here’s an example. You can see that the loss on the training data got down to such a small number that it’s shown in scientific notation, but the loss on the evaluation data was about 0.49. That’s a very big difference. Even without looking at the exact numbers, you can see that there’s an overfitting problem by how much higher the blue dot is than the orange dot.


So if you have an overfitting problem, what can you do about it? One of the most powerful techniques is to use more training data. In fact, the way I generated the overfitting example I just showed you was to feed only 500 of the MNIST images through the model. That’s less than 1% of the 55,000 images we were using before. You can see what a difference that made. Unfortunately, it’s sometimes difficult to obtain more training data, especially because it needs to be labeled with the correct target for each instance.


Another technique is to use fewer features. If you take too many features into account in your model, especially if the data in some of the features is sparse, then you can end up with an overfitting problem. It takes trial and error to see which features help or hurt the model, though. A variation of this technique is already built into convolutional neural networks, because the pooling layers reduce the number of features.


One straightforward technique that can sometimes be effective is called early stopping. If you have a model where the loss on the training data is moving towards zero, but the loss on the evaluation data is getting higher, then you have an overfitting problem that’s getting worse the more training steps you take. In this case, you should just stop the training at the point where the evaluation loss starts getting worse. That way your model will be frozen at the point where its accuracy is the highest.
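The stopping rule can be sketched in a few lines of plain Python. This is only an illustration: the loss values are hypothetical, and the `patience` parameter (how many worse evaluations in a row to tolerate before giving up) is an assumption added here, not something from the lesson.

```python
def early_stopping_step(eval_losses, patience=2):
    """Return the index of the evaluation with the best loss seen before stopping.

    Scans the losses in order and stops once the loss has failed to improve
    for `patience` consecutive evaluations.
    """
    if not eval_losses:
        raise ValueError("eval_losses must not be empty")
    best_step = 0
    best_loss = float("inf")
    bad_evals = 0
    for step, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_step, best_loss, bad_evals = step, loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break  # The loss has been getting worse; stop training.
    return best_step


# Hypothetical evaluation losses, one per evaluation run.
eval_losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.47, 0.52]
stop_at = early_stopping_step(eval_losses)
print(stop_at)  # 3, the evaluation with the lowest loss (0.40)
```

In practice you would keep the checkpoint saved at that evaluation and discard the later ones.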


A mathematical solution is to use regularization. The idea is to force the model to generalize. It does this by penalizing large weights, which keeps them close to zero whenever their values are adjusted. If the weights aren’t very big, then it’s kind of like having a straighter line in the graph I showed before.


There are two versions of regularization available in TensorFlow. L1 regularization tends to push some weights all the way to zero, so it effectively removes those features from the model. L2 regularization keeps the weights close to zero, but not necessarily at zero. You can actually apply both types of regularization to the same model, but give them different strengths. This technique definitely requires trial and error.
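Here is a rough sketch, in plain Python, of how the two penalties behave. The function names and numbers are made up for illustration, and the L1 step uses soft-thresholding, which is one common way to apply an L1 penalty and not necessarily what TensorFlow does internally.

```python
def regularized_loss(data_loss, weights, l1_strength=0.0, l2_strength=0.0):
    """Add L1 (sum of |w|) and L2 (sum of w^2) penalties to a data loss."""
    l1_penalty = l1_strength * sum(abs(w) for w in weights)
    l2_penalty = l2_strength * sum(w * w for w in weights)
    return data_loss + l1_penalty + l2_penalty


def l2_shrink(weights, strength, learning_rate):
    """Gradient step on the L2 penalty alone: scales each weight toward zero."""
    return [w - learning_rate * 2.0 * strength * w for w in weights]


def l1_shrink(weights, strength, learning_rate):
    """Soft-thresholding step for the L1 penalty: small weights become exactly zero."""
    threshold = learning_rate * strength
    shrunk = []
    for w in weights:
        if abs(w) <= threshold:
            shrunk.append(0.0)
        elif w > 0:
            shrunk.append(w - threshold)
        else:
            shrunk.append(w + threshold)
    return shrunk


# Hypothetical weights: two large ones and two small ones.
weights = [0.8, -0.05, 0.02, -1.5]

after_l1 = l1_shrink(weights, strength=1.0, learning_rate=0.1)
after_l2 = l2_shrink(weights, strength=1.0, learning_rate=0.1)
print(after_l1)  # the two small weights become exactly 0.0
print(after_l2)  # every weight is 20% smaller, but none is zero

# Both penalties applied to the same model with different strengths.
total = regularized_loss(0.5, [1.0, -2.0], l1_strength=0.1, l2_strength=0.01)
print(total)
```

The L1 step zeroes out the small weights, while the L2 step only scales everything down. That is the difference described above.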


Dropout is a way of reducing overfitting that’s specific to neural networks. Technically, dropout is also a form of regularization, but it takes quite a different approach from L1 and L2 regularization. Here’s the idea. What if instead of having one neural net, you had lots of them, each with a different, smaller set of neurons than the original one? Then you could run training data through all of them and then average the predictions of all of them. That would prevent overfitting because even if one of the neural nets had an overfitting problem, it would be cancelled out when it was averaged with all of the other neural nets.


Now, it would be too computationally expensive to actually use this many neural nets, so the dropout technique uses a very clever alternative. During every training step, it randomly removes a number of neurons from the network. This effectively creates a smaller neural net that gets trained on a batch of data. On each training step, a different set of neurons is removed. When all of the training steps are finished, then all of the neurons are put back in the network for evaluation. This is essentially the same as averaging a large number of independent neural nets except that each one only sees a portion of the input data.


Luckily, TensorFlow has a function that does all the work. It’s called tf.layers.dropout. This puts a dropout layer in between other layers, so it’s a bit different from what I showed before, but it has the same effect. With a dropout rate of 0.5, it ignores the output of about half of the neurons in the previous layer. The most important parameters to set are: the input (that is, the previous layer), the probability that a neuron will be dropped, and whether we’re in training mode or not. If we’re not in training mode, then we don’t want it to drop any neurons.
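To show what a dropout layer computes, here is a minimal plain-Python sketch. It is not TensorFlow's implementation. The parameter names `inputs`, `rate`, and `training` mirror the three settings described above, the `rng` parameter is a hypothetical addition for reproducibility, and scaling the surviving values by 1 / (1 - rate) is a common convention assumed here.

```python
import random


def dropout(inputs, rate=0.5, training=False, rng=None):
    """Zero each input with probability `rate` while training; pass through otherwise."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Evaluation or prediction: keep every neuron.
        return list(inputs)
    rng = rng or random
    # Scale the survivors so the expected sum of the outputs stays the same.
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * keep_scale for x in inputs]


# Hypothetical activations coming out of the previous layer.
activations = [1.0, 2.0, 3.0, 4.0]

eval_output = dropout(activations, rate=0.5, training=False)
train_output = dropout(activations, rate=0.5, training=True, rng=random.Random(0))

print(eval_output)   # unchanged: [1.0, 2.0, 3.0, 4.0]
print(train_output)  # each value is either 0.0 or doubled
```

Calling the function again during training would draw a new random mask, which is how each training step ends up using a different smaller network.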


I’m not going to show you how this MNIST model performs when you add a dropout layer, because in this case, it actually results in slightly lower accuracy. Adding a dropout layer won’t automatically improve a model’s performance. It will just help with overfitting in some cases.


And that’s it for this lesson.

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).