
Evaluate Model



When we saw how incredibly popular our blog post on Amazon Machine Learning was, we asked data and code guru James Counts to create this fantastic in-depth introduction to the principles and practice of Amazon Machine Learning so we could completely satisfy the demand for ML guidance within AWS.

If you've got a real-world need to apply predictive analysis to large data sources - for fraud detection or customer churn analysis, perhaps - then this course has everything you'll need to know to get you going.

James has got the subject completely covered:
  • What exactly machine learning can do
  • Why and when you should use it
  • Working with data sources
  • Manipulating data within Amazon Machine Learning to ensure a successful model
  • Working with machine learning models
  • Generating accurate predictions

When our model is created and the evaluation finishes, Amazon provides ML model insights. We can access them by clicking on the model's name in the dashboard. The insights you see will vary based on the type of model you are evaluating, so remember that everything we see here pertains to the binary model. The most interesting thing to look at is the result of the last evaluation, and we can access that by clicking on the overall score result here.

The evaluation summary screen shows us that our overall quality score of 0.82 is considered very good. The score is labeled AUC, which stands for area under the curve. AUC measures the model's ability to predict a higher score for positive examples than for negative examples. In this case, that means our model tends to predict a high score for Yes responses to the banking promotion. A binary model works by scoring the variables and then making a prediction based on a cutoff value. Down the left side menu, we can see that there are some alerts below the summary. We can see that our evaluation meets several of these criteria. We did not use the same data for training as for testing. We used sufficient data to evaluate the model, and the schema matched, which makes sense because both came from the same data source. The only alert we have received in our evaluation is that the target distributions are not the same. The target distribution for the evaluation data source is different from the target distribution for the training data source. We can see this in detail by clicking on the small graphics. There's our target distribution for the training set and here's our target distribution for the evaluation set.
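To make the AUC idea concrete, here is a minimal sketch in plain Python. It uses the standard pairwise interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score. The function name `auc` and the sample scores are made up for illustration; this is not how Amazon ML computes the number internally.

```python
from itertools import product


def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive example
    receives the higher score. Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and observed answers (1 = Yes, 0 = No).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(example_scores, example_labels)
print(round(example_auc, 3))
```

A model that ranks every Yes above every No scores 1.0, and a model that scores at random lands near 0.5.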

And we can see that in the evaluation set, the true historically observed number of yeses is much higher than it was in the training set. As much as possible, you want the test data to have the same statistical characteristics as the training data. So if you do receive an alert like this or any other alert, you can take steps to correct the problem. In this case, you could request a new evaluation, this time with a differently selected test set. For now we'll move on to the ML model performance section, labeled Explore performance on the left. This section is interesting because it allows us to adjust the prediction cutoff threshold and see how that affects our outcomes. A binary classifier works by scoring the variables and then making its decision based on this cutoff value. Any score above the cutoff value is predicted to be a 1, or in our case a Yes. Any score below the cutoff value, in this case 0.5, is predicted to be someone who will not accept the banking promotion, a 0 or a No. We can see the consequences of this default threshold along the right-hand side and below the chart. By moving the cutoff threshold across the chart, we can dynamically change it and see how it changes our outcomes. We have to move it quite a way before we see much change in the correct predictions and the errors. Or if we go in the other direction, again we see lots of numbers changing, but our percentages hold pretty steady until we get to the edges.
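The effect of the cutoff can be sketched in a few lines of Python. The function `confusion_counts` and the sample data are hypothetical, and treating a score exactly equal to the cutoff as a No is an assumption made here, not something the console documents in this lesson.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count correct predictions and errors at a given cutoff.
    Scores strictly above the cutoff are predicted 1 (Yes); everything
    else is predicted 0 (No). Tie handling is an assumption."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score > cutoff else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1  # predicted Yes, really Yes
        elif predicted == 1 and label == 0:
            counts["fp"] += 1  # predicted Yes, really No
        elif predicted == 0 and label == 0:
            counts["tn"] += 1  # predicted No, really No
        else:
            counts["fn"] += 1  # predicted No, really Yes
    return counts


# Hypothetical scores and observed answers (1 = Yes, 0 = No).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

at_default = confusion_counts(scores, labels)            # cutoff 0.5
at_lower = confusion_counts(scores, labels, cutoff=0.3)  # more Yes predictions
print(at_default)
print(at_lower)
```

Lowering the cutoff from 0.5 to 0.3 turns the Yes that scored 0.35 from a miss into a correct prediction, which is the kind of change the sliding threshold shows on the chart.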

We'll reset it to 0.5. Another option for us would be to use the advanced metrics section to increase or decrease a particular quality of the model. So if we wanted the most accurate model, we could move the accuracy slider all the way to the right, and Amazon ML automatically adjusts the threshold for us to accommodate our selection in the advanced section. But we can see that there are tradeoffs between these different metrics.
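The tradeoff between metrics can be seen with a small Python sketch that computes accuracy, precision, recall, and false positive rate at two cutoffs. The helper `metrics_at` and the data are invented for illustration, and the metric definitions are the standard textbook ones rather than anything taken from Amazon ML's implementation.

```python
def metrics_at(scores, labels, cutoff):
    """Standard binary-classification metrics at one cutoff.
    Scores strictly above the cutoff count as a Yes prediction."""
    predicted = [1 if s > cutoff else 0 for s in scores]
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical scores and observed answers (1 = Yes, 0 = No).
scores = [0.95, 0.85, 0.7, 0.55, 0.45, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

low = metrics_at(scores, labels, cutoff=0.5)
high = metrics_at(scores, labels, cutoff=0.8)
for name in low:
    print(f"{name:>20}: cutoff 0.5 -> {low[name]:.3f}, cutoff 0.8 -> {high[name]:.3f}")
```

On this toy data, raising the cutoff pushes precision up and the false positive rate down, but recall falls, while accuracy stays the same.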

Some cannot be increased or decreased without directly affecting the others. And all of them, of course, affect the cutoff threshold. Once again, I'll reset the threshold to the default of 0.5. And I shouldn't need to say this, but if you did choose a different threshold, you could make that your new threshold by pressing the blue button. So now we've created our first ML model and evaluated it. Let's return to the ML dashboard and see everything that we have created so far.

You can see that so far we've created a total of five different entities, starting with our data source. Remember that our data source represents a connection to the source data plus statistical metadata about the data. The next two data sources are simply the same data again, except this time it's been split between the 70% bucket to train the model and the 30% bucket to test the model. As we saw from our alert, we might want to revisit these buckets in order to get a closer statistical match between the training set and the testing set. Next we have the model itself, and we would have to regenerate that model if we did go back and change our data. But whether we do that or not, the model is what we would actually use to make predictions in the future.
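As a rough illustration of why the two buckets can end up with different target distributions, the Python sketch below compares a sequential 70/30 split with a shuffled one on made-up data where all the Yes rows sit at the end of the file. The function names, the seed, and the data are assumptions for this example; it is a generic sketch, not a description of how Amazon ML splits a data source.

```python
import random


def sequential_split(rows):
    """First 70% of rows for training, last 30% for testing."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def shuffled_split(rows, seed=0):
    """Shuffle a copy of the rows, then take 70% / 30%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled)


def positive_rate(rows):
    """Share of rows whose target is 1 (Yes)."""
    return sum(rows) / len(rows)


# Hypothetical targets: 80 No rows followed by 20 Yes rows.
targets = [0] * 80 + [1] * 20

seq_train, seq_test = sequential_split(targets)
shuf_train, shuf_test = shuffled_split(targets)

print("sequential:", positive_rate(seq_train), positive_rate(seq_test))
print("shuffled:  ", positive_rate(shuf_train), positive_rate(shuf_test))
```

With ordered data, the sequential split puts no Yes rows at all in the training bucket, while the shuffled split spreads them across both buckets.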

And closely related to the model is the evaluation of the model. These are the results of the test against the 30% of held-out data. We use this evaluation to analyze the model's performance and make decisions about what cutoff value is going to be best for a particular application.

We can make as many evaluations as we have data for, but we should not test the model with the same data that we used to create it. And it's also critical to run at least one evaluation so that we know whether our model performs well enough to be used to make predictions.

About the Author
James Counts
Software Developer

James is most happy when creating or fixing code. He tries to learn more and stay up to date with recent industry developments.

James recently completed his Master’s Degree in Computer Science and enjoys attending or speaking at community events like CodeCamps or user groups.

He is also a regular contributor to the ApprovalTests.net open source projects, and is the author of the C++ and Perl ports of that library.