When we saw how incredibly popular our blog post on Amazon Machine Learning was, we asked data and code guru James Counts to create this in-depth introduction to the principles and practice of Amazon Machine Learning, so we could fully satisfy the demand for ML guidance within AWS.
James has got the subject completely covered:
- What exactly machine learning can do
- Why and when you should use it
- Working with data sources
- Manipulating data within Amazon Machine Learning to ensure a successful model
- Working with machine learning models
- Generating accurate predictions
Welcome to our last lecture on Amazon Machine Learning. We've covered a lot of ground as we've progressed through the Machine Learning process with Amazon, so let's wrap up by reviewing how we got here. First, to create a Machine Learning solution in Amazon, you need to start with a question.
And after you have a question, you can go out and get some data. Hopefully, you have a specific problem in mind that you want to explore using ML. But if you don't have a specific problem and are just looking to learn, you can find plenty of free data sets online. Good places to look for data include universities and government websites. Data is available in a variety of formats, but Amazon only accepts tabular data in CSV format stored in an S3 bucket. For convenience, Amazon allows you to specify a Redshift or RDS instance as a data source, but even with these data sources, Amazon still creates a CSV file behind the scenes. Once you have acquired or created a CSV file, you can create a Machine Learning data source and analyze your data using data insights. This is your chance to get to know your data and evaluate the data set for basic sanity.
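To make the data-source step concrete, here is a minimal sketch of what defining one looks like with boto3. The bucket name, file name, attribute names, and data source IDs below are all hypothetical, invented for illustration; the schema JSON follows the general shape Amazon ML expects (a target field plus a typed attribute list), but check the service documentation for your own data.

```python
import json

# Hypothetical schema for a CSV of customer records; field names
# and types here are illustrative, not from the lecture.
data_schema = {
    "version": "1.0",
    "targetFieldName": "will_churn",          # the label column to predict
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"fieldName": "account_age_days", "fieldType": "NUMERIC"},
        {"fieldName": "plan", "fieldType": "CATEGORICAL"},
        {"fieldName": "will_churn", "fieldType": "BINARY"},
    ],
}

# The spec pairs the S3 location of the CSV with its schema.
data_spec = {
    "DataLocationS3": "s3://my-ml-bucket/churn.csv",  # hypothetical bucket
    "DataSchema": json.dumps(data_schema),
}

# With AWS credentials configured, you would then hand this spec to the
# Amazon ML client, roughly like so (left commented so the sketch runs
# without an AWS account):
#
# import boto3
# client = boto3.client("machinelearning")
# client.create_data_source_from_s3(
#     DataSourceId="ds-churn-001",            # hypothetical ID
#     DataSourceName="Churn training data",
#     DataSpec=data_spec,
#     ComputeStatistics=True,  # required if the source will train a model
# )

print(data_spec["DataLocationS3"])
```

Computing statistics is what powers the data-insights report mentioned above, so it is worth enabling for any data source you plan to train on.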
When you are ready, you can use your data source to create an ML model, and usually you want to create an ML model evaluation at the same time. At this stage, you can specify any additional feature processing, as well as the model parameters, including the number of passes, regularization, and the model size. Once you are done setting up the model parameters and recipe, Amazon automates the remaining groundwork for you. Once you have requested a model, Amazon will split your data set into training and evaluation sets, perform any feature processing that you requested, and then train the model. If you opted for L1 regularization, Amazon will automatically drop features that do not contribute to the learning process. Once the model training is completed, Amazon will use the held-out data to evaluate model performance. Then it is our job as developers to review the evaluation results and adjust prediction thresholds according to our needs. It may be that the performance is not yet acceptable, in which case we can adjust our data, features, or parameters, and try again.
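The train/evaluation split Amazon performs behind the scenes defaults to 70/30. A small local sketch of that split helps make the mechanics concrete; this is an illustration of the idea, not Amazon's actual implementation:

```python
import random

def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle rows and split them into training and evaluation sets.

    Mimics the 70/30 split Amazon ML applies by default when you ask it
    to create a model and an evaluation from one data source.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 CSV records
train, evaluation = split_dataset(rows)
print(len(train), len(evaluation))  # 70 30
```

The key property is that the evaluation rows are held out entirely from training, which is what makes the resulting performance numbers an honest estimate.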
Once the model reaches acceptable performance, we can use it to make predictions on unlabeled data. If we have many predictions to make at once, we can use the batch prediction option. If we need predictions in real time, we can enable the real-time prediction API and get individual predictions in 100 milliseconds or less.
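Adjusting the prediction threshold, mentioned above as the developer's job after evaluation, boils down to choosing the cut-off that converts a binary model's raw score into a 0/1 answer. A minimal sketch (the scores are invented for illustration):

```python
def apply_threshold(scores, threshold=0.5):
    """Convert raw binary-model scores into 0/1 predictions.

    Raising the threshold yields fewer positives (favoring precision);
    lowering it yields more positives (favoring recall). Amazon ML lets
    you tune this cut-off after reviewing the evaluation.
    """
    return [1 if s >= threshold else 0 for s in scores]

scores = [0.1, 0.4, 0.6, 0.9]          # illustrative raw scores
print(apply_threshold(scores))          # default 0.5 cut-off -> [0, 0, 1, 1]
print(apply_threshold(scores, 0.75))    # stricter cut-off    -> [0, 0, 0, 1]
```

Which direction to move the threshold depends on the question you started with: a fraud detector may tolerate false positives to catch more fraud, while a marketing filter may prefer the opposite trade.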
Thank you for watching our course on Amazon Machine Learning. I wish you the greatest success possible while building your Machine Learning solutions. I'm Jim Counts. Happy predicting.
James is happiest when creating or fixing code, and he works to stay up to date with recent industry developments.
James recently completed his Master’s Degree in Computer Science and enjoys attending or speaking at community events like CodeCamps or user groups.
He is also a regular contributor to the ApprovalTests.net open source projects, and is the author of the C++ and Perl ports of that library.