When we saw how popular our blog post on Amazon Machine Learning was, we asked data and code guru James Counts to create this in-depth introduction to the principles and practice of Amazon Machine Learning to meet the demand for ML guidance within AWS.
James has got the subject completely covered:
- What exactly machine learning can do
- Why and when you should use it
- Working with data sources
- Manipulating data within Amazon Machine Learning to ensure a successful model
- Working with machine learning models
- Generating accurate predictions
Welcome to our lecture on the machine learning problem types applicable to Amazon ML. Generally speaking, there are two classes of machine learning techniques: supervised and unsupervised learning. We'll talk about the difference between these two types of learning and then go over some example problems that you can run on Amazon ML. The difference between supervised and unsupervised learning is all about what you know.
In supervised learning, you start with an authoritative data set. Each example in this set is labeled with the correct answer to the question that you would like your ML model to answer. Because the data already includes the correct answer, the machine learning algorithm can compare the model's predictions to the ground truth while training. If a prediction does not match the actual answer, the algorithm can make adjustments on its next training pass.
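To make the "compare, then adjust" loop concrete, here is a minimal sketch in Python. It uses a simple perceptron as a stand-in; this is not the algorithm Amazon ML runs internally, and the data, the `predict` and `train` names, and the learning rate are all made up for illustration.

```python
# Labeled examples: (features, correct answer). Made-up data for illustration.
examples = [
    ((0.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((1.0, 0.0), 0),
    ((1.0, 1.0), 1),
]

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, passes=20, learning_rate=0.1):
    """Perceptron training: adjust the model only when a prediction is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(passes):
        mistakes = 0
        for features, label in examples:
            # Compare the prediction to the ground truth.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Nudge the weights and bias toward the correct answer.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every prediction matched its label on this pass
    return weights, bias

weights, bias = train(examples)
predictions = [predict(weights, bias, features) for features, _ in examples]
print(predictions)
```

The key point is the `error` line: without a known correct label for each example, there would be nothing to compare the prediction against and no way to decide which direction to adjust.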
On the other hand, unsupervised learning comes into play when you have data that do not have a correct answer. You may not even know the question yet. Unsupervised learning can find hidden structure in this type of data. And through studying these structures, we can draw conclusions or learn what the next question should be.
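As a rough illustration of finding hidden structure, the sketch below splits unlabeled numbers into two groups with a bare-bones one-dimensional k-means. Amazon ML does not offer this; the `two_means` function and the order totals are hypothetical and only show what "no correct answer" looks like in practice.

```python
def two_means(values, passes=10):
    """Split unlabeled numbers into two groups around two moving centers."""
    centers = [min(values), max(values)]
    groups = [list(values), []]
    for _ in range(passes):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Made-up order totals with no labels attached.
order_totals = [12.0, 15.0, 14.0, 210.0, 190.0, 205.0]
centers, groups = two_means(order_totals)
print(groups)
```

Nothing told the code that there are "small" and "large" orders. The two groups emerge from the data, and deciding what they mean is the next question to ask.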
Amazon ML provides access to supervised machine learning algorithms. With these algorithms, you can build solutions for three types of machine learning tasks: binary classification to predict one of two outcomes, multi-class classification to predict one of three or more outcomes, and regression to predict a numeric value.
With binary classification, we put each data example into one of two categories. We can use this technique to answer questions such as: Will this customer buy a product or not? Is an email spam or not? Is a particular product kitchenware or software? Is the current user human or not?
In multi-class classification, we deal with a problem similar to binary classification, except that instead of only two categories, we have three or more. We can use multi-class classification to answer questions such as: Is this book a romance, thriller, or adventure story? Is the person carrying a phone sitting, standing, or walking? Which type of car is most interesting to our customer: SUV, sports, or economy?
With a regression problem, we compute a numeric value. We can use regression problems to answer questions like: What will the temperature be tomorrow? How many calendars will we sell next year? And what is the likely sale price for this home?
Depending on the data that you have or may have in the future and the question that you want to answer, you might be able to use Amazon Machine Learning. You need data where you already know the correct answer, and you need a question that can be answered by a classification or regression model.
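One way to think about which of the three tasks fits your question is to look at the known answers in your data. The sketch below is a crude heuristic, not part of Amazon ML: the `suggest_task` helper is hypothetical, and real data sets need more care (for example, numeric codes that actually stand for categories).

```python
def suggest_task(labels):
    """Guess a task type from the known answers in a labeled data set."""
    distinct = set(labels)
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )
    if all_numeric and len(distinct) > 2:
        return "regression"  # many numeric answers: predict a number
    if len(distinct) == 2:
        return "binary classification"  # exactly two possible outcomes
    if len(distinct) >= 3:
        return "multi-class classification"  # three or more categories
    raise ValueError("need at least two distinct answers to learn from")

print(suggest_task(["spam", "not spam", "spam"]))
print(suggest_task(["romance", "thriller", "adventure"]))
print(suggest_task([310000.0, 275500.0, 412000.0]))
```

The three calls correspond to the example questions above: spam detection, book genre, and home sale price.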
James is most happy when creating or fixing code. He tries to learn more and stay up to date with recent industry developments.
James recently completed his Master’s Degree in Computer Science and enjoys attending or speaking at community events like CodeCamps or user groups.
He is also a regular contributor to the ApprovalTests.net open source projects, and is the author of the C++ and Perl ports of that library.