When we saw how incredibly popular our blog post on Amazon Machine Learning was, we asked data and code guru James Counts to create this fantastic in-depth introduction to the principles and practice of Amazon Machine Learning so we could completely satisfy the demand for ML guidance within AWS.
James has got the subject completely covered:
- What exactly machine learning can do
- Why and when you should use it
- Working with data sources
- Manipulating data within Amazon Machine Learning to ensure a successful model
- Working with machine learning models
- Generating accurate predictions
Welcome to our lecture on formulating machine learning problems. In this lecture, we'll discuss the types of problems that are appropriate for machine learning, and consider an example problem that we will work on throughout the remainder of this series.
Framing a machine learning problem for Amazon ML boils down to two considerations. First, you must determine what you have observed in the past.
Without data to process, you cannot find patterns that help make predictions, so you either need historical data or a plan to begin gathering data. Second, you have to decide what you would like to predict. The data that you would like to predict is often called your label or target answer. You might want to predict how much insurance coverage a potential customer might buy. That would be a numeric target, and it would require a regression model. Or you might be interested in finding documents that contain personal information so that they can be redacted for public release.
You can think of this type of problem as grouping documents into two sets, and that makes it a binary classification problem. If you have more than two sets, for example, sorting books into genres, then you can still use ML to group the sets, but that would be a multiclass classification problem.
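To make the three problem types concrete, here is a minimal Python sketch that suggests a problem type from a handful of example target values. The function name `suggest_model_type` and the sample values are hypothetical, and the rule it applies is only a rough heuristic for illustration, not something Amazon ML itself does.

```python
from numbers import Number


def suggest_model_type(target_values):
    """Suggest a problem type from example target values (rough heuristic)."""
    values = list(target_values)
    if not values:
        raise ValueError("need at least one observed target value")
    # Numeric targets (booleans excluded) point to a regression problem.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "regression"
    # Otherwise treat the target as a category and count the distinct sets.
    distinct = set(values)
    if len(distinct) == 2:
        return "binary classification"
    if len(distinct) > 2:
        return "multiclass classification"
    raise ValueError("only one distinct target value; nothing to predict")


# Hypothetical targets matching the examples in this lecture.
print(suggest_model_type([250000.0, 100000.0, 500000.0]))      # coverage amounts
print(suggest_model_type(["redact", "release", "release"]))    # two document sets
print(suggest_model_type(["mystery", "romance", "biography"])) # book genres
```

In practice you would decide the problem type yourself from the business question. A numeric code such as 0 or 1 can still be a category, which is why this heuristic should not be trusted blindly.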
Throughout this series, we'll use Amazon ML to build a binary classification model and use it to get predictions. The example data relates to a historical telemarketing campaign conducted by a Portuguese bank. This data is freely available, and we'll cover how to get it later in the course. For now, the important thing is to think about how this data fits into a business process. The bank would like to identify potential customers for a targeted marketing campaign.
The idea of targeted marketing is to reduce costs by directing the offer only to customers who are likely to accept it. So we would like to divide our potential customers into two sets: customers who we think are likely to accept the offer, and everyone else. The offer, in this case, is a bank term deposit, also known as a CD. So we will provide a machine learning algorithm with information about our customers as well as what we know about their responses to the previous marketing campaign. The algorithm will then build a model, which we hope will help us identify customers who will respond favorably to the current marketing campaign. The target value in this problem is a simple yes or no answer to the question, "Will this customer be interested in what I'm selling?"
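As a rough illustration of what "information about our customers" plus a yes or no target might look like, here is a short Python sketch. The records, the field names (`age`, `job`, `accepted_offer`), and the helper `split_features_and_target` are all made up for this example and are not taken from the actual bank data set.

```python
# Hypothetical observations from a previous campaign; values are invented.
past_campaign = [
    {"age": 41, "job": "technician", "accepted_offer": "yes"},
    {"age": 29, "job": "student", "accepted_offer": "no"},
    {"age": 56, "job": "retired", "accepted_offer": "no"},
]


def split_features_and_target(record, target_field="accepted_offer"):
    """Separate what we observed about a customer from the answer we want to predict."""
    features = {k: v for k, v in record.items() if k != target_field}
    # Encode the yes/no answer as 1/0, a common convention for binary targets.
    target = 1 if record[target_field] == "yes" else 0
    return features, target


examples = [split_features_and_target(r) for r in past_campaign]
targets = [target for _, target in examples]

# The two sets we want the model to reproduce for new customers.
likely = [features for features, target in examples if target == 1]
everyone_else = [features for features, target in examples if target == 0]

print(targets)
print(len(likely), len(everyone_else))
```

The point of the split is that the features are what we will know about a new customer, while the target is known only for customers from the past campaign.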
James is most happy when creating or fixing code. He tries to learn more and stay up to date with recent industry developments.
James recently completed his Master’s Degree in Computer Science and enjoys attending or speaking at community events like CodeCamps or user groups.
He is also a regular contributor to the ApprovalTests.net open source projects, and is the author of the C++ and Perl ports of that library.