When we saw how incredibly popular our blog post on Amazon Machine Learning was, we asked data and code guru James Counts to create this fantastic in-depth introduction to the principles and practice of Amazon Machine Learning so we could completely satisfy the demand for ML guidance within AWS.
James has got the subject completely covered:
- What exactly machine learning can do
- Why and when you should use it
- Working with data sources
- Manipulating data within Amazon Machine Learning to ensure a successful model
- Working with machine learning models
- Generating accurate predictions
Welcome to our course on Machine Learning algorithms. In this lecture, we'll cover the difference between machine learning algorithms and machine learning models, the machine learning algorithms available in Amazon ML, and the training parameters you can adjust in Amazon ML.
Machine learning algorithms and machine learning models are not the same thing. The machine learning algorithm is the code that consumes labeled data. As it consumes the data, it detects patterns that map the variables to the target. Once the algorithm has learned the patterns in the data, it outputs a model that represents those relationships. The model is the code that consumes unlabeled data and uses those variables to create a prediction. Let's take a look at the types of algorithms available in Amazon ML.
Although there are many types of ML algorithms in the machine learning universe, Amazon Machine Learning only provides linear models. A linear model is specified as a linear combination of features. The purpose of the training process is to learn the proper weight for each input feature. Amazon provides us with a simple example of a linear model which combines age and income to estimate the amount of insurance that a customer will purchase. The weights, in this case, are 5 for the age and 0.0003 for the income.
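The insurance example can be sketched in a few lines of Python. The weights 5 and 0.0003 come from the lecture; the function name, the intercept of 0.0, and the sample customer are assumptions made only for illustration.

```python
def estimate_insurance(age, income, w_age=5.0, w_income=0.0003, intercept=0.0):
    """Linear model: a weighted sum of the input features plus an intercept.

    The intercept default of 0.0 is an assumption; the lecture only gives
    the two feature weights.
    """
    return intercept + w_age * age + w_income * income


# Hypothetical customer: 40 years old with an income of 50,000.
estimate = estimate_insurance(age=40, income=50_000)
print(estimate)
```

Training does not change the shape of this formula. It only searches for the values of the weights.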
Whatever the values of age and income are, they will be multiplied by these weights and combined with the rest of the variables to produce the estimated target. To find these weights, the learning algorithm uses two parts: a loss function and an optimization technique. The loss function computes a penalty whenever the estimate provided by the ML model does not exactly equal the ground truth value. Amazon Machine Learning uses three loss functions, one for each supported model type: binary, multiclass, or regression. The optimization technique seeks to minimize the loss.
There is only one optimization technique used in Amazon ML. It is called Stochastic Gradient Descent, or SGD. SGD makes sequential passes over the training data. On each pass, it updates the feature weights one example at a time, trying to minimize the loss. As mentioned, there is one loss function per model type. This loss function is combined with SGD to create the machine learning algorithm. For binary classification, Amazon ML combines SGD with the logistic loss function. For multiclass classification, Amazon ML combines SGD with the multinomial logistic loss function.
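To make the pairing of SGD with a loss function concrete, here is a minimal sketch of SGD with the logistic loss for binary classification. It is a textbook version, not Amazon ML's actual implementation; the learning rate, the shuffling, the function names, and the tiny dataset are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    """Linear combination of features plus intercept."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def logistic_loss(w, b, data):
    """Mean logistic loss over (features, label) pairs with labels 0 or 1."""
    total = 0.0
    for x, y in data:
        p = min(max(sigmoid(score(w, b, x)), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def sgd_train(data, passes=10, learning_rate=0.1, seed=0):
    """Make `passes` passes over the data, updating weights per example."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    examples = list(data)
    for _ in range(passes):
        rng.shuffle(examples)  # visit examples in a new order on each pass
        for x, y in examples:
            # Gradient of the logistic loss with respect to the linear score.
            error = sigmoid(score(w, b, x)) - y
            w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
            b -= learning_rate * error
    return w, b


# Hypothetical dataset: the label is 1 when the single feature is positive.
data = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
w, b = sgd_train(data)
print(logistic_loss([0.0], 0.0, data), logistic_loss(w, b, data))
```

The second printed loss should be lower than the first, which is the loss of the untrained model with all weights at zero.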
For regression, Amazon combines SGD with the squared loss function. Learning algorithms accept parameters that allow you to control the quality of the resulting model. The following parameters are commonly used to tune linear algorithms. Regularization helps prevent a common problem called overfitting. It does this by penalizing extreme weight values.
You can choose among three regularization options: no regularization, L1, or L2. L1 works by reducing the number of features considered by the model. Mechanically, it does this by pushing the weights of features that would otherwise have small weights to zero. L2 works by reducing the size of the weights overall.
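The difference between the two types can be seen in a small sketch. It uses soft-thresholding for L1 and proportional weight decay for L2, which are common textbook ways of applying these penalties during SGD; whether Amazon ML does exactly this internally is an assumption. The function names, the sample weights, and the shrinkage amount of 0.1 are hypothetical.

```python
def l1_shrink(weights, amount):
    """Soft-thresholding: move each weight toward zero by `amount`.

    Weights whose magnitude is at most `amount` become exactly zero,
    which removes those features from the model.
    """
    shrunk = []
    for w in weights:
        if w > amount:
            shrunk.append(w - amount)
        elif w < -amount:
            shrunk.append(w + amount)
        else:
            shrunk.append(0.0)
    return shrunk


def l2_shrink(weights, amount):
    """Weight decay: scale every weight toward zero by the same factor."""
    return [w * (1.0 - amount) for w in weights]


# Hypothetical weights: two large, two small.
weights = [2.0, -0.05, 0.5, 0.01]
print(l1_shrink(weights, 0.1))  # the two small weights drop to zero
print(l2_shrink(weights, 0.1))  # every weight gets smaller, none hits zero
```

Here `amount` stands for the shrinkage applied in a single update step. It is exaggerated so the effect is visible; the default amount mentioned in the lecture is far smaller.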
In addition to letting you choose the type of regularization, Amazon also lets you set the amount of regularization for the type that you choose. You cannot use L1 and L2 together with Amazon ML. By default, L2 is used with an amount of 0.000001. The amount specified is a double-precision floating-point number.
You can choose any amount between zero and the maximum double value, or you can switch to L1 and likewise choose a number in the same range. The number of passes parameter influences the SGD behavior. This parameter controls the number of passes that the algorithm makes over the training data. There is a diminishing return to adding more passes, but having a large number of passes over a small dataset can allow the algorithm to fit the data more closely. For an extremely large dataset, a single pass might provide enough information to create a good model.
Amazon allows you to set the maximum number of passes between 1 and 100. Controlling the model size lets you trade off predictive quality against cost of use. Smaller models may miss more patterns if they cannot fit within the allotted size. However, larger models take longer to train and to query for predictions. Remember that model size is not about the size of your input data but about the amount of space required to describe the patterns found in the data. Amazon allows models to range in size between 100 kilobytes and 2 gigabytes. Remember that this is the maximum size. Amazon will only use as much space as required, up to that maximum.
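The training parameters from this lecture can be gathered into a single settings map. The sketch below builds such a map and checks the values against the ranges quoted above. The helper name is hypothetical, and the `sgd.*` key names and string values follow the `Parameters` map of the Amazon ML `CreateMLModel` API as best recalled, so treat them as assumptions to verify against the API reference. The byte limits are one interpretation of "100 kilobytes" and "2 gigabytes". No call to AWS is made.

```python
MIN_MODEL_BYTES = 100_000          # assumed reading of "100 kilobytes"
MAX_MODEL_BYTES = 2 * 1024 ** 3    # assumed reading of "2 gigabytes"


def build_training_parameters(max_passes, max_model_size_bytes,
                              regularization="L2", amount=1e-6):
    """Return a string-to-string map of training parameters.

    Defaults mirror the lecture: L2 regularization with amount 0.000001.
    Pass regularization=None to leave both amounts unset (assumed here
    to mean no regularization).
    """
    if not 1 <= max_passes <= 100:
        raise ValueError("max_passes must be between 1 and 100")
    if not MIN_MODEL_BYTES <= max_model_size_bytes <= MAX_MODEL_BYTES:
        raise ValueError("max_model_size_bytes is out of range")
    if regularization not in ("L1", "L2", None):
        raise ValueError("regularization must be 'L1', 'L2', or None")
    if amount < 0:
        raise ValueError("amount must not be negative")

    params = {
        "sgd.maxPasses": str(max_passes),
        "sgd.maxMLModelSizeInBytes": str(max_model_size_bytes),
    }
    # L1 and L2 cannot be combined, so at most one amount is set.
    if regularization == "L1":
        params["sgd.l1RegularizationAmount"] = str(float(amount))
    elif regularization == "L2":
        params["sgd.l2RegularizationAmount"] = str(float(amount))
    return params


# Hypothetical settings: 10 passes and a 100 MiB size cap.
params = build_training_parameters(max_passes=10,
                                   max_model_size_bytes=100 * 1024 * 1024)
print(params)
```

Keeping the checks in one place makes it harder to submit a combination the service would reject, such as setting both regularization amounts at once.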
James is most happy when creating or fixing code. He tries to learn more and stay up to date with recent industry developments.
James recently completed his Master’s Degree in Computer Science and enjoys attending or speaking at community events like CodeCamps or user groups.
He is also a regular contributor to the ApprovalTests.net open source projects, and is the author of the C++ and Perl ports of that library.