Training the Predictor Model



This course looks at the Amazon Forecast service, including what it does and how it works: importing datasets, training predictors, and generating forecasts.

Learning Objectives

  • Learn the fundamentals of Amazon Forecast
  • Understand how to ingest data in Amazon Forecast
  • Learn how to train the predictor model
  • Learn how to create a forecast

Intended Audience

This course is intended for architects, developers, line-of-business managers, executives, and data scientists looking to improve forecasting results in their business.


Prerequisites

To get the most out of this course, you will need to meet the requirements for the AWS Cloud Practitioner certification.


To train the predictor, you can use the Amazon Forecast console to create one or more datasets, add them to a dataset group, and provide the dataset group for training. Dataset groups are containers for your datasets, predictors, and forecasts. You can think of them as folders that organize the data for a particular forecast.

Dataset groups require permissions from AWS Identity and Access Management (IAM) to read your files from an S3 bucket. You can use an existing role, or you can create a new one with a built-in control on the Forecast console. Once you give your dataset group a name, the next step is to select the forecasting domain that best represents your data. The console offers several predefined domains to choose from.

  • Retail: forecast demand for a retailer.
  • Inventory planning: forecast demand for raw materials and determine how much inventory of a particular item to stock.
  • EC2 capacity: forecast Amazon EC2 capacity.
  • Workforce: plan and identify the size of the workforce you require.
  • Web traffic: forecast traffic to a web property or a set of web properties.
  • Metrics: forecast metrics such as revenue, sales, and cash flow.
  • Custom: define your own schema when none of the other domains apply.

After defining a dataset group and selecting the domain for your data, you need to create a dataset. The dataset configuration details specify the dataset name and the frequency of your data, that is, how often entries are recorded in your data file. The data schema section specifies the attribute type for each column in your dataset. You can use the schema builder in the console to graphically define your attribute names, types, and order. You can also specify the attribute names and attribute types in JavaScript Object Notation (JSON) format. Make sure the timestamp format in the schema matches the timestamp format used in the time series dataset.
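As a rough illustration of the JSON schema and the timestamp check mentioned above, here is a minimal Python sketch. The attribute names assume a Retail target time series (item_id, timestamp, demand); the helper timestamps_match and the mapping to Python date patterns are hypothetical and not part of Amazon Forecast.

```python
import json
from datetime import datetime

# Hypothetical schema for a Retail target time series dataset.
# The attribute order is assumed to match the column order of the data file.
schema = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]
}

# Assumed mapping from schema timestamp formats to Python strptime patterns.
STRPTIME_PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}


def timestamps_match(values, timestamp_format):
    """Return True if every value parses with the chosen timestamp format."""
    pattern = STRPTIME_PATTERNS[timestamp_format]
    for value in values:
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            return False
    return True


# Print the schema as JSON, then check two sample columns against the format.
print(json.dumps(schema, indent=2))
print(timestamps_match(["2021-03-01 00:00:00", "2021-03-01 01:00:00"], "yyyy-MM-dd HH:mm:ss"))
print(timestamps_match(["2021-03-01"], "yyyy-MM-dd HH:mm:ss"))
```

Running a check like this on a sample of the data file before importing can catch a format mismatch early.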

After importing your data, you can check the dataset import field statistics to verify the expected number of items, the number of locations, the time range, and other fields. Any errors with the import are also recorded there.

When you create a predictor, you can either choose AutoML to let Amazon Forecast optimize the predictor for you, or manually choose a forecasting algorithm for your predictor.

Choosing an algorithm manually requires some experience with forecasting and with the selected algorithm. When you use AutoML, Forecast trains different models with your target time series, related time series, and item metadata, and then uses the model with the best accuracy metrics. Training the predictor has a few optional configuration settings. By default, Amazon Forecast creates an optimized predictor with the lowest average losses over the specified forecast types. You can also optimize your predictor for one of the following accuracy metrics by selecting it from the optimization metric dropdown.

The options are Average Weighted Quantile Loss (the default), Weighted Absolute Percentage Error, Root Mean Squared Error, Mean Absolute Percentage Error, and Mean Absolute Scaled Error. Again, if you have experience with these configuration parameters, you can select one specifically. Otherwise, AutoML will make the best selection for you.
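To make three of these point-forecast metrics concrete, here is a minimal sketch that uses their standard textbook definitions. The function names and sample numbers are illustrative assumptions, and Forecast's own computation may differ in details such as how zero values are handled.

```python
import math


def wape(actuals, forecasts):
    """Weighted Absolute Percentage Error: total absolute error over total absolute demand."""
    total_abs_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return total_abs_error / sum(abs(a) for a in actuals)


def rmse(actuals, forecasts):
    """Root Mean Squared Error: square root of the mean squared error."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actuals, forecasts):
    """Mean Absolute Percentage Error; this sketch assumes no actual value is zero."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return sum(ratios) / len(ratios)


# Made-up sample values for illustration only.
actuals = [10.0, 20.0, 30.0]
forecasts = [12.0, 18.0, 33.0]
print(wape(actuals, forecasts), rmse(actuals, forecasts), mape(actuals, forecasts))
```

RMSE penalizes large individual errors more heavily, while WAPE and MAPE express the error relative to the size of the actual values.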

To train the predictor, the item ID field defined in the schema is used by default. The forecast dimensions configuration allows you to select additional keys you would like to use to generate a forecast. This configuration is also optional. The number of backtest windows indicates the number of times the algorithm splits the input data for use in training and testing. You can choose between one and five backtest windows. The backtest window offset defines the size of the testing dataset used during training.

The backtest window offset is the point, counted from the end of the dataset, where you want to split the data for model training and testing. The value is defined as a number of data points. As a general guideline, the backtest window offset must be greater than or equal to the prediction window and less than half of the target time series dataset length. In the forecast types section, you specify the quantiles used to create forecasts and evaluate predictors. You can choose up to five quantiles between 0.01 and 0.99, in increments of 0.01. You can also include the mean forecast using the keyword mean.
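The guidelines above can be sketched as simple checks. This is an illustrative sketch only: the function names are hypothetical, forecast types are assumed to be written as strings such as "0.10", and the limit of five is assumed to apply to quantiles only, not to the mean keyword.

```python
def check_backtest_offset(offset, forecast_horizon, series_length):
    """Offset must be >= the prediction window and < half the series length."""
    return forecast_horizon <= offset < series_length / 2


def check_forecast_types(forecast_types):
    """Allow up to five quantiles in [0.01, 0.99] at 0.01 steps, plus optional 'mean'."""
    quantiles = [ft for ft in forecast_types if ft != "mean"]
    if not forecast_types or len(quantiles) > 5:
        return False
    for ft in quantiles:
        try:
            q = float(ft)
        except ValueError:
            return False
        # Reject values outside the range or with more than two decimal places.
        if not 0.01 <= q <= 0.99 or round(q, 2) != q:
            return False
    return True


def split_for_backtest(series, offset):
    """Split at `offset` points from the end: (training part, testing part)."""
    return series[:-offset], series[-offset:]


print(check_backtest_offset(offset=14, forecast_horizon=14, series_length=100))
print(check_forecast_types(["0.10", "0.50", "0.90", "mean"]))
print(split_for_backtest(list(range(10)), 3))
```

With an offset of 3 on a 10-point series, the last 3 points are held out for testing and the first 7 are used for training.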

By default, Forecast evaluates predictors by averaging the weighted quantile losses of the 0.1 (P10), 0.5 (P50), and 0.9 (P90) quantiles. A forecast at the P90 quantile means the actual value can be expected to be less than the forecast 90% of the time. In the predictor output, the error for each quantile forecast is reported as WQL, or Weighted Quantile Loss. Under the supplementary features of the predictor configuration, you can enable weather and holidays.
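For readers who want to see how a weighted quantile loss can be computed, here is a minimal sketch based on the commonly published formula: twice the sum of the quantile (pinball) losses, divided by the sum of the absolute actual values. The function names and sample numbers are illustrative assumptions, not output from the service.

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """wQL for one quantile tau: under-forecasts are weighted by tau, over-forecasts by 1 - tau."""
    loss = 0.0
    for y, q in zip(actuals, quantile_forecasts):
        loss += tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)
    return 2 * loss / sum(abs(y) for y in actuals)


def average_wql(actuals, forecasts_by_quantile):
    """Average the wQL over all quantiles; keys of the dict are the tau values."""
    losses = [
        weighted_quantile_loss(actuals, forecasts, tau)
        for tau, forecasts in forecasts_by_quantile.items()
    ]
    return sum(losses) / len(losses)


# Made-up sample values for illustration only.
actuals = [10.0, 20.0, 30.0]
forecasts_by_quantile = {
    0.1: [8.0, 15.0, 25.0],
    0.9: [12.0, 18.0, 33.0],
}
print(weighted_quantile_loss(actuals, forecasts_by_quantile[0.9], 0.9))
print(average_wql(actuals, forecasts_by_quantile))
```

At the P90 level, a forecast that falls below the actual value is penalized nine times more heavily than one that lands above it by the same amount, which is why P90 forecasts tend to sit above most actual values.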

The Amazon Forecast Weather Index incorporates past and projected weather information into your predictor. Forecast has over two years of historical weather information and 14 days of projected weather information. The feature can only be used in the United States, excluding Hawaii and Alaska. To use the Weather Index, you need to do the following:

  • Include a geolocation attribute in your target time series.
  • Limit your time series datasets to data points after July 1, 2018.
  • Set the end point of the forecast prediction to no more than 14 days into the future.
  • Set the forecast frequency to 1 minute, 5 minutes, 10 minutes, 15 minutes, 30 minutes, hourly, or daily.
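The checklist above could be expressed as a small pre-flight check. This is a hypothetical helper, not part of Amazon Forecast; the frequency codes ("1min", "H", "D", and so on) are assumed to correspond to the frequencies listed, and the horizon is assumed to be given in days.

```python
from datetime import date

# Assumed frequency codes for the frequencies in the list above.
WEATHER_FREQUENCIES = {"1min", "5min", "10min", "15min", "30min", "H", "D"}
CUTOFF = date(2018, 7, 1)  # data points must fall after this date
MAX_HORIZON_DAYS = 14      # projected weather covers 14 days


def weather_index_issues(attribute_types, earliest_timestamp, frequency, horizon_days):
    """Return a list of reasons the weather feature could not be used (empty if none)."""
    issues = []
    if "geolocation" not in attribute_types:
        issues.append("missing geolocation attribute")
    if earliest_timestamp <= CUTOFF:
        issues.append("data points on or before July 1, 2018")
    if horizon_days > MAX_HORIZON_DAYS:
        issues.append("forecast extends beyond 14 days")
    if frequency not in WEATHER_FREQUENCIES:
        issues.append("unsupported forecast frequency")
    return issues


print(weather_index_issues(["string", "timestamp", "float", "geolocation"], date(2019, 1, 1), "D", 7))
print(weather_index_issues(["string", "timestamp", "float"], date(2018, 1, 1), "W", 30))
```

Returning a list of issues rather than a single boolean makes it easier to report every unmet requirement at once.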

The holidays feature incorporates national holiday information into your predictor. Forecast supports 66 countries. You can choose a single country from the dropdown, and holiday information for that country is applied to every item in your dataset. With this, the predictor configuration is complete. You can start training the predictor, and the console will show an estimated completion time.
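The same configuration can also be expressed programmatically. The sketch below only assembles a request dictionary shaped like the parameters of the CreatePredictor API as exposed by the AWS SDK for Python; it does not call AWS. The predictor name, the dataset group ARN, and the helper build_predictor_request are hypothetical placeholders, and the parameter names should be verified against the current SDK documentation before use.

```python
def build_predictor_request(name, dataset_group_arn, horizon, frequency, holiday_country=None):
    """Assemble keyword arguments for a create_predictor call (assumed parameter names)."""
    input_config = {"DatasetGroupArn": dataset_group_arn}
    if holiday_country:
        # Holidays are enabled as a supplementary feature for a single country.
        input_config["SupplementaryFeatures"] = [
            {"Name": "holiday", "Value": holiday_country}
        ]
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "PerformAutoML": True,  # let Forecast choose the algorithm
        "ForecastTypes": ["0.10", "0.50", "0.90"],  # default quantiles
        "OptimizationMetric": "AverageWeightedQuantileLoss",
        "EvaluationParameters": {
            "NumberOfBacktestWindows": 1,
            "BackTestWindowOffset": horizon,  # at least the prediction window
        },
        "InputDataConfig": input_config,
        "FeaturizationConfig": {"ForecastFrequency": frequency},
    }


# Hypothetical name and ARN, for illustration only.
request = build_predictor_request(
    name="demo_predictor",
    dataset_group_arn="arn:aws:forecast:us-east-1:123456789012:dataset-group/demo",
    horizon=14,
    frequency="D",
    holiday_country="US",
)
print(request)
# With credentials configured, the request could be passed to the SDK, for example:
#   boto3.client("forecast").create_predictor(**request)
```

Keeping the request as plain data makes it easy to review or version the predictor configuration before starting a training job.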

About the Author
Jorge Negrón
AWS Content Architect

Experienced in the architecture and delivery of cloud-based solutions, the development and delivery of technical training, defining requirements and use cases, and validating architectures for results. Excellent leadership, communication, and presentation skills with attention to detail. Hands-on administration and development experience with the ability to mentor and train on current and emerging technologies (Cloud, ML, IoT, Microservices, Big Data & Analytics).