Ingesting Data


Features & Fundamentals of Amazon Forecast
Using Amazon Forecast

This course looks at the Amazon Forecast service, including what it does and how it works: importing datasets, training predictors, and generating forecasts.

Learning Objectives

  • Learn the fundamentals of Amazon Forecast
  • Understand how to ingest data in Amazon Forecast
  • Learn how to train the predictor model
  • Learn how to create a forecast

Intended Audience

This course is intended for architects, developers, line of business managers, executives, and data scientists looking to improve forecasting results in their business.


To get the most out of this course, you should meet the requirements for the AWS Certified Cloud Practitioner certification.


Amazon Forecast is easy to use, following a general three-step sequence: import your data, train a predictor, and generate forecasts. Let's discuss these steps in more detail, starting with importing your data. The import step creates the datasets required to train predictors, which, in turn, are used to generate forecasts. This entails preparing your data and uploading it to an Amazon S3 bucket. Amazon Forecast provides several filling methods to automatically handle missing values in your datasets, potentially saving you a significant amount of time cleaning your data before using it.
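As a rough sketch of the import step, the snippet below builds the request for a dataset import job that points Amazon Forecast at a CSV file already uploaded to S3. It assumes the boto3 `forecast` client's `create_dataset_import_job` call; the job name, bucket, file path, dataset ARN, and IAM role ARN are hypothetical placeholders, not values from this course.

```python
def build_import_job_request(job_name: str, dataset_arn: str,
                             s3_path: str, role_arn: str) -> dict:
    """Build keyword arguments for boto3's forecast.create_dataset_import_job.

    The IAM role must allow Amazon Forecast to read the S3 location.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an s3:// URI")
    return {
        "DatasetImportJobName": job_name,
        "DatasetArn": dataset_arn,
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
        # Format of the timestamp column in the uploaded CSV (assumed daily data).
        "TimestampFormat": "yyyy-MM-dd",
    }


# Hypothetical placeholder values for illustration only.
request = build_import_job_request(
    job_name="sales_tts_import",
    dataset_arn="arn:aws:forecast:us-east-1:123456789012:dataset/sales_tts",
    s3_path="s3://example-bucket/forecast/sales_tts.csv",
    role_arn="arn:aws:iam::123456789012:role/ExampleForecastS3Access",
)
print(request["DataSource"]["S3Config"]["Path"])
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("forecast").create_dataset_import_job(**request)`. Building the request separately keeps it easy to inspect before anything is sent to AWS.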

The Target Time Series, or TTS, dataset is the minimum data required to generate a forecast. It is structured with an item_id, or some other unique identifier, for the things that you want to forecast. It also includes a TimeStamp recording when something occurred, such as a sale in retail. Finally, it includes a Target_Value, such as the sales quantity, that is, how many items were sold. This is the minimal structure required for datasets: a column for item_id, another for TimeStamp, and a third for Target_Value.
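The three required columns can be expressed as a dataset schema. The sketch below assumes the CUSTOM domain, where Amazon Forecast names these attributes `item_id`, `timestamp`, and `target_value`; the SKUs and quantities are made-up sample rows. It also renders rows as CSV with the columns in the same order as the schema attributes.

```python
import csv
import io

# Target Time Series schema for the CUSTOM domain (assumed here):
# the three required attributes, in the order the CSV columns must follow.
TTS_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}


def tts_csv(rows):
    """Render (item_id, timestamp, target_value) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for item_id, timestamp, target_value in rows:
        writer.writerow([item_id, timestamp, target_value])
    return buf.getvalue()


# Hypothetical sample data: daily unit sales for two items.
sample = [
    ("sku_001", "2023-01-01", 12),
    ("sku_001", "2023-01-02", 9),
    ("sku_002", "2023-01-01", 4),
]
print(tts_csv(sample))
```

The schema would be passed as the `Schema` argument of `create_dataset`, together with `Domain="CUSTOM"` and `DatasetType="TARGET_TIME_SERIES"`.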

Datasets can include additional columns, but these three are essential and required. In addition to historical data, other data is sometimes known per item at exactly the same times as the data points in the time series. This is called a Related Time Series, or RTS. A Related Time Series can provide additional indicators of what future values could look like. Good candidates for related data include prices, promotions, holidays, and sometimes even weather.
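Because related data must line up with the target data point by point, a quick local check can catch gaps before import. This is a minimal sketch, assuming a hypothetical RTS with `price` and `promotion` fields keyed by `item_id` and `timestamp`; the helper `missing_related_points` is illustrative and not part of any AWS SDK.

```python
# Hypothetical Related Time Series schema: related fields (price and a
# promotion flag, both assumptions) keyed by item_id and timestamp.
RTS_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "price", "AttributeType": "float"},
        {"AttributeName": "promotion", "AttributeType": "integer"},
    ]
}


def missing_related_points(tts_rows, rts_rows):
    """Return TTS (item_id, timestamp) keys that have no matching RTS row."""
    rts_keys = {(item_id, ts) for item_id, ts, *_ in rts_rows}
    tts_keys = {(item_id, ts) for item_id, ts, *_ in tts_rows}
    return sorted(tts_keys - rts_keys)


tts = [("sku_001", "2023-01-01", 12), ("sku_001", "2023-01-02", 9)]
rts = [("sku_001", "2023-01-01", 4.99, 0)]
print(missing_related_points(tts, rts))  # [('sku_001', '2023-01-02')]
```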

For new product introduction use cases, also called cold starts, it is important to have Item Meta Data, or IM. Item Meta Data is static with respect to time: it does not change over time and varies only from one item_id to another. Examples include the genre, color, class, or type of an item. In this first step of importing data, it's important to highlight that data preparation and cleansing are essential steps that must be applied before the import. Dirty data can reduce the accuracy of forecast models, so we need to prepare our data before importing it.
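Since Item Meta Data is static, each item_id should map to exactly one set of attributes. The sketch below assumes a hypothetical IM schema with `category` and `color` fields and flags any item_id that appears with conflicting values, one of the cleansing checks worth running before import. The helper name `conflicting_metadata` is illustrative only.

```python
# Hypothetical Item Meta Data schema: one row per item_id, no timestamp.
IM_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "category", "AttributeType": "string"},
        {"AttributeName": "color", "AttributeType": "string"},
    ]
}


def conflicting_metadata(im_rows):
    """Return item_ids that appear with more than one distinct set of attributes."""
    seen = {}
    conflicts = set()
    for item_id, *attrs in im_rows:
        attrs = tuple(attrs)
        if item_id in seen and seen[item_id] != attrs:
            conflicts.add(item_id)
        # Keep the first attributes seen for each item for later comparisons.
        seen.setdefault(item_id, attrs)
    return sorted(conflicts)


rows = [("sku_001", "shoes", "red"), ("sku_002", "shoes", "blue"),
        ("sku_001", "shoes", "green")]
print(conflicting_metadata(rows))  # ['sku_001']
```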

There are several reasons why there may be missing values in your data. A missing value can mean that a row does not exist at all, or that the row exists but its value is not available. This is a problem when training a model because the model cannot distinguish between these scenarios, so you need to handle them or provide the model with additional information about what was happening at that time.
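To make the two scenarios concrete, the sketch below walks a daily series and separates dates with no row at all from dates whose row exists but holds no value. It assumes a hypothetical in-memory representation: a dict from date to value, with `None` standing for an unavailable value.

```python
from datetime import date, timedelta


def classify_gaps(observations, start, end):
    """Split gaps in a daily series into missing rows and missing values.

    observations maps a date to a value; None means the row exists but its
    value is unavailable. Dates absent from the dict are missing rows.
    """
    missing_rows, missing_values = [], []
    day = start
    while day <= end:
        if day not in observations:
            missing_rows.append(day)
        elif observations[day] is None:
            missing_values.append(day)
        day += timedelta(days=1)
    return missing_rows, missing_values


obs = {date(2023, 1, 1): 5, date(2023, 1, 2): None, date(2023, 1, 4): 7}
rows, values = classify_gaps(obs, date(2023, 1, 1), date(2023, 1, 4))
print(rows)    # [datetime.date(2023, 1, 3)]
print(values)  # [datetime.date(2023, 1, 2)]
```

Knowing which kind of gap you have helps decide how to fill it, for example whether a missing row means "no sales" or "no record".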

Why a particular value is missing depends on your business, and you should use your knowledge of the business to fill those values in order to improve your forecast. Amazon Forecast helps you fill missing values. You indicate how you would like these values filled by adding a Featurization Configuration, as it is called, when creating your predictor.
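A featurization configuration is a nested structure. The sketch below builds one in the shape used by the `FeaturizationConfig` parameter of boto3's `create_predictor`, setting a `filling` method for the target attribute. The daily frequency, the `target_value` attribute name, and the chosen fill methods are assumptions for illustration; newer predictor APIs may express filling differently.

```python
ALLOWED_FILLS = {"zero", "nan", "value", "median", "mean", "min", "max"}


def build_featurization_config(frequency, target_attribute,
                               middlefill="zero", backfill="zero",
                               middlefill_value=None):
    """Build a FeaturizationConfig that sets filling methods for the target."""
    for method in (middlefill, backfill):
        if method not in ALLOWED_FILLS:
            raise ValueError(f"unsupported filling method: {method}")
    params = {"frontfill": "none", "middlefill": middlefill, "backfill": backfill}
    if middlefill == "value":
        if middlefill_value is None:
            raise ValueError("middlefill='value' requires middlefill_value")
        # Method parameters are passed to the API as strings.
        params["middlefill_value"] = str(middlefill_value)
    return {
        "ForecastFrequency": frequency,
        "Featurizations": [
            {
                "AttributeName": target_attribute,
                "FeaturizationPipeline": [
                    {
                        "FeaturizationMethodName": "filling",
                        "FeaturizationMethodParameters": params,
                    }
                ],
            }
        ],
    }


config = build_featurization_config("D", "target_value", middlefill="value",
                                    backfill="zero", middlefill_value=0.5)
```

The resulting dictionary would be supplied as `FeaturizationConfig=config` when creating the predictor.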

For the Target Time Series, you can fill missing values with zero, a specific value, the mean, the median, the minimum, the maximum, or NaN (not a number). Automatically filling missing values can potentially reduce these data-cleansing steps from hours to seconds. This saves time and can also help improve the accuracy of your forecast. For the sake of brevity, we will assume the data has been prepared and cleansed accordingly. This means your dataset has been processed for missing values and properly handles zeros, invalid values, and outliers.
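To show what each filling option does to a series, here is a local illustration of the methods listed above. It is not Amazon Forecast's implementation: it assumes, for simplicity, that statistics such as the mean and median are computed over the observed (non-missing) values of the whole series.

```python
import math
import statistics


def fill_missing(values, method, fill_value=None):
    """Replace None entries using one of the filling methods named above.

    Statistics are computed over the observed (non-None) values; this is an
    illustrative assumption, not a description of Forecast's internals.
    """
    observed = [v for v in values if v is not None]
    if method == "zero":
        replacement = 0.0
    elif method == "value":
        if fill_value is None:
            raise ValueError("method 'value' requires fill_value")
        replacement = float(fill_value)
    elif method == "mean":
        replacement = statistics.mean(observed)
    elif method == "median":
        replacement = statistics.median(observed)
    elif method == "min":
        replacement = min(observed)
    elif method == "max":
        replacement = max(observed)
    elif method == "nan":
        replacement = math.nan
    else:
        raise ValueError(f"unknown filling method: {method}")
    return [replacement if v is None else v for v in values]


series = [4, None, 8, None, 3]
print(fill_missing(series, "mean"))    # [4, 5, 8, 5, 3]
print(fill_missing(series, "median"))  # [4, 4, 8, 4, 3]
```

Which method is appropriate is a business decision: zero suits "no sales happened", while mean or median suits "the value was not recorded".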

About the Author
Jorge Negrón
AWS Content Architect

Experienced in the architecture and delivery of cloud-based solutions, the development and delivery of technical training, defining requirements and use cases, and validating architectures for results. Excellent leadership, communication, and presentation skills with attention to detail. Hands-on administration/development experience with the ability to mentor and train on current and emerging technologies (Cloud, ML, IoT, Microservices, Big Data & Analytics).