Kinesis Data Firehose



Domain One of the AWS Solutions Architect Associate exam guide (SAA-C03) requires us to be able to design a multi-tier architecture solution, so that is our topic for this section.
We cover the need-to-know aspects of how to design multi-tier solutions using AWS services.


Learning Objectives

  • Learn some of the essential services for creating multi-tier architectures on AWS, including the Simple Queue Service (SQS) and the Simple Notification Service (SNS)
  • Understand data streaming and how Amazon Kinesis can be used to stream data
  • Learn how to design a multi-tier solution on AWS, and the important aspects to take into consideration when doing so
  • Learn how to design cost-optimized AWS architectures
  • Understand how to leverage AWS services to migrate applications and databases to the AWS Cloud

The second type of consumer of a Kinesis data stream can be an Amazon Kinesis Data Firehose delivery stream. As the name suggests, a Firehose delivery stream can pick up large datasets, transform them, and load them to destinations such as Amazon S3, DynamoDB, Amazon EMR, OpenSearch, Splunk, Datadog, New Relic, Dynatrace, Sumo Logic, LogicMonitor, MongoDB, HTTP endpoints, and Amazon Redshift. Kinesis Data Firehose manages all the infrastructure, storage, networking, and configuration required to ingest your data and store it at a destination. It is fully managed, which means you do not have to provision, deploy, or maintain hardware or software, or write any application to manage the process. It scales automatically, and like many other AWS storage services, it replicates data across three facilities in a Region. Kinesis Data Firehose buffers the input stream to a predefined size and for a predefined time before loading it to destinations.
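To make the delivery-stream idea concrete, one way to picture it is as the parameter set you would pass to the Firehose CreateDeliveryStream API (for example via boto3's `create_delivery_stream`). This is only a sketch: every name and ARN below is a hypothetical placeholder, and the buffering values are arbitrary choices within the documented ranges.

```python
# Hypothetical parameters for the Firehose CreateDeliveryStream API,
# e.g. boto3.client("firehose").create_delivery_stream(**params).
# All names and ARNs are placeholders, not real resources.
params = {
    "DeliveryStreamName": "example-delivery-stream",
    # Read records from an existing Kinesis data stream as the source.
    "DeliveryStreamType": "KinesisStreamAsSource",
    "KinesisStreamSourceConfiguration": {
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
        "RoleARN": "arn:aws:iam::123456789012:role/example-firehose-role",
    },
    # Deliver buffered batches to S3, compressed before storage.
    "ExtendedS3DestinationConfiguration": {
        "BucketARN": "arn:aws:s3:::example-destination-bucket",
        "RoleARN": "arn:aws:iam::123456789012:role/example-firehose-role",
        # Flush when either threshold is reached, whichever comes first.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
}

print(params["DeliveryStreamName"])  # → example-delivery-stream
```

Note that the service role referenced by `RoleARN` is what grants Firehose permission to read from the source stream and write to the destination bucket.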

The buffer size is specified in megabytes and ranges from 1 MB to 128 MB for S3, from 1 MB to 100 MB for OpenSearch, and from 0.2 MB up to 3 MB for Lambda functions. The buffer interval is specified in seconds and ranges from 60 to 900 seconds. Kinesis Data Firehose will retain data for up to 24 hours if the delivery destination is unavailable, unless the source is a Kinesis data stream, in which case data is retained according to the data stream's configuration rather than the Firehose configuration. When delivering data to Amazon Redshift, Kinesis Data Firehose uses Amazon S3 as an intermediate staging step before loading the data into your Redshift cluster. Kinesis Data Firehose does not use shards and is fully automated in terms of scalability. It can compress and encrypt data before delivering it to storage destinations. For Amazon S3, OpenSearch, and Splunk destinations, if data is transformed, you can optionally back up the source data to a different S3 bucket. Firehose operates fast, but not in real time: you should expect latency of 60 seconds or more when using Kinesis Data Firehose to store data to destinations.
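The buffering behavior described above is an either/or condition: Firehose flushes a batch when the size hint or the interval hint is reached, whichever comes first. A minimal illustration of that rule (the 5 MB / 300 s thresholds are arbitrary example values within the documented ranges):

```python
def should_flush(buffered_mb, elapsed_seconds, size_mb=5, interval_s=300):
    """Flush when EITHER the buffer-size hint or the buffer-interval hint
    is reached, whichever comes first -- mirroring how Firehose's
    buffering hints behave."""
    return buffered_mb >= size_mb or elapsed_seconds >= interval_s

print(should_flush(5.2, 40))   # size threshold reached first  -> True
print(should_flush(0.3, 300))  # interval threshold reached    -> True
print(should_flush(0.3, 40))   # neither threshold reached yet -> False
```

This either/or rule is also why Firehose is near-real-time rather than real-time: a sparsely filled buffer still waits out the interval before delivery.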

Also, with Kinesis Data Firehose you pay for the amount of data going through it. Kinesis Data Firehose is usually the delivery service used to get Kinesis data stream records into AWS storage services. Message producers for Kinesis Data Firehose are not limited to Kinesis Data Streams; any application can produce messages for Firehose to deliver to AWS storage services. The Kinesis Agent is a pre-built Java application which, once installed and configured, collects and sends data to your delivery stream. You can install the Kinesis Agent on Linux systems such as web servers, log servers, and database servers. The agent is also available on GitHub. The Amazon Linux, Red Hat Linux, and Microsoft Windows operating systems are supported. Both Kinesis Data Streams and Kinesis Data Firehose are part of the Kinesis streaming data platform, which includes Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.
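Since any application can act as a producer, here is a sketch of how one might prepare records for direct delivery through the Firehose PutRecord/PutRecordBatch APIs. One detail worth knowing: Firehose concatenates record payloads as-is, so producers commonly append a newline themselves to keep records separable once they land in S3. The `to_firehose_records` helper is my own name, not an AWS API.

```python
import json

def to_firehose_records(events):
    """Serialize events as newline-delimited JSON in the shape the
    Firehose PutRecord/PutRecordBatch APIs expect ('Data' must be bytes).
    The trailing newline acts as a record delimiter, since Firehose
    concatenates payloads without adding one."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]

records = to_firehose_records([{"user": "a", "clicks": 3},
                               {"user": "b", "clicks": 1}])

# These records could then be sent with, e.g.:
# boto3.client("firehose").put_record_batch(
#     DeliveryStreamName="example-delivery-stream", Records=records)

print(records[0]["Data"])  # → b'{"user": "a", "clicks": 3}\n'
```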


About the Author

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.