Amazon MSK and Kafka Under the Hood
Start course
3h 46m

Domain One of The AWS Solution Architect Associate exam guide SAA-C03 requires us to be able to Design a multi-tier architecture solution so that is our topic for this section.
We cover the need to know aspects of how to design Multi-Tier solutions using AWS services. 

Want more? Try a lab playground or do a Lab Challenge!

Learning Objectives

  • Learn some of the essential services for creating multi-tier architect on AWS, including the Simple Queue Service (SQS) and the Simple Notification Service (SNS)
  • Understand data streaming and how Amazon Kinesis can be used to stream data
  • Learn how to design a multi-tier solution on AWS, and the important aspects to take into consideration when doing so
  • Learn how to design cost-optimized AWS architectures
  • Understand how to leverage AWS services to migrate applications and databases to the AWS Cloud

Everything you need to know about Kafka boils down to three main ideas. You have producers who create data, such as a website gathering user traffic flow information, you have topics which received the data, this information is stored with extreme fault tolerance and you have consumers which can read that data in order and know that it was never changed or modified along the way.

Kafka is often used as a decoupling mechanism to help relieve tension among many different producers and consumers. For instance, you might have 10 websites, all creating log information that needs to be processed.

Let's say that you also have 20 microservices that each try to filter out and make predictions for various specific variables of that data. If you were to hard code all this information, you would have 200 separate connections that you need to worry about.

By using Kafka as an intermediary, all of that log information can be pushed into a single topic. This one topic is now the single source of truth for all of your microservices. They can each read through and gather the information they require on demand. This topic will hold the producers information until the retention period has been met. This window is configurable and has a default time of seven days.

Kafka also has a size-based retention policy where you configure the maximum amount of data that can be stored. Once the max amount of data has been reached, Kafka will start kicking out and removing old information. Both of these options can be configured on a per topic basis, which provides a lot of flexibility in keeping data costs down or to retain high value information for longer.

Each topic has a number of partitions where the data will be randomly written unless a partition key is provided. Once data has been written to a topic, it can never be changed. You can provide an update to that data, but it would just be the next entry in the partition instead of overriding the original data. The more partitions you have for a topic, the more parallelism you can have.

About the Author
Learning Paths

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built  70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+  years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.