
Amazon MSK and Kafka Under the Hood


Course Introduction
Utilizing Managed Services and Serverless Architectures to Minimize Cost
Decoupled Architecture
Amazon API Gateway
Advanced API Gateway
Amazon Elastic MapReduce
Introduction to EMR
Amazon EventBridge
Design Considerations


This section of the AWS Certified Solutions Architect - Professional learning path introduces common AWS solution architectures relevant to the AWS Certified Solutions Architect - Professional exam and the services that support them. These services form a core component of running resilient and performant architectures. 


Learning Objectives

  • Learn how to utilize managed services and serverless architectures to minimize cost
  • Understand how to use AWS services to process streaming data
  • Discover AWS services that support mobile app development
  • Understand when to utilize serverless services within your AWS solutions
  • Learn which AWS services to use when building a decoupled architecture

Everything you need to know about Kafka boils down to three main ideas. You have producers, which create data, such as a website gathering user traffic flow information; you have topics, which receive that data and store it with extreme fault tolerance; and you have consumers, which can read that data in order, knowing it was never changed or modified along the way.
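These three ideas can be sketched with a toy in-memory log (illustrative Python only; real Kafka clients talk to a broker cluster over the network, and the class and topic names here are made up for the sketch):

```python
# Toy model of Kafka's core ideas: producers append records to a topic,
# the topic is an ordered, append-only log, and consumers read records
# back in order starting from an offset of their choosing.

class Topic:
    def __init__(self, name):
        self.name = name
        self.log = []                      # append-only: records are never modified

    def produce(self, record):
        self.log.append(record)            # a producer writes a record
        return len(self.log) - 1           # offset of the new record

    def consume(self, offset):
        return self.log[offset:]           # a consumer reads from its offset onward

clickstream = Topic("site-traffic")
clickstream.produce({"page": "/home", "user": "u1"})
clickstream.produce({"page": "/cart", "user": "u2"})

# A consumer starting at offset 0 sees every record, in write order.
records = clickstream.consume(0)
```

Because reads are just offset lookups into an immutable log, many consumers can read the same topic independently without interfering with each other.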

Kafka is often used as a decoupling mechanism to relieve the strain of wiring many different producers directly to many different consumers. For instance, you might have 10 websites, all creating log information that needs to be processed.

Let's say that you also have 20 microservices that each filter that data and make predictions about specific variables within it. If you were to hard-code all of those connections, you would have 200 separate connections to worry about.
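The arithmetic behind that claim: point-to-point wiring grows multiplicatively with the number of producers and consumers, while routing everything through one broker grows additively.

```python
# Point-to-point: every producer is wired to every consumer.
# Via a broker: everyone connects only to the topic.
producers, consumers = 10, 20
direct_connections = producers * consumers   # 10 x 20 pairwise connections
via_topic = producers + consumers            # 10 + 20 connections to the broker
```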

By using Kafka as an intermediary, all of that log information can be pushed into a single topic. This one topic is now the single source of truth for all of your microservices; they can each read through it and gather the information they require on demand. The topic will hold the producers' information until the retention period has been met. This window is configurable and defaults to seven days.

Kafka also has a size-based retention policy, where you configure the maximum amount of data that can be stored. Once that maximum has been reached, Kafka starts removing the oldest information. Both of these options can be configured on a per-topic basis, which provides a lot of flexibility for keeping storage costs down or retaining high-value information for longer.
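Both retention settings map to the topic-level configs `retention.ms` and `retention.bytes`. As a sketch using Kafka's stock CLI (the broker address and topic name here are placeholders):

```shell
# Keep records on the "web-logs" topic for one day, or until a
# partition's log reaches roughly 1 GiB, whichever limit is hit first.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name web-logs \
  --add-config retention.ms=86400000,retention.bytes=1073741824
```

Note that `retention.bytes` applies per partition, not per topic, so a topic's total footprint is roughly that value multiplied by its partition count.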

Each topic has a number of partitions, and data is written to them randomly unless a partition key is provided. Once data has been written to a topic, it can never be changed. You can provide an update to that data, but it becomes the next entry in the partition rather than overwriting the original. The more partitions a topic has, the more parallelism you can achieve.
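The partition-assignment behavior can be sketched like this (the hash choice is illustrative; Kafka's Java client actually uses murmur2 for keyed records, and the function name is made up for the sketch):

```python
import random
import zlib

def choose_partition(key, num_partitions):
    """Keyless records are spread across partitions; keyed records
    always hash to the same partition, preserving per-key ordering."""
    if key is None:
        return random.randrange(num_partitions)       # no key: spread the load
    # Stable hash, so the same key always lands on the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Every record keyed "user-42" goes to one partition, so a consumer of
# that partition sees user-42's updates in the order they were written.
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
```

This is why keys matter for the "update" pattern above: ordering is only guaranteed within a single partition, so updates to the same entity must share a key to land in the same partition.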

About the Author

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.