Architectural Example

This brief course covers the fundamentals of Amazon MSK, including what the service is, how it works, and how to provision an Amazon MSK cluster. You will also be guided through how Amazon MSK fits into a functional architecture.

If you have any feedback relating to this course, feel free to reach out to us at

Learning Objectives

  • Learn about the Amazon MSK service and how it works
  • Learn how to provision an MSK cluster
  • Understand how Amazon MSK fits into a functional architecture

Intended Audience

This lecture is perfect for anyone with no previous knowledge of Amazon MSK who wants to learn more about the service, as well as those who are interested in taking the AWS Certified Data Analytics - Specialty (DAS-C01) certification.


Prerequisites

To get the most out of this course, you should have a basic general understanding of cloud computing, preferably with Amazon Web Services experience. It would also be beneficial to have some basic knowledge of streaming data services such as Amazon Kinesis and Apache Kafka.


I think one of the best ways to really understand how a service works is to see how it fits into a functional architecture. Let's take a look at a serverless example from Amazon that utilizes Amazon MSK.

Pretend that we are a business that deals with the public a lot, and we need to know what is currently on the minds of our customers and the world at large. It might be beneficial to create some kind of application that can pull data from Twitter and perform sentiment analysis on hashtags, or read through tweets to explore commonalities.

Take a look at this architecture. Here we have a system that uses Amazon MSK, AWS Glue, and Amazon Redshift, and displays all of the relevant information gathered from these services using Amazon QuickSight to create functional graphs. With this setup, we can use Twitter as a data producer and read through all public hashtags to see what's currently trending on the platform.

Let's take a moment to explore the role that each of these services plays. First, Amazon MSK: this service, as we discussed already, acts as a staging area for all of our data. Amazon MSK makes sure the information we receive stays in the order we obtained it (Kafka preserves ordering within each partition) and doesn't disappear or self-destruct in any way. Now, it is important to state that MSK is mostly just the landing pad for data and requires a producer to push information into it.
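To make the "landing pad" idea a little more concrete, here is a minimal sketch of how a single Kafka-style partition behaves. This is a toy in-memory model, not the Kafka or Amazon MSK client API: the `TopicPartition` class and its `append` and `read_from` methods are hypothetical names invented for illustration. It shows a producer appending records, and a consumer reading them back in arrival order without removing them.

```python
from dataclasses import dataclass, field


@dataclass
class TopicPartition:
    """Toy stand-in for one partition of a topic: an append-only log."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Producer side: add a record to the end and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int, max_records: int = 10) -> list:
        """Consumer side: read records starting at an offset, in arrival order.

        Reading never deletes anything from the log.
        """
        return self.records[offset:offset + max_records]


partition = TopicPartition()
for tweet in ["#aws is trending", "loving #kafka", "#aws again"]:
    partition.append(tweet)  # the producer pushes data in

first_read = partition.read_from(0)
second_read = partition.read_from(0)  # a replay sees the same records
```

Because the log only ever grows at the end and consumers track their own offsets, a second consumer (or a replay) sees the same records in the same order. A real cluster adds replication, multiple partitions, and a retention period on top of this idea.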

In this example, we can use Apache NiFi to pull data from Twitter and push it into Amazon MSK. However, you can use any mechanism of your choice. AWS Glue then acts as a data consumer and is able to pull information from Amazon MSK. Glue can perform streaming ETL (extract, transform, load) to take the raw data, convert it into something meaningful, and push it into Amazon Redshift. Redshift is simply the data warehouse solution that will hold all of our data.
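The "convert it into something meaningful" step might look roughly like the sketch below. This is plain Python rather than an actual AWS Glue job, and it makes some assumptions: the tweet field names `text` and `created_at`, the `transform` function, and the sample records are all made up for illustration. It takes raw JSON tweets, extracts the hashtags, and produces flat rows of the kind you could load into a warehouse table.

```python
import json
import re

# Matches a '#' followed by word characters, capturing the tag without the '#'.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def transform(raw_record: str) -> list:
    """Turn one raw JSON tweet into zero or more (hashtag, created_at) rows."""
    tweet = json.loads(raw_record)
    # The field names "text" and "created_at" are assumed for this sketch.
    hashtags = HASHTAG_PATTERN.findall(tweet["text"])
    # Lowercase the tags so "#AWS" and "#aws" count as the same hashtag.
    return [(tag.lower(), tweet["created_at"]) for tag in hashtags]


# Hypothetical sample of raw records as they might arrive from the stream.
raw_stream = [
    '{"text": "Loving #AWS and #Kafka", "created_at": "2021-06-01T10:00:00Z"}',
    '{"text": "No tags here", "created_at": "2021-06-01T10:00:05Z"}',
    '{"text": "#aws again", "created_at": "2021-06-01T10:00:09Z"}',
]

rows = [row for record in raw_stream for row in transform(record)]
```

Keeping the transform as a small function of one record at a time is what makes it a natural fit for streaming: each record can be processed as it arrives, and tweets with no hashtags simply produce no rows.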

And finally, QuickSight can be used to query and visualize the data stored in Amazon Redshift. It can show your results in line charts, tables, pie charts, bar graphs, and word clouds. This is extremely valuable for making decisions and gleaning information from your data. And there we have it, a simple architecture that pulls data from Twitter and allows you to visualize and make meaningful decisions on hashtags for future use.
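As a rough idea of the kind of query that could sit behind a "top hashtags" bar graph, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the data warehouse so that it runs anywhere; the `hashtag_mentions` table, its columns, and the sample rows are hypothetical and not taken from the architecture itself.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; the table and column
# names are hypothetical and chosen only for this sketch.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE hashtag_mentions (hashtag TEXT, created_at TEXT)"
)
connection.executemany(
    "INSERT INTO hashtag_mentions VALUES (?, ?)",
    [
        ("aws", "2021-06-01T10:00:00Z"),
        ("kafka", "2021-06-01T10:00:00Z"),
        ("aws", "2021-06-01T10:00:09Z"),
        ("glue", "2021-06-01T10:00:12Z"),
        ("kafka", "2021-06-01T10:00:15Z"),
        ("aws", "2021-06-01T10:00:20Z"),
    ],
)

# Count mentions per hashtag and keep the two most frequent ones.
top_hashtags = connection.execute(
    """
    SELECT hashtag, COUNT(*) AS mentions
    FROM hashtag_mentions
    GROUP BY hashtag
    ORDER BY mentions DESC, hashtag ASC
    LIMIT 2
    """
).fetchall()
connection.close()
```

Each resulting pair of hashtag and mention count maps directly onto one bar in a bar graph or one word in a word cloud.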

About the Author

William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.