This brief course covers the fundamentals of Amazon MSK, including what the service is, how it works, and how to provision an Amazon MSK cluster. You will also be guided through how Amazon MSK fits into a functional architecture.
If you have any feedback relating to this course, feel free to reach out to us at support@cloudacademy.com.
Learning Objectives
- Learn about the Amazon MSK service and how it works
- Learn how to provision an MSK cluster
- Understand how Amazon MSK fits into a functional architecture
Intended Audience
This course is ideal for anyone with no previous knowledge of Amazon MSK who wants to learn more about the service, as well as anyone interested in taking the AWS Certified Data Analytics - Specialty (DAS-C01) certification exam.
Prerequisites
To get the most out of this course, you should have a basic general understanding of cloud computing, preferably with Amazon Web Services experience. It would also be beneficial to have some basic knowledge of streaming data services such as Amazon Kinesis and Apache Kafka.
Amazon Managed Streaming for Apache Kafka, or Amazon MSK, allows you to run applications that use Apache Kafka within AWS. Kafka provides a platform for stream processing and operates as a durable, publisher/subscriber-based messaging system. Its key feature is the ability to ingest data with high fault tolerance, producing continuous streams of records that preserve the integrity of the data, including the order in which it was received.
Apache Kafka then acts as a buffer between these data-producing entities and the consumers that are subscribed to it. Subscribers receive information from Kafka topics on a first in, first out (FIFO) basis, allowing the subscriber to have a correct timeline of the data that was produced. Strictly speaking, Kafka guarantees this ordering within each partition of a topic rather than across the topic as a whole.
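To make the buffering and ordering behavior concrete, here is a minimal in-memory sketch of a topic and its subscribers. It is a toy model, not the Kafka client API: the `Topic` and `Subscriber` classes, the `sensor-a` and `sensor-b` keys, and the CRC32-based partition choice are all assumptions made for illustration (real Kafka clients use their own partitioner).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a Kafka topic: a list of append-only partition logs."""
    num_partitions: int = 2
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def publish(self, key: str, value: str) -> int:
        # Records with the same key always land in the same partition,
        # so their relative order is preserved. CRC32 is only a stand-in
        # for the partitioner a real client would use.
        partition = zlib.crc32(key.encode()) % self.num_partitions
        self.partitions[partition].append((key, value))
        return partition


class Subscriber:
    """Reads every partition from its own offsets, oldest record first."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = [0] * topic.num_partitions

    def poll(self) -> list:
        records = []
        for partition, log in enumerate(self.topic.partitions):
            # Take everything appended since this subscriber's last poll.
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return records


topic = Topic(num_partitions=2)
for i in range(3):
    topic.publish("sensor-a", f"a{i}")
    topic.publish("sensor-b", f"b{i}")

first = Subscriber(topic)
second = Subscriber(topic)

# Each subscriber sees the records for a given key in publish order.
first_values = [value for key, value in first.poll() if key == "sensor-a"]
second_records = second.poll()
```

Because the topic keeps the records and each subscriber only tracks its own offsets, reading is non-destructive: `second` still receives all six records after `first` has polled, and a second call to `first.poll()` returns nothing until new records are published.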
Kafka is an open-source technology, which has allowed a large number of community-driven tools and add-ons to grow around it. This makes it very customizable and gives developers the freedom to build and create what they need. Using Amazon MSK, you can use the native Apache Kafka APIs to create data lakes, stream information to multiple destinations, and power data analytics applications and pipelines, again all within AWS.
The main reason you'd want to use Amazon MSK over rolling your own implementation of Kafka is that Amazon MSK is a fully managed service. This means you don't need to take care of any servers, you don't need to worry about any upgrades, and you also don't need to bother with handling Apache ZooKeeper, the required coordination software that orchestrates cluster tasks and maintains the state of the cluster in general.
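As a rough illustration of what "fully managed" looks like in practice, the sketch below builds the request for the boto3 `kafka` client's `create_cluster` call: you describe the brokers you want and AWS runs them. The helper names (`build_cluster_request`, `create_cluster`), the cluster name, the subnet and security group IDs, and the default Kafka version, instance type, and region are all placeholder assumptions, so check them against what your account and region support.

```python
def build_cluster_request(
    name,
    subnet_ids,
    security_group_ids,
    brokers_per_zone=1,
    kafka_version="3.6.0",            # placeholder; use a version MSK offers
    instance_type="kafka.m5.large",   # placeholder broker size
    volume_size_gib=100,
):
    """Build keyword arguments for the boto3 MSK create_cluster call."""
    if not subnet_ids:
        raise ValueError("at least one client subnet is required")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        # The broker count is kept a multiple of the number of subnets
        # so that brokers are spread evenly across Availability Zones.
        "NumberOfBrokerNodes": brokers_per_zone * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


def create_cluster(request, region="us-east-1"):
    """Send the request to Amazon MSK (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builder runs without it

    client = boto3.client("kafka", region_name=region)
    return client.create_cluster(**request)["ClusterArn"]


# Hypothetical IDs, for illustration only.
request = build_cluster_request(
    name="demo-cluster",
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    security_group_ids=["sg-123"],
    brokers_per_zone=2,
)
```

Nothing in the request mentions operating systems, patching, or ZooKeeper nodes; passing `request` to `create_cluster` would start provisioning and return the new cluster's ARN.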
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.