Aggregating Data with Amazon Managed Streaming for Apache Kafka (MSK)

Lab Steps

  • Logging in to the Amazon Web Services Console
  • Creating an Amazon MSK Cluster Configuration
  • Connecting to the Virtual Machine using EC2 Instance Connect
  • Creating Topics using the Apache Kafka Command-line Interface
  • Populating and Processing Topic Data
  • Visualizing Your Aggregated Data

Difficulty: Beginner
Time Limit: 2h
Students: 373
Rating: 4.7/5

Description

Amazon Managed Streaming for Apache Kafka (also known as Amazon MSK) is an event streaming platform capable of handling trillions of events per day. Although Apache Kafka was originally designed as a type of message queue, it has proven useful in many other use cases.

This managed offering from AWS makes it simple to set up and manage Apache Kafka clusters reliably. You don't need to worry about provisioning servers or keeping them patched and up to date. Amazon MSK integrates with existing AWS technology: storage is secure and durable, and monitoring is handled by Amazon CloudWatch.

In this hands-on lab, you will see how to create a cluster configuration for an Amazon MSK cluster. You will connect to an Amazon MSK cluster and create some Topics. You will also create a simple application, using the Faust streaming library, that populates the Topics and aggregates the data.
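To make the aggregation idea concrete, here is a minimal sketch in plain Python. It is not Faust code and not the lab's application: it only illustrates the pattern a stream processor follows, consuming events one at a time and keeping a running count and total per key. The event shape `(key, amount)` and the function name `aggregate` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, amount). The lab's real topic schema is not shown here.
Event = Tuple[str, float]


def aggregate(events: Iterable[Event]) -> Dict[str, Dict[str, float]]:
    """Consume events one at a time, keeping a running count and total per key."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total": 0.0}
    )
    for key, amount in events:
        entry = totals[key]
        entry["count"] += 1       # number of events seen for this key
        entry["total"] += amount  # running sum of amounts for this key
    return dict(totals)


if __name__ == "__main__":
    sample = [("books", 12.5), ("games", 30.0), ("books", 7.5)]
    for key, stats in aggregate(sample).items():
        print(key, stats)
```

In a real streaming application the events would arrive from a Topic rather than a list, and the running state would typically live in a table managed by the streaming library.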

Please note: this lab creates an Amazon MSK cluster, which can take over twenty minutes to finish setting up. Make sure you have enough time available before starting this lab.
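Because cluster creation takes this long, scripts usually poll the cluster state until it becomes active. The following is a hedged sketch of such a polling loop, not the commands the lab uses. The function `wait_until_active` and its parameters are hypothetical; `get_state` is any callable that returns the current state string. In practice it might wrap a call such as the boto3 `kafka` client's `describe_cluster` and read `ClusterInfo.State`, which is an assumption about how you would wire it up.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns "ACTIVE".

    Returns True once the cluster is active, and False if the state
    becomes "FAILED" or the timeout elapses first.
    """
    waited = 0.0
    while waited <= timeout_seconds:
        state = get_state()
        if state == "ACTIVE":
            return True
        if state == "FAILED":
            return False
        sleep(poll_seconds)  # injectable so the loop can be exercised without waiting
        waited += poll_seconds
    return False


if __name__ == "__main__":
    # Simulated state sequence standing in for a real cluster.
    states = iter(["CREATING", "CREATING", "ACTIVE"])
    print(wait_until_active(lambda: next(states), poll_seconds=0.01))
```

Passing the state lookup in as a callable keeps the loop independent of any particular AWS client and makes the timeout behavior easy to check.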

Learning Objectives

Upon completion of this beginner-level lab, you will be able to:

  • Create an Amazon MSK Cluster Configuration
  • Create a Topic in an Amazon MSK cluster using the Apache Kafka command-line tools
  • Implement a Python script that aggregates Topic data
  • Retrieve data from Topics using the command-line and Python
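As background for the first objective: an Amazon MSK cluster configuration is, in essence, a set of Apache Kafka broker properties supplied as `name=value` lines. The sketch below renders such a properties text from a Python mapping. The helper `render_properties` is hypothetical, and the property values are illustrative only; they are not the values used in this lab.

```python
from typing import Mapping, Union

Value = Union[str, int, bool]


def render_properties(settings: Mapping[str, Value]) -> str:
    """Render settings as Kafka-style properties text, one name=value per line."""
    lines = []
    for name in sorted(settings):
        value = settings[name]
        # Check bool before anything else: Kafka expects lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


# Illustrative values only (assumed, not taken from the lab).
example = {
    "auto.create.topics.enable": False,
    "delete.topic.enable": True,
    "log.retention.hours": 8,
}

if __name__ == "__main__":
    print(render_properties(example), end="")
```

Sorting the property names keeps the rendered text stable, which makes it easier to compare two configuration revisions.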

Intended Audience

  • Data Engineers
  • Cloud Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon Managed Streaming for Apache Kafka
  • The Linux Bash shell
  • The AWS command-line interface
  • Python

Updates

August 31st, 2021 - Resolved an issue with the commands used to wait for the MSK cluster to become active

August 30th, 2021 - Emphasized the warning about the time it takes to create the MSK cluster

June 18th, 2021 - Clarified some instructions

About the Author
Students: 32364
Labs: 78
Courses: 2
Learning paths: 2

Andrew is a Labs Developer with previous experience in the internet service provider, audio streaming, and cryptocurrency industries. He has also been a DevOps Engineer and enjoys working with CI/CD and Kubernetes.

He holds the Developer - Associate, SysOps Administrator - Associate, and Solutions Architect - Associate AWS certifications.