
Building a Data Pipeline in DC/OS

This hands-on lab is part of the Introduction to DC/OS learning path.

Lab Steps

  • Logging in to the Amazon Web Services Console
  • Understanding the DC/OS Cluster Architecture
  • Connecting to the Virtual Machine using SSH
  • Installing the DC/OS CLI on Linux
  • Installing the Required Packages in the DC/OS Cluster
  • Running the Tweeter Application
  • Analyzing Tweets in Real-Time with Zeppelin


Difficulty: Intermediate
Duration: 1h

Description

Lab Overview

It is relatively simple to create powerful data pipelines in DC/OS. In this Lab, you will learn how to perform streaming data analytics by building a data pipeline in DC/OS that combines multiple services with a Twitter-like application. Along the way you will review many fundamental DC/OS concepts, including installing packages, using Marathon-LB to load balance traffic, and working with virtual IPs.
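
For example, the packages in this Lab are installed with the DC/OS CLI. The commands below are a minimal sketch of that workflow, not the Lab's exact steps; the options file name is illustrative, and the options the Lab uses may differ.

    # Install Marathon-LB to load balance external traffic to the web app
    dcos package install marathon-lb --yes

    # Install the data services that back the pipeline; an --options file
    # overrides package defaults (the file name here is illustrative)
    dcos package install cassandra --yes
    dcos package install kafka --options=kafka-options.json --yes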

Lab Objectives

Upon completion of this Lab you will be able to:

  • Install DC/OS packages with custom options using the DC/OS CLI
  • Deploy a data pipeline using Kafka, Cassandra, and a social networking app
  • Use the Zeppelin package and DC/OS Spark to perform basic streaming analytics on the data pipeline (a brief sketch follows this list)
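
To picture the streaming-analytics objective, here is a minimal word-count sketch in PySpark reading a tweet stream from Kafka. It is not the Lab's actual Zeppelin notebook, which runs interactively on DC/OS Spark; the broker address, topic name, and the availability of the Spark Kafka connector are all assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, split

    spark = SparkSession.builder.appName("TweetTrends").getOrCreate()

    # Subscribe to the tweet topic on the Kafka brokers, addressed here by a
    # DC/OS named VIP (the address and topic name are assumptions)
    tweets = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers",
                      "broker.kafka.l4lb.thisdcos.directory:9092")
              .option("subscribe", "tweets")
              .load())

    # Split each tweet body into words and keep a running count per word
    words = (tweets.selectExpr("CAST(value AS STRING) AS text")
             .select(explode(split(col("text"), " ")).alias("word")))
    counts = words.groupBy("word").count()

    # Print updated counts to the console as new tweets arrive
    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()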

Lab Prerequisites

You should be familiar with:

  • Basic and intermediate DC/OS concepts including Virtual IPs and Marathon-LB
  • Working at the command-line in Linux
  • AWS services (optional, to understand the architecture of the pre-created DC/OS cluster)

Lab Environment

Before completing the Lab instructions, the environment will look as follows:

After completing the Lab instructions, the environment should look similar to:

About the Author


Logan has been involved in software development and research for over eleven years, including six years in the cloud. He is an AWS Certified DevOps Engineer - Professional, MCSE: Cloud Platform and Infrastructure, and Certified Kubernetes Administrator (CKA). He earned his Ph.D. studying design automation and enjoys all things tech.