CloudAcademy

Introduction to Google Cloud Dataproc

This course is part of the following learning path:

Data Engineer – Professional Certification Preparation for Google


Contents

Introduction
  • Introduction (2m 19s)
  • What Is Cloud Dataproc? (4m 43s)
Using Cloud Dataproc
  • Running a Simple Job (8m 50s)
  • Access Control (2m 28s)
  • Scaling a Cluster (14m 53s)
  • Connecting to BigQuery (7m 36s)
  • Customization (5m 31s)
Conclusion
  • Conclusion (3m 7s)
Difficulty: Intermediate
Duration: 49m 27s
Students: 301

Course Description

Google Cloud Dataproc is a managed service for running Apache Hadoop and Spark jobs. It can be used for big data processing and machine learning.

But you could run these data processing frameworks on Compute Engine instances yourself, so what does Dataproc do for you? Dataproc actually uses Compute Engine instances under the hood, but it takes care of the management details for you. It's a layer on top that makes it easy to spin clusters up and down as you need them.
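To make "spin clusters up and down as you need them" concrete, here is a minimal Python sketch of an ephemeral-cluster lifecycle driven by the gcloud CLI: create a cluster, submit one Spark job, then delete the cluster. It only builds the command lines, so it runs without a GCP account. The cluster name, region, and worker count are hypothetical placeholders, and the SparkPi class and examples jar path are assumptions based on Google's public Dataproc quickstart, not on this course's repository.

```python
import shlex
from typing import List


def cluster_lifecycle_commands(cluster: str, region: str, num_workers: int = 2) -> List[List[str]]:
    """Build gcloud argument lists for a short-lived Dataproc cluster.

    The three steps are: create the cluster, run a sample Spark job on it,
    and delete it so no idle Compute Engine instances keep running.
    """
    create = ["gcloud", "dataproc", "clusters", "create", cluster,
              f"--region={region}", f"--num-workers={num_workers}"]
    # Assumption: the SparkPi example jar ships at this path on Dataproc images.
    submit = ["gcloud", "dataproc", "jobs", "submit", "spark",
              f"--cluster={cluster}", f"--region={region}",
              "--class=org.apache.spark.examples.SparkPi",
              "--jars=file:///usr/lib/spark/examples/jars/spark-examples.jar",
              "--", "1000"]  # arguments after "--" are passed to the Spark job
    # --quiet skips gcloud's interactive confirmation prompt.
    delete = ["gcloud", "dataproc", "clusters", "delete", cluster,
              f"--region={region}", "--quiet"]
    return [create, submit, delete]


if __name__ == "__main__":
    # Hypothetical cluster name and region, for illustration only.
    for cmd in cluster_lifecycle_commands("example-cluster", "us-central1"):
        print(shlex.join(cmd))
```

To actually execute each step, you could pass each list to `subprocess.run(cmd, check=True)` on a machine with an authenticated gcloud CLI. Deleting the cluster as soon as the job finishes is what lets you pay only for the time the cluster is doing work.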

Learning Objectives

  • Explain the relationship between Dataproc, key components of the Hadoop ecosystem, and related GCP services
  • Create, customize, monitor, and scale Dataproc clusters
  • Run data processing jobs on Dataproc
  • Apply access control to Dataproc

Intended Audience

  • Data professionals
  • People studying for the Google Professional Data Engineer exam

Prerequisites

  • Hadoop or Spark experience (recommended)
  • Google Cloud Platform account (sign up for a free trial at https://cloud.google.com/free if you don't have an account)

This Course Includes

  • 49 minutes of high-definition video
  • Many hands-on demos

The GitHub repository for this course is at https://github.com/cloudacademy/dataproc-intro.

About the Author

Students: 5751
Courses: 21
Learning paths: 9

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).

Covered topics

Storage, Databases, Analytics, Compute, Google Cloud Platform, Databases for Google, Storage for Google, Analytics for Google, Compute for Google, Cloud Dataproc, BigQuery