Introduction to Google Cloud Dataprep

Features

Overview
Difficulty: Intermediate
Duration: 42m
Students: 250
Rating: 5/5
Description

One of the hardest parts of data analysis is data cleanup. In this course, you will learn how to use Cloud Dataprep to explore, clean, and prepare your data.

Learning Objectives

  • What Cloud Dataprep can do
  • Differences between editions
  • How to import a dataset
  • How to create a recipe
  • How to create and execute a flow

Intended Audience

  • GCP Data Scientists
  • GCP Data Engineers
  • Anyone preparing for a Google Cloud certification (such as the Professional Data Engineer exam)

Prerequisites

  • Access to a Google Cloud Platform account is recommended

Transcript

Google offers many powerful tools for performing data analysis. But before you can start using them, you first need to compile and prepare your data. This initial “data cleaning” is often the most challenging and time-consuming step. You may have access to a huge amount of information, but it is composed of many different pieces. And each piece might live in a different location and be stored in an incompatible format.

Google Cloud Dataprep is a tool designed to help you prepare your structured and unstructured data for use.  Whether your task involves data analysis, creating reports, or even machine learning, Dataprep makes transforming data much easier.  Here are just a few of its features:

  1. Fast exploration
    • Dataprep allows you to visually explore your data.  You can quickly determine what information you have and what format it is in.
  2. Rich transformations
    • Dataprep provides hundreds of functions to format your data in whatever way you need. You can do aggregations, pivots, joins, unions, merges, extractions, and much, much more.
  3. Predictive suggestions
    • Dataprep is powered by machine learning.  That means it can automatically assess the quality of your data, as well as make suggestions on ways to improve it.
  4. Pipeline orchestration
    • Once you’ve defined a sequence of transformations, Dataprep will translate the logic into a Dataflow job or BigQuery SQL statements.  So you don’t have to write any code yourself.
  5. Advanced security
    • And finally, when the transform job is executed, all processing happens exclusively within your Google Cloud project. Dataprep neither transforms nor stores any of your data itself, so you don’t have to worry about exposing sensitive data.
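
To make the idea of "a sequence of transformations" more concrete, here is a minimal sketch in plain Python. It is not Dataprep's API (in Dataprep you build recipes visually and never write this code). The sample records, the step functions, and the `run_recipe` helper are all hypothetical names made up for illustration. The sketch simply treats a recipe as an ordered list of steps, each taking rows in and returning rows out.

```python
from collections import defaultdict

# Hypothetical raw records with the kind of inconsistencies data cleanup deals with.
raw_orders = [
    {"customer": " Alice ", "amount": "10.50"},
    {"customer": "BOB", "amount": "4"},
    {"customer": "alice", "amount": "n/a"},  # amount cannot be parsed
    {"customer": "bob", "amount": "6.00"},
]


def normalize_customer(rows):
    """Trim whitespace and lowercase the customer name."""
    return [{**r, "customer": r["customer"].strip().lower()} for r in rows]


def parse_amount(rows):
    """Convert amount to float, dropping rows whose amount is not numeric."""
    cleaned = []
    for r in rows:
        try:
            cleaned.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # skip rows with an unparseable amount
    return cleaned


def total_by_customer(rows):
    """Aggregate: sum the amounts per customer, sorted by customer name."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


# An illustrative "recipe": an ordered list of transformation steps.
RECIPE = [normalize_customer, parse_amount, total_by_customer]


def run_recipe(rows, recipe):
    """Apply each step in order, feeding its output into the next step."""
    for step in recipe:
        rows = step(rows)
    return rows


result = run_recipe(raw_orders, RECIPE)
print(result)
# [{'customer': 'alice', 'total': 10.5}, {'customer': 'bob', 'total': 10.0}]
```

The point of the sketch is only the shape of the work: cleanup steps first, then an aggregation, run in a fixed order. According to the transcript, Dataprep translates an equivalent sequence into a Dataflow job or BigQuery SQL for you.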

With Cloud Dataprep, it’s really quite easy to build your own data warehouse or data lake from many different sources.

About the Author
Students: 13,892
Courses: 23
Learning Paths: 10

Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.

Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.

When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.