One of the hardest parts of data analysis is data cleanup. In this course, you will learn how you can use Cloud Dataprep to easily explore, clean, and prepare your data.
Learning Objectives
- What Cloud Dataprep can do
- Differences between editions
- How to import a dataset
- How to create a recipe
- How to create and execute a flow
Intended Audience
- GCP Data Scientists
- GCP Data Engineers
- Anyone preparing for a Google Cloud certification (such as the Professional Data Engineer exam)
Prerequisites
- Access to a Google Cloud Platform account is recommended
Before I wrap up this course, let’s quickly review everything that was covered.
In the first lesson, you learned that Cloud Dataprep is a service for exploring, cleaning, and preparing structured and unstructured data. You learned about some of the main features including fast exploration, rich transformations, predictive suggestions, and advanced security.
In the second lesson, you learned about the three editions that are available: Starter, Professional and Enterprise. And I discussed some of their features and price differences.
In the third lesson, you learned about the various components used in Dataprep:
- Flows are mappings that specify which datasets to use and which recipes to apply
- Recipes are lists of transformations that are applied to a dataset
- Datasets represent the various states of your data
- Imported datasets represent your original raw data
- Wrangled datasets represent transformed data
- Outputs represent transformed datasets that will be saved to disk
- Reference datasets represent shared data between flows
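To make the relationship between these components a bit more concrete, here is a rough sketch in plain Python. This is not the Dataprep API, and nothing here talks to Google Cloud. The `Recipe` and `Flow` classes, the two transformation functions, and the sample rows are all hypothetical names made up for illustration. It only shows the idea that a flow ties an imported dataset to one or more recipes, and that running the flow produces a wrangled dataset while leaving the raw data alone.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Illustrative types only; Dataprep itself is driven through its UI.
Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class Recipe:
    """An ordered list of transformations applied to a dataset."""
    steps: List[Step] = field(default_factory=list)

    def apply(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


@dataclass
class Flow:
    """Specifies which imported dataset to use and which recipes to apply."""
    imported: List[Row]
    recipes: List[Recipe] = field(default_factory=list)

    def run(self) -> List[Row]:
        wrangled = self.imported
        for recipe in self.recipes:
            wrangled = recipe.apply(wrangled)
        return wrangled


def trim_whitespace(rows: List[Row]) -> List[Row]:
    # Build new rows so the imported (raw) data is never modified.
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def drop_missing_team(rows: List[Row]) -> List[Row]:
    # Keep only rows that have a non-empty "team" value.
    return [row for row in rows if row.get("team")]


# Hypothetical raw data standing in for an imported dataset.
imported = [
    {"team": " Lions ", "wins": "10"},
    {"team": "", "wins": "3"},
]

flow = Flow(imported, [Recipe([trim_whitespace, drop_missing_team])])
output = flow.run()
print(output)
```

In this sketch, `imported` plays the role of the imported dataset, the list returned by `flow.run()` plays the role of the wrangled dataset, and writing that list to a file would correspond to an output.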
In the fourth lesson, you saw how to actually use the Dataprep tool. I demonstrated how to import data and build a recipe. And then I showed you how to stitch multiple recipes together. Finally, you saw how to save and share the results.
At this point you should have enough information to begin using Cloud Dataprep. I highly recommend spending some time playing around with the different options and experimenting on your own. There are lots of interesting datasets out there to get you started. If you don’t already have a specific project in mind, then you might consider picking a dataset related to one of your interests. So if you like sports, then you could look for a dataset that contains information about your favorite team. That way you stay motivated and interested.
Well, that’s all I have for you today. Remember to give this course a rating, and if you have any questions or comments, please let us know. Thanks for watching, and make sure to check out our many other courses on Cloud Academy!
Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.
Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.
When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.