One of the hardest parts of data analysis is data cleanup. In this course, you will learn how you can use Cloud Dataprep to easily explore, clean, and prepare your data.
Learning Objectives
- What Cloud Dataprep can do
- Differences between editions
- How to import a dataset
- How to create a recipe
- How to create and execute a flow
Intended Audience
- GCP Data Scientists
- GCP Data Engineers
- Anyone preparing for a Google Cloud certification (such as the Professional Data Engineer exam)
Prerequisites
- Access to a Google Cloud Platform account is recommended
You should be aware that Cloud Dataprep is an integrated partner service operated by Trifacta, a separate company from Google. Trifacta is a Premier Google Cloud Partner, and Google is an investor in Trifacta, so the two companies work closely to provide a seamless user experience. However, this does mean you will notice some differences between Dataprep and most other GCP services. For example, the official documentation is hosted at trifacta.com, not at cloud.google.com. Also, when you navigate to Dataprep from the Google Cloud console, you will be redirected to clouddataprep.com.
Another significant difference is that there are different versions or “editions” of Dataprep that you can choose from:
- Starter edition
- Professional edition
- Enterprise edition
Each has its own set of features and its own price. You can get the full details at: trifacta.com/pricing/
But I will go over the basics now to provide a general overview.
The Starter edition gets you all the core features at the lowest price. You will have access to the interactive data transformations, and you will get personalized recommendations for which ones to use. You can share all your work and collaborate with others in real time. And you will be able to connect to the core data systems such as BigQuery and Cloud Storage. Support is limited, however. You can seek help from the community via Stack Overflow, the Google Cloud Slack channel, or the Trifacta forums. But you won’t be eligible for direct support from Trifacta itself.
The Professional edition includes all the features of the Starter edition, plus a few new ones. You will be able to access data from an expanded list of sources, including Google Analytics, Salesforce, Microsoft SQL Server, Oracle, and PostgreSQL. Dataprep Professional will also automatically identify data flaws and suggest fixes for them. In addition, you can schedule jobs on a recurring basis, and it even provides conditional branching for full automation. Finally, the Professional edition includes direct support. You can call or chat with a Trifacta support engineer during normal business hours, and you can also reach out for advice and guidance. Of course, all these extra features come at a higher cost.
The Enterprise edition gives you the most options and the highest level of support, but it also has the highest price. It includes all the features of the Professional edition. Plus, you will get more options for access and security. It also grants you premium customer support that includes 24/7 coverage, fast response times, and a dedicated Customer Success Manager.
The different editions allow you to choose the best balance of price and features for your company. I recommend visiting the Trifacta pricing page to see the latest prices and review the detailed list of options available.
Daniel began his career as a Software Engineer, focusing mostly on web and mobile development. After twenty years of dealing with insufficient training and fragmented documentation, he decided to use his extensive experience to help the next generation of engineers.
Daniel has spent his most recent years designing and running technical classes for both Amazon and Microsoft. Today at Cloud Academy, he is working on building out an extensive Google Cloud training library.
When he isn’t working or tinkering in his home lab, Daniel enjoys BBQing, target shooting, and watching classic movies.