AWS Glue DataBrew vs. Glue Studio


Course Introduction
Amazon CloudWatch
AWS Config
What is AWS Config?
AWS Control Tower
AWS Control Tower
PREVIEW19m 56s
AWS Resource Access Manager
AWS Management
AWS Service Catalog
PREVIEW10m 34s
AWS Trusted Advisor Best Practices
AWS Health Dashboard
AWS Data Visualization
Finding Compliance Data with AWS Artifact
Observability in AWS
Start course
6h 2m

This section of the AWS Certified Solutions Architect - Professional learning path introduces the AWS management and governance services relevant to the AWS Certified Solutions Architect - Professional exam. These services are used to help you audit, monitor, and evaluate your AWS infrastructure and resources and form a core component of resilient and performant architectures. 

Want more? Try a Lab Playground or do a Lab Challenge!

Learning Objectives

  • Understand the benefits of using AWS CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage multi-account environments with AWS Organizations and Control Tower
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Learn about AWS data transformation tools such as AWS Glue and data visualization services like Amazon Athena and QuickSight
  • Learn how AWS CloudFormation can be used to represent your infrastructure as code (IaC)
  • Understand SLAs in AWS

A few years ago, Glue released another transformation tool called Glue DataBrew. On the surface, DataBrew looks very similar to Glue Studio. So, what is Glue DataBrew?

Glue DataBrew is a true no-code service for transforming data. Here’s how it works: 

You first upload your data. You can upload it directly to the service, or connect to other data sources like Amazon S3, Amazon Aurora, Amazon Redshift, Glue Data Catalog, or other JDBC Connections. It can additionally connect to AppFlow, Data Exchange, and Snowflake. 

Once you upload your data, you can preview your data in a visual interface. From there you can choose from hundreds of built-in transformations. Some of these transformations include formatting your data, modifying columns, working with duplicate or missing values, encoding data, and more. 

Once you apply your transformation, you can store the output in Amazon S3. Note that Amazon S3 is the only place you can store your transformed data. So if both of these services provide transformations, function in similar ways, and if Glue Data Studio also provides some no-code options, which service do you use? 

Well, there are four main differences between the two that might help you distinguish when to use each service: 

Glue DataBrew is a no-code tool. Unlike Glue Studio, you can’t write your own custom code for transformations even if you wanted to. However, that means that DataBrew provides a lot more options for built-in transformations. DataBrew has over 250+ built-in transformations, while Glue Studio has around 10. These transformations are different as well. Glue Studio built-in transformations focus mostly on ETL, while DataBrew's transformations mostly prepare data for machine learning.

These services are meant for different audiences. Glue Studio is meant for ETL engineers and is focused on ETL itself, while Glue DataBrew is mostly for business analysts and data scientists that may not have coding experience. You don’t need specialized expertise to transform data with DataBrew. 

Both services provide a graphical interface for visualizing your transformations. Glue Studio, however, is the only option that provides programmatic opportunities for working with ETL through Jupyter notebooks and shell scripts. 

DataBrew has a profiling feature, which enables you to get statistics about your data. For example, with profiling, you can get information about how many rows you have in your data set or how many unique values you have in each column. Glue Studio does not have a data profiling feature. That’s it for this one - see you next time! 

About the Author
Learning Paths

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.