ETL with AWS Glue Studio

This section provides detail on the AWS management services relevant to the Solutions Architect Associate exam. These services help you audit, monitor, and evaluate your AWS infrastructure and resources, and they form a core component of running resilient and performant architectures.


Learning Objectives

  • Understand the benefits of using Amazon CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage your accounts with AWS Organizations, including single sign-on with AWS SSO
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Understand how to design cost-optimized architectures in AWS
  • Learn about AWS data transformation tools such as AWS Glue, the query service Amazon Athena, and the data visualization service Amazon QuickSight

Hello and welcome to this lecture, where I’ll be discussing AWS Glue Studio, one of the tools available in the AWS Glue ecosystem. AWS Glue Studio is where you create, submit, and monitor your ETL jobs.

With AWS Glue Studio, every ETL job consists of at least three things: 

  1. A data source. This could be the Data Catalog, or a service like Amazon S3, Amazon Kinesis, Amazon Redshift, Amazon RDS, or Amazon DynamoDB, or a JDBC source.

  2. Then, you need a transformation script. Glue will use the data from your source and process it according to the transformation script you write. You can write these in either Python or Scala. 

  3. Last, you need a target. Glue will export the output to a target of your choice, such as the Data Catalog, Amazon S3, Redshift or a JDBC source. 
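To make that three-part structure concrete, here is a minimal sketch in plain Python. It is not the Glue API: the names `run_etl`, `read_source`, `capitalise_city`, and `write_target` are hypothetical, and the source and target are in-memory lists standing in for something like the Data Catalog or an S3 bucket.

```python
from typing import Callable, Iterable

Record = dict


def run_etl(
    extract: Callable[[], Iterable[Record]],
    transform: Callable[[Record], Record],
    load: Callable[[list], None],
) -> int:
    """Read every record from the source, transform it, and pass the batch to the target."""
    transformed = [transform(record) for record in extract()]
    load(transformed)
    return len(transformed)


# Hypothetical in-memory stand-ins for a real source and target.
source_rows = [{"id": 1, "city": "london"}, {"id": 2, "city": "paris"}]
target_rows: list = []


def read_source() -> Iterable[Record]:
    # 1. The data source.
    return iter(source_rows)


def capitalise_city(record: Record) -> Record:
    # 2. The transformation: returns a new record, leaving the input untouched.
    return {**record, "city": record["city"].title()}


def write_target(records: list) -> None:
    # 3. The target.
    target_rows.extend(records)


rows_written = run_etl(read_source, capitalise_city, write_target)
```

Keeping the three parts as separate functions mirrors how a Glue job lets you change the source or the target without touching the transformation logic.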

Let’s look at Glue Studio in the Console. Here I am in the Job dashboard of the service. If I want to create a job, you can see there are many options to do so. However, they all fall into one of two categories: I can either create a job programmatically or I can use a visual interface.

For example, if I select the Visual with a blank canvas option and click Create, I can then create graphical relationships between a source, transformation scripts, and a target destination.

Let’s build one quickly. I can use the Data Catalog as my source. For my transformation script, I’ll use a built-in script called Rename Field, which renames a key in my data set to another name. Then, I can output the result of the transformation to an Amazon S3 bucket. I can also choose whether or not to update my Data Catalog.
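As a rough illustration of what renaming a key means for the data itself, here is a small plain-Python sketch. It is not the Glue implementation; the function `rename_field` and the field names `cust_id` and `customer_id` are assumptions made up for this example, and records that lack the old key are simply left as they are in this sketch.

```python
def rename_field(records, old_name, new_name):
    """Return a list in which the key old_name is renamed to new_name."""
    renamed = []
    for record in records:
        if old_name in record:
            # Rebuild the dict so the renamed key keeps its original position.
            record = {
                (new_name if key == old_name else key): value
                for key, value in record.items()
            }
        # Records without old_name are appended unchanged.
        renamed.append(record)
    return renamed


rows = [{"cust_id": 7, "total": 19.5}, {"total": 3.0}]
result = rename_field(rows, "cust_id", "customer_id")
```

The values are untouched; only the key changes, which is why this kind of transformation is usually safe to place anywhere between the source and the target.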

While this is a pretty simple ETL job, you can create more complex relationships and graphs between services without coding at all, and Glue will generate the Apache Spark code for you behind the scenes. Note that if you want a true no-code tool for creating ETL jobs, this won’t really provide that, because the built-in transformations in Glue Studio are very limited: you only have about 10 options or so here. If you feel comfortable with coding, you can also create custom transformation scripts in this interface using Python or Scala.
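For a sense of what a script for a job like the one built above might look like, here is a hand-written sketch in Python using the `awsglue` library. It is not the exact code Glue Studio generates. The database, table, field names, and S3 path are hypothetical placeholders, and the imports sit inside the function because `awsglue` and `pyspark` are supplied by the Glue job runtime rather than installed locally.

```python
# Hypothetical names used only for illustration.
DATABASE = "sales_db"
TABLE = "orders"
OUTPUT_PATH = "s3://example-bucket/renamed-orders/"


def run_rename_job(argv):
    """Read a Data Catalog table, rename one field, and write JSON to S3."""
    # Imported lazily: these modules are provided by the Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import RenameField
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Source: a table registered in the Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=DATABASE, table_name=TABLE
    )

    # Transformation: rename a single field (assumed field names).
    renamed = RenameField.apply(
        frame=source, old_name="cust_id", new_name="customer_id"
    )

    # Target: write the result to an S3 path as JSON.
    glue_context.write_dynamic_frame.from_options(
        frame=renamed,
        connection_type="s3",
        connection_options={"path": OUTPUT_PATH},
        format="json",
    )
    job.commit()
```

Inside a Glue job you would finish the script by calling `run_rename_job(sys.argv)` after importing `sys`; the job name is passed in by Glue as the `JOB_NAME` argument.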

However, there are better places where you can develop your own custom scripts. For example, if I click back, you can see the other options for programmatically creating scripts, such as the Spark script editor, the Python shell script editor, or the built-in Jupyter Notebook interface to create Python or Scala job scripts.

That’s it for this one - see you next time. 


About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ cloud-related courses, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016, Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.