Fundamentals of AWS Data Wrangler

Learning Objectives

This is an introductory-level AWS development course. You will learn about the AWS Data Wrangler library, what it does, and how to set it up for use.

Intended Audience

This course is intended for AWS Python developers familiar with the Pandas and PyArrow libraries who are building non-distributed pipelines using AWS services. The AWS Data Wrangler library provides an abstraction for connectivity, extract, and load operations on AWS services. 


To get the most out of this course, you must meet the requirements for the AWS Certified Developer Associate certification or have equivalent experience.

This course assumes that you have an existing Python development environment that you are familiar with, and that you have set up the AWS CLI or SDK with the required configuration and keys. Familiarity with Python syntax is also required. We walk through the basic setup for some of these, but we do not explain the process in detail.
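Before making any Data Wrangler calls, you can check from Python that the environment sees the configuration described above. The following is a minimal sketch, not part of the course or the library. The helper name `describe_aws_setup` is hypothetical, and the sketch assumes that credentials are discovered through boto3 (the AWS SDK for Python, which awswrangler depends on). It does not call any AWS service API, so it cannot prove that the credentials are valid.

```python
import importlib.util
import sys


def describe_aws_setup() -> dict:
    """Report whether this environment looks ready for AWS Data Wrangler.

    Hypothetical helper. No AWS service API is called, so credentials that
    are found here may still be rejected by AWS.
    """
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "awswrangler_installed": importlib.util.find_spec("awswrangler") is not None,
        "boto3_installed": importlib.util.find_spec("boto3") is not None,
        "credentials_found": False,
        "region": None,
    }
    if report["boto3_installed"]:
        import boto3  # imported lazily so the report still works without boto3

        session = boto3.Session()
        # Searches the standard credential chain (environment variables,
        # shared config files, and so on); returns None if nothing is found
        report["credentials_found"] = session.get_credentials() is not None
        # Region resolved from the environment or the active profile, or None
        report["region"] = session.region_name
    return report


if __name__ == "__main__":
    for key, value in describe_aws_setup().items():
        print(f"{key}: {value}")
```

Running the file prints one line per field. If `awswrangler_installed` is False, `pip install awswrangler` installs the library. If `credentials_found` is False, `aws configure` sets up the default profile.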

For fundamentals and additional details about these skills, you can refer to the following courses here at Cloud Academy:  

1) Python for Beginners 

2) Data Wrangling With Pandas

3) Introduction to the AWS CLI 

4) How to Use the AWS Command-Line Interface



The AWS Data Wrangler library runs on Python 3.6, 3.7, 3.8, and 3.9. It also runs on platforms such as AWS Lambda. It provides an API for interacting with a number of AWS services, and what you can do varies from service to service. The library continues to grow in use, functionality, and support. For example, you can use AWS Data Wrangler to write records stored in a DataFrame directly to an RDS instance running PostgreSQL, MySQL, or Microsoft SQL Server.
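Here is a sketch of how you might write a DataFrame to one of these RDS engines. It assumes that awswrangler's `postgresql`, `mysql`, and `sqlserver` modules each expose `connect()` and `to_sql()`. It also assumes that an AWS Glue Catalog connection stores the database endpoint and credentials. The connection name is yours to choose, and `connect()` can take a Secrets Manager `secret_id` instead. The function name `write_dataframe` is hypothetical. The SQL Server path also requires the `pyodbc` driver.

```python
# Database engines named in the course, spelled as awswrangler module names
SUPPORTED_ENGINES = ("postgresql", "mysql", "sqlserver")


def write_dataframe(df, engine: str, glue_connection: str, table: str,
                    schema: str, mode: str = "append") -> None:
    """Write a pandas DataFrame to an RDS table through AWS Data Wrangler.

    Hypothetical helper. glue_connection is the name of an AWS Glue Catalog
    connection that stores the endpoint and credentials (an assumption).
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}")

    # Deferred import so the engine check above works without awswrangler installed
    import awswrangler as wr

    module = getattr(wr, engine)  # wr.postgresql, wr.mysql, or wr.sqlserver
    con = module.connect(connection=glue_connection)
    try:
        # "append" inserts new rows; "overwrite" drops and recreates the table
        module.to_sql(df=df, con=con, table=table, schema=schema, mode=mode)
    finally:
        con.close()
```

For example, `write_dataframe(pd.DataFrame({"id": [1, 2]}), "postgresql", "my-glue-connection", table="orders", schema="public")` would append two rows to a hypothetical `public.orders` table. Because the engine name is checked before awswrangler is imported, a configuration mistake fails fast, before any connection is opened.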

For RDS, Data Wrangler provides access to the RDS Data API. You can create an RDS Data API connection, run a SQL query on that connection, and get the results back as a DataFrame.

You can also connect to Amazon S3 and read and write Excel, JSON, CSV, and Parquet files to and from Pandas DataFrames. The S3 operations that Data Wrangler provides include:

- copying a list of S3 objects to another directory
- deleting S3 objects
- describing objects
- checking whether an object exists
- downloading a file
- getting a bucket's region
- listing Amazon S3 buckets
- finding the size of objects
- uploading a file from your local system to an S3 path

More importantly, you can filter the contents of an Amazon S3 object based on a SQL statement.

The idea is to provide a library of abstracted functions for AWS data services. With these functions, you can build data pipelines that fetch data from one or more sources, transform it, and then store it for business uses such as analytics.
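The pipeline idea can be sketched end to end. This is a minimal illustration, not the course's own code. It assumes the awswrangler functions `wr.data_api.rds.connect()`, `wr.data_api.rds.read_sql_query()`, `wr.s3.read_csv()`, `wr.s3.to_parquet()`, `wr.s3.list_objects()`, and `wr.s3.size_objects()`. Every ARN, database, table, column (`region_id`, `region_name`, `amount`), and S3 path is hypothetical, and so are the helper names `s3_output_prefix` and `run_pipeline`.

```python
from datetime import date


def s3_output_prefix(bucket: str, prefix: str, run_date: date) -> str:
    """Build an S3 prefix such as s3://my-bucket/sales_summary/2021-05-01/ (hypothetical helper)."""
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"s3://{bucket}/{prefix.strip('/')}/{run_date.isoformat()}/"


def run_pipeline(cluster_arn: str, secret_arn: str, database: str,
                 regions_csv: str, bucket: str, run_date: date) -> int:
    """Fetch from two sources, transform with pandas, and store the result in S3.

    Hypothetical helper; all table and column names are assumptions.
    Returns the total size in bytes of the Parquet objects that were written.
    """
    # Deferred import: awswrangler and AWS credentials are only needed at call time
    import awswrangler as wr

    # Source 1: SQL query over an RDS Data API connection, returned as a DataFrame
    con = wr.data_api.rds.connect(resource_arn=cluster_arn, database=database,
                                  secret_arn=secret_arn)
    sales = wr.data_api.rds.read_sql_query(
        "SELECT region_id, amount FROM sales", con=con)

    # Source 2: a CSV lookup file stored in S3, read straight into a DataFrame
    regions = wr.s3.read_csv(regions_csv)

    # Transform: join the two sources and total the amounts per region
    summary = (sales.merge(regions, on="region_id", how="left")
                    .groupby("region_name", as_index=False)["amount"].sum())

    # Load: write the result as a Parquet dataset under a date-stamped prefix
    output = s3_output_prefix(bucket, "sales_summary", run_date)
    wr.s3.to_parquet(df=summary, path=output, dataset=True, mode="overwrite")

    # Inspect what was written; size_objects maps each path to its size (or None)
    sizes = wr.s3.size_objects(wr.s3.list_objects(output))
    return sum(size or 0 for size in sizes.values())
```

A call such as `run_pipeline(cluster_arn, secret_arn, "shop", "s3://my-bucket/lookups/regions.csv", "my-bucket", date.today())` needs valid AWS credentials and an Aurora cluster with the Data API enabled. Path construction sits in a separate pure function, so you can test it without AWS access.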


About the Author
Jorge Negrón
AWS Content Architect

Experienced in the architecture and delivery of cloud-based solutions, the development and delivery of technical training, defining requirements and use cases, and validating architectures for results. Excellent leadership, communication, and presentation skills, with attention to detail. Hands-on administration and development experience, with the ability to mentor and train others in current and emerging technologies (Cloud, ML, IoT, Microservices, Big Data & Analytics).