Basic AWS CLI, Python, and Data Wrangler Setup

Learning Objectives

This is an introductory-level AWS development course. You will learn what the AWS Data Wrangler library is, what it does, and how to set it up for use.

Intended Audience

This course is intended for AWS Python developers familiar with the Pandas and PyArrow libraries who are building non-distributed pipelines using AWS services. The AWS Data Wrangler library provides an abstraction for connectivity, extract, and load operations on AWS services. 


To get the most out of this course, you must meet the AWS Developer Associate certification requirements or have equivalent experience.

This course expects that you have an existing Python development environment and have set up the AWS CLI or SDK with the required configuration and keys. Familiarity with Python syntax is also a requirement. We walk through the basic setup for some of these but do not provide detailed explanations of the process.

For fundamentals and additional details about these skills, you can refer to the following courses here at Cloud Academy:  

1) Python for Beginners 

2) Data Wrangling With Pandas

3) Introduction to the AWS CLI 

4) How to Use the AWS Command-Line Interface



We start with the setup for a new machine that has not yet been configured. First, create an AWS user with access defined via policies and keys. This entails creating an Identity and Access Management (IAM) user with programmatic access so that you can operate the AWS CLI from the local system. You will need access to the AWS console to do this. Make sure to download the credentials for the user that you just created. In our case, we name the user cloudacademyuser and give it administrative privileges to keep policy definitions and access controls simple.

The second major step is to install and configure the AWS CLI. The AWS Command Line Interface is a tool to manage AWS services from the command line and operate them using scripts. AWS CLI version 2 is the recommended package to install and can be obtained for your operating system from the URL. Download the corresponding installer and run it. The next step is to configure the CLI with the credentials for the user that we created earlier. After that, install Anaconda from the URL: download and install the package corresponding to your operating system. After installation, start Anaconda Navigator and create a new Jupyter Notebook.
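To make the CLI configuration step more concrete, here is a minimal sketch of the two INI files that the CLI reads for a profile (a credentials file and a config file, normally kept in the .aws folder of your home directory). The helper name write_cli_profile, the default region, and the temporary target folder are assumptions made for illustration, and the key values are the non-working placeholder keys used in AWS documentation. In practice, running aws configure and answering its prompts is the usual way to create these files.

```python
import configparser
import tempfile
from pathlib import Path


def write_cli_profile(aws_dir, access_key_id, secret_access_key,
                      region="us-east-1", output="json", profile="default"):
    """Write credentials and config files in the INI layout the AWS CLI reads.

    Hypothetical helper for illustration only; it is not part of the AWS CLI.
    """
    aws_dir = Path(aws_dir)
    aws_dir.mkdir(parents=True, exist_ok=True)

    # The credentials file holds the access key pair for each profile.
    credentials = configparser.ConfigParser(interpolation=None)
    credentials[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    credentials_path = aws_dir / "credentials"
    with open(credentials_path, "w") as handle:
        credentials.write(handle)

    # The config file holds region and output format. Named profiles use a
    # "profile <name>" section header there; the default profile does not.
    config = configparser.ConfigParser(interpolation=None)
    section = "default" if profile == "default" else f"profile {profile}"
    config[section] = {"region": region, "output": output}
    config_path = aws_dir / "config"
    with open(config_path, "w") as handle:
        config.write(handle)

    return credentials_path, config_path


if __name__ == "__main__":
    # Write to a throwaway folder so an existing ~/.aws setup is not touched.
    target = tempfile.mkdtemp()
    cred_file, conf_file = write_cli_profile(
        target,
        access_key_id="AKIAIOSFODNN7EXAMPLE",  # placeholder, not a real key
        secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # placeholder
    )
    print(cred_file.read_text())
    print(conf_file.read_text())
```

Writing to a temporary folder keeps the sketch safe to run on a machine that is already configured. Real keys come from the credentials file downloaded when the IAM user was created and should never be committed to source control.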

Last but not least, you can install AWS Data Wrangler by running the command pip install awswrangler. It's always a good idea to restart the kernel after installation. You can then verify the Data Wrangler version as shown on the screen. The result is the version of the installed Data Wrangler module, which confirms that the AWS Data Wrangler installation and configuration happened as expected. Please note that Data Wrangler is installed differently depending on the AWS service being used. For example, to install it on your local machine, you need to use a Python installer such as pip or conda, as we just demonstrated.
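The on-screen version check is not captured in this transcript, so the following is only a sketch of one way to do it. It uses the standard library's importlib.metadata to look up the installed version without importing the package; the helper name installed_version is an assumption for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("awswrangler")
    if version is None:
        print("awswrangler is not installed here; run: pip install awswrangler")
    else:
        print(f"awswrangler {version} is installed")

    # In a notebook, the same check is commonly written as:
    #   import awswrangler as wr
    #   wr.__version__
```

If the notebook still reports the package as missing right after pip install, restarting the kernel and running the check again is the first thing to try.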

To use Data Wrangler with AWS Lambda, the Data Wrangler library is made available as a Lambda layer in some regions. You can access the layer directly from the console or by using its Amazon Resource Name, or ARN. If Data Wrangler is not available as a Lambda layer in your region, you can download the zip file from GitHub and create a custom layer manually from the Lambda console. For some installations, the recommended best practice is to use a new, individual virtual environment (venv) for each project. Also, on notebooks, it's a good idea to always restart your kernel after installation.
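Attaching a layer by its ARN can also be scripted. The sketch below assumes a boto3 Lambda client and uses its get_function_configuration and update_function_configuration calls. The helper name attach_layer, the function name, and the ARN in the usage comment are hypothetical placeholders; the real Data Wrangler layer ARN for your region has to be looked up, as it is not given in this course text.

```python
def attach_layer(lambda_client, function_name, layer_arn):
    """Add a layer ARN to a Lambda function while keeping its existing layers.

    lambda_client is expected to behave like boto3.client("lambda").
    Returns the list of layer ARNs configured after the call.
    """
    current = lambda_client.get_function_configuration(FunctionName=function_name)
    existing = [layer["Arn"] for layer in current.get("Layers", [])]

    # Nothing to do if the layer is already attached.
    if layer_arn in existing:
        return existing

    # update_function_configuration replaces the whole layer list,
    # so the existing ARNs are passed along with the new one.
    updated = existing + [layer_arn]
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=updated,
    )
    return updated


# Usage (requires boto3 and configured credentials; values are placeholders):
#   import boto3
#   attach_layer(
#       boto3.client("lambda"),
#       "my-function",
#       "arn:aws:lambda:<region>:<account-id>:layer:<layer-name>:<version>",
#   )
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real account.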


About the Author
Jorge Negrón
AWS Content Architect

Experienced in the architecture and delivery of cloud-based solutions, the development and delivery of technical training, defining requirements and use cases, and validating architectures for results. Excellent leadership, communication, and presentation skills with attention to detail. Hands-on administration and development experience with the ability to mentor and train on current and emerging technologies (Cloud, ML, IoT, Microservices, Big Data & Analytics).