Working With Data Frames


Pandas is Python’s ETL package for structured data and is built on top of NumPy, designed to mimic the functionality of R data frames. It provides a convenient way to handle tabular data and can perform all SQL functionalities, including group-by and join. Furthermore, it's compatible with many other Data Science packages, including visualization packages such as Matplotlib and Seaborn.

In this lesson, we are going to explore Pandas and show you how it can be used as a powerful data manipulation tool. We'll start off by looking at arrays, queries, and dataframes, and then we'll look specifically at the groupby function before rounding off the lesson by looking at how the merge and join methods can be used in Pandas.

If you have any feedback relating to this lesson, feel free to tell us about it at

Learning Objectives

  • Understand the fundamentals of the Pandas library in Python and how it is used to handle data
  • Learn how to work with arrays, queries, and dataframes
  • Learn how to use the groupby, merge, and join methods in Pandas

Intended Audience

This Lesson is intended for data engineers, data scientists, or anyone who wants to use the Pandas library in Python for handling data.


To get the most out of this lesson, you should already have a good working knowledge of Python and data visualization techniques.


The dataset(s) used in the lesson can be found in the following GitHub repo: 

About the Author
Thomas Holmes, opens in a new tab
Data Science Trainer at QA Ltd.

Delivering training and developing courseware for multiple aspects across Data Science curriculum, constantly updating and adapting to new trends and methods.

Covered Topics