Data Visualization using Matplotlib
This course will guide you through all the possible techniques that are used to visualize data using the Matplotlib Python library.
In this course, we will explore the main functionalities of Matplotlib: we will look at how to customize Matplotlib objects, how to use various plotting techniques, and finally, we will focus on how to communicate results.
If you have any feedback related to this course, feel free to contact us at email@example.com.
- Learn the fundamentals of Python's Matplotlib library and its main features
- Customize objects in Matplotlib
- Create multiple plots in Matplotlib
- Customize plots in Matplotlib (annotations, labels, linestyles, colors, etc.)
- Understand the different plot types available
- Data scientists
- Anyone looking to create plots and visualize data in Matplotlib
To get the most out of this course, you should already be familiar with using Python, for which you can take our Introduction to Python learning path. Knowledge of Python's Pandas library would also be beneficial and you might want to take our courses Working with Pandas and Data Wrangling with Pandas before embarking on this Matplotlib course.
The data used in this course can be found in the following GitHub repository: https://github.com/cloudacademy/data-visualization-with-python-using-matplotlib
Welcome! My name is Andrea Giussani, and I am going to be your instructor for this course on data visualization with Python using Matplotlib.
This course will guide you through the techniques used to visualize data with the Python language, and will help you become a data viz ninja.
In particular, we will focus on the most important data visualization library in Python, Matplotlib, which is the benchmark in Python when we want to plot data.
In this course, we will explore the main functionalities of Matplotlib: we will look at how to customize Matplotlib objects, we will dive into different plotting techniques, and finally, we will focus on how to communicate results.
By the end of this course, you will be able to produce stunning plots that will definitely increase the value of your work. The data we will use is available in the course-specific GitHub repository, so please feel free to go there, download it, and follow along with the course.
As a data scientist, you often have to communicate relevant information to colleagues, information that generally comes either from exploratory data analysis or from a sophisticated machine learning algorithm. This is indeed a very important step, since the way in which we present those insights has a direct impact on the perception of the importance of those results from a business point of view.
So the question is: how do we effectively communicate those results with our colleagues?
Take, for instance, the following dataset:
This shows the first five rows of the Gapminder dataset. The Gapminder dataset contains different variables for each country, such as population size, life expectancy, GDP per capita, babies per woman, and child mortality rate (under five years of age) for different years. It covers the period from 1964 through to recent years. In this example, we are taking into account the year 2012.
Suppose that we are interested in the relationship between GDP and life expectancy, and how it has evolved over time.
We could take the raw dataset and, say, fit a regression model to the data to infer the quantitative relationship between those features.
This is fine, but in many cases, we want to understand the data at a glance.
To do so, instead of looking at the raw data table, we could perform Exploratory Data Analysis (EDA for short) and understand the patterns in our data. In practice, we typically synthesize that information into a picture, such as this one.
It looks much better, doesn’t it?
Now the question is: what can we infer from this picture? Well, quite a lot actually!
Firstly, note that the size of the bubble represents the population size.
But the most interesting observation is that the relationship is not linear. So, if we had applied a linear regression model to this data, it would have been the wrong model, and we would have obtained unreliable estimates.
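A bubble chart like the one described above can be sketched in a few lines of Matplotlib. This is only an illustration under stated assumptions: the data below is synthetic (randomly generated, not the Gapminder data), and the variable names and the scaling of the bubble sizes are chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for one year of Gapminder-style data (NOT the real dataset).
rng = np.random.default_rng(0)
n_countries = 60
# Assumption: GDP per capita spread evenly on a log scale between 500 and 60,000.
gdp_per_capita = np.exp(rng.uniform(np.log(500), np.log(60_000), n_countries))
# Assumption: life expectancy grows with log(GDP), plus some noise.
life_expectancy = 30 + 5 * np.log(gdp_per_capita) + rng.normal(0, 2, n_countries)
population = rng.uniform(1e6, 1.3e9, n_countries)

fig, ax = plt.subplots(figsize=(8, 5))
bubbles = ax.scatter(
    gdp_per_capita,
    life_expectancy,
    s=population / 1e6,  # marker area (points^2) proportional to population in millions
    alpha=0.5,
    edgecolors="black",
)
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Life expectancy (years)")
ax.set_title("Life expectancy vs. GDP per capita (synthetic data)")
fig.savefig("bubble_chart.png", dpi=100)
```

The `s` argument of `scatter` controls the marker area, which is how the population size ends up encoded in the size of each bubble. In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.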
As a consequence, visualizing the raw dataset before any sophisticated analysis is very important. Indeed, visualizing data can be very effective especially in the initial phase of a data science project, where we have to choose a model for our data. This will definitely depend on the patterns identified in the data during the visualization phase.
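To make the point about model choice concrete, here is a minimal sketch that fits a straight line to the same kind of data twice: once against GDP directly, and once against the logarithm of GDP. The data is again synthetic and is generated under the assumption that life expectancy depends on log(GDP); the helper `r_squared` is defined here for illustration only.

```python
import numpy as np

# Synthetic data (NOT the Gapminder dataset), generated under the assumption
# that life expectancy is linear in log(GDP) plus noise.
rng = np.random.default_rng(0)
gdp = np.exp(rng.uniform(np.log(500), np.log(60_000), 200))
life = 30 + 5 * np.log(gdp) + rng.normal(0, 2, 200)


def r_squared(x, y):
    """Return R^2 of a least-squares straight line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return 1 - residuals.var() / y.var()


r2_raw = r_squared(gdp, life)          # straight line in GDP
r2_log = r_squared(np.log(gdp), life)  # straight line in log(GDP)

print(f"R^2, linear in GDP:      {r2_raw:.2f}")
print(f"R^2, linear in log(GDP): {r2_log:.2f}")
```

On data shaped like this, the fit against log(GDP) explains noticeably more of the variation than the fit against raw GDP. A quick scatter plot would have suggested the transformation before any model was fitted.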
Furthermore, showing an effective plot during a meeting can be clearer and more persuasive than explaining the quantitative analysis performed on the raw data. Indeed, many people in business do not have a quantitative background, and therefore it is important to translate that information into a form that is easy for everybody to understand.
All the techniques that transform raw data into a sort of visual representation fall into the discipline of Data Visualization.
The term Data Visualization refers to the process of transforming raw data into a graphical representation. It borders on descriptive statistics, graphic design, and storytelling; but more importantly, data viz techniques are used extensively in different departments of companies to communicate their work to a broader audience.
Data visualization is a complex field, with many different domains of application. Please note that this course will mainly focus on data analysis.
OK, we have understood the importance of data visualization. It’s now time to understand how to perform it. To do so, we will use the Python language and the Matplotlib library. Let’s dive into it; see you in the next lecture.
Introduction to Matplotlib - Customization in Matplotlib - Multiple Plots in Matplotlib - Annotating Text with Matplotlib - Advanced Customization in Matplotlib - Different Plot Types in Matplotlib - Conclusion
Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.
He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.