This course will guide you through the main techniques used to visualize data with the Matplotlib Python library.
In this course, we will explore the main functionalities of Matplotlib: we will look at how to customize Matplotlib objects, how to use various plotting techniques, and finally, we will focus on how to communicate results.
If you have any feedback related to this course, feel free to contact us at support@cloudacademy.com.
Learning Objectives
- Learn the fundamentals of Python's Matplotlib library and its main features
- Customize objects in Matplotlib
- Create multiple plots in Matplotlib
- Customize plots in Matplotlib (annotations, labels, line styles, colors, etc.)
- Understand the different plot types available
Intended Audience
- Data scientists
- Anyone looking to create plots and visualize data in Matplotlib
Prerequisites
To get the most out of this course, you should already be familiar with using Python, for which you can take our Introduction to Python learning path. Knowledge of Python's Pandas library would also be beneficial and you might want to take our courses Working with Pandas and Data Wrangling with Pandas before embarking on this Matplotlib course.
Resources
The data used in this course can be found in the following GitHub repository: https://github.com/cloudacademy/data-visualization-with-python-using-matplotlib
Welcome back. In this lecture, we're going to customize our Matplotlib object. Let's recall the output of the previous lecture by pasting the code snippet needed to produce it. So first we write the necessary imports and then filter the original data frame to retain only the USA observations. We then plot the year on the x-axis and the GDP per capita on the y-axis.
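As a rough sketch of that starting point, the snippet below filters a small data frame down to the USA rows and plots year against GDP per capita. The column names (country, year, gdp_per_capita) and all the numbers are made-up placeholders, not the course dataset, so adapt them to the file in the repository.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values and hypothetical column names, not the course data.
data = pd.DataFrame({
    "country": ["USA", "USA", "USA", "Italy", "Italy", "Italy"],
    "year": [2000, 2005, 2010, 2000, 2005, 2010],
    "gdp_per_capita": [36000, 44000, 48000, 20000, 32000, 36000],
})

# Keep only the USA observations.
usa = data[data["country"] == "USA"]

# Year on the x-axis, GDP per capita on the y-axis.
fig, ax = plt.subplots()
ax.plot(usa["year"], usa["gdp_per_capita"])
```

In an interactive session, calling plt.show() at the end displays the figure.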
One of the first things we can do is add labels to the axes. To do that, we use the set_xlabel and set_ylabel methods. These are methods that you can use to change certain properties of the object before calling show to display it. For instance, we can call the method set_xlabel on the axes object and pass the string Year, and then we call set_ylabel on the axes object and pass the string GDP per capita.
Note that we can control the size of the text with the fontdict parameter of set_xlabel. This takes a dict, and we pass, for instance, the size key to control the size of the text. So let's put 12 here. Now we repeat the same procedure for the y label so that they are equal, and we see that the size of the text has changed. We can also add a title using the set_title method; by default, this places the title centered at the top of the plot.
So again, we call the set_title method on the axes object, and we pass the string that contains the title we wish to have. So we can put GDP per capita in the USA, and this is the title that will show up on top of the plot. Okay, so that looks much better now. However, customization is not just about adding text to the axes; there's a lot more to it than that.
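The labelling steps described so far could look roughly like this. The years and GDP values are made-up placeholders rather than the course data, and the variable names are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()
ax.plot(years, gdp)

# Axis labels, with the text size controlled through fontdict.
ax.set_xlabel("Year", fontdict={"size": 12})
ax.set_ylabel("GDP per capita", fontdict={"size": 12})

# Title, centered above the plot by default.
ax.set_title("GDP per capita in the USA")
```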
So we'll now go into the cosmetics of the plot, that is, the process of highlighting data inside the figure. For instance, we might want to change the color of the line we just plotted or highlight the data points with a marker. First, if we look at the plot, the line makes the data look continuous, but the granularity of our data is discrete, since we have measured GDP per capita over a certain number of years.
A way to tell people that we have measured this information yearly is to add markers to the plot. This makes the data underlying the plot easier to understand, because it shows where the data exists and which parts are just lines connecting the data points. The plot method takes an optional keyword argument called marker that lets you add markers to the plot.
So for example, we can pass the parameter marker to the plot method and set it equal to a lowercase 'o', like so, which stands for circle. And now if we run the snippet, we obtain a line that is marked with small circles, and each of them represents the intersection between a year and the corresponding GDP per capita.
Now if we want, we can use another marker, say a diamond. We use an uppercase 'D' for that, and you can see here that there are now diamonds instead of circles. There are many other markers, and you can see all of them in the online documentation. We will include a link to this documentation in the transcript of this lecture (https://matplotlib.org/stable/api/markers_api.html).
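A minimal sketch of the marker argument, again with made-up values standing in for the course data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()

# marker="o" draws a circle at every observation;
# swap in marker="D" to get diamonds instead.
ax.plot(years, gdp, marker="o")
```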
Now the measured data are highlighted with a diamond marker, and it becomes evident that the lines are just connectors between them. But we can go even further and change the appearance of the line. This is done by assigning the linestyle keyword argument inside the plot method. By default, this argument is set to a solid line, denoted by the string '-', a single dash. So we just put a dash in there.
As with markers, there are a few line styles you can choose from, and you can see them here. We will include the link to this site as well in the lecture transcript (https://matplotlib.org/stable/api/lines_api.html?highlight=line%20style).
So if we use, for example, a double dash, '--', this indicates that the line should be dashed, and you can see the result here. In fact, here it's better to remove the markers, and there you can see that we've now got a dashed line.
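The line style change can be sketched as follows, using made-up values in place of the course data and leaving the markers out, as suggested above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()

# linestyle="-" is the default solid line; "--" gives a dashed line.
ax.plot(years, gdp, linestyle="--")
```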
We can also choose the color of the line by passing the color argument to the plot method. So here we'll use 'g', which stands for green, but of course you can pick any color you like. You might wonder if it's possible to modify the x or y ticks with Matplotlib.
So first let me introduce an important functionality of Matplotlib, that is, the grid. The grid method is applied to the axes object, and if we pass True, we're going to see a grid drawn in the background of the plot, like this. And the answer to the previous question is yes: it's very easy to set the ticks with the set_xticks or set_yticks method.
These methods are applied to the axes object and take two arguments, ticks and minor. Let us just focus on the set_xticks method for demonstration purposes. The ticks argument is basically a list or a NumPy array, while minor is set to the default value of False, which means we set the major ticks; if True, we set the minor ticks.
In our demonstration, we will use the argument minor and set it equal to False. So I'm going to modify the x ticks as follows: I'm going to apply the set_xticks method on the axes object and pass it a list of values. This list will range from the minimum value observed in the year column all the way to the maximum value, and we'll place one tick every six years, let's say.
So we take the year column of the USA data; its minimum is the lower bound, and its maximum is the upper bound. But since the range function excludes the upper limit value, we add one to it. And finally we specify the step to be equal to six. So now if we run this snippet, the grid has changed, and it's much more granular since we are retaining more values on the x-axis.
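Put together, the color, grid, and tick steps might look like the sketch below. The data are made-up placeholders, and plain Python lists stand in for the year column of the data frame.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()
ax.plot(years, gdp, color="g")  # "g" is shorthand for green

# Draw a grid behind the plot.
ax.grid(True)

# One major tick every six years; range() excludes its upper bound,
# so one is added to the maximum year.
ticks = list(range(min(years), max(years) + 1, 6))
ax.set_xticks(ticks, minor=False)
```

With these placeholder years, the ticks land on 1990, 1996, 2002, 2008, 2014, and 2020.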
We can also customize the tick labels appearing in the plot with the set_xticklabels method. We call set_xticklabels after having called the set_xticks method. In particular, it takes two arguments: the list of labels we want to have on the x-axis, and then, via the fontdict dictionary, a series of key-value pairs, say the size of the text we want to have.
So here we're just going to put 12, and then we can set the color of the tick labels as well. In this case, let's put 'g' for green. In this way, the tick labels are shown in green and in a bigger size compared to the default. You can see the difference between the ones on the bottom, on the x-axis, and the ones on the side, on the y-axis: the x-axis labels are now bigger and green.
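One way to write this is shown below. The sketch assumes the color is passed inside fontdict together with the size; passing color as a separate keyword argument works as well. The data are made-up placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()
ax.plot(years, gdp, color="g")

# Fix the tick positions first, then style the labels shown at them.
ticks = list(range(min(years), max(years) + 1, 6))
ax.set_xticks(ticks)
ax.set_xticklabels(
    [str(t) for t in ticks],
    fontdict={"size": 12, "color": "g"},  # bigger, green tick labels
)

x_labels = ax.get_xticklabels()
```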
Finally, we can add a legend to describe the variable that has been plotted. This is especially useful when multiple series are shown or when we plot subgroups of the same variable in the same plot. Adding a legend with Matplotlib is very easy; we just need to follow two steps.
First, we pass the optional argument label to the plot method, like that. This takes a string, which is the label we wish to appear in the legend. In our case, we add label equal to USA, so this identifies the plot shown in line eight with the label USA. Second, we call the legend method on the axes object without specifying any argument. By default, the legend is placed at the location with minimum overlap with the drawn objects, which means that Matplotlib looks for the spot that best fits the legend.
Now, we can also set the location manually with the optional argument loc. By default, this is equal to the string 'best', but there are many other options, and you can check the online documentation for those as well. Here we have a table from the documentation (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html) that shows all possible locations for the legend, and as before, the link will be included in the lecture transcript.
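The two steps for adding a legend can be sketched like this, with made-up values in place of the course data. The loc argument is written out explicitly here, although 'best' is already the default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()

# Step 1: give the plotted series a label.
ax.plot(years, gdp, label="USA")

# Step 2: add the legend; "best" lets Matplotlib pick the least crowded spot.
legend = ax.legend(loc="best")
```

Other location strings, such as "upper left" or "lower right", pin the legend to a fixed corner.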
Please note that calling legend with no arguments automatically fetches the legend handles and their associated labels. This is equivalent to calling the get_legend_handles_labels method. A handle in Matplotlib is another name for an artist, that is, any object drawn in the figure. So instead of calling legend directly, we can define two objects called handles and labels by calling the get_legend_handles_labels method with no arguments. We then pass the handles and labels we just created to legend, and you can see that the result is exactly the same.
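A short sketch of the explicit version, using made-up values rather than the course data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()
ax.plot(years, gdp, label="USA")

# Fetch the labelled artists (handles) and their labels explicitly...
handles, labels = ax.get_legend_handles_labels()

# ...and pass them to legend(); the result matches a bare ax.legend() call.
legend = ax.legend(handles, labels)
```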
A legend can also have a title, and this is specified with the argument title, which is set to None by default. So now we can add title equal to a string, and we'll use Country, and you see that the legend now has a title. Just to clarify, please note that the handle is basically the line we drew in the plot, with its line style, and the label is the identifier passed to the plot method.
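Adding the legend title could look like this, again with made-up values standing in for the course data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the course data.
years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
gdp = [24000, 28000, 36000, 44000, 48000, 56000, 63000]

fig, ax = plt.subplots()
ax.plot(years, gdp, label="USA")

# title defaults to None; here the legend gets the heading "Country".
legend = ax.legend(title="Country")
```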
Finally, note that we can also control the font size of the legend title with the title_fontsize argument, and we set this equal to medium. There are many possibilities, such as small, medium, large, and others, and again, the online documentation lists the different sizes you can use. So in this lecture, we've covered different techniques for customizing a plot. We've taken into account a single series, but multiple objects can also be plotted in one figure, and that's what we're going to look at in the next lecture. So I'll see you there.
Lectures
- Course Introduction
- Introduction to Matplotlib
- Multiple Plots in Matplotlib
- Annotating Text with Matplotlib
- Advanced Customization in Matplotlib
- Different Plot Types in Matplotlib
- Conclusion
Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.
He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.