In this Course, we cover Python visualization libraries and tools, focusing particularly on Matplotlib and the Seaborn plotting library. You will learn how to use these to visualize your data using Python in a clear and effective way. We will go into depth particularly on Seaborn, and you'll learn about the different plots available, including regression plots, pairplots, and heat maps.
If you have any feedback relating to this Course, feel free to let us know at support@cloudacademy.com.
Learning Objectives
- Use Matplotlib to create plots to represent data, and format the plots
- Add information to plots such as labels, titles, legends, etc.
- Get acquainted with the Seaborn plotting library
- Learn how to plot data using Seaborn in a variety of different plots
Intended Audience
This Course is intended for data scientists, data engineers, or anybody interested in learning how to use Python tools to visualize data.
Prerequisites
To get the most out of this course, you should be familiar with the basics of programming: variables, scope, functions.
Resources
The dataset(s) used in this course can be found in the following GitHub repository: https://github.com/cloudacademy/practical-data-science-python
Now let's look at an exercise using the tips data set. We're going to generate a distribution of tips split by whether the diner was a smoker or not, so we'll see whether smokers tip less. We're going to be using sns.violinplot and sns.boxplot to compare various aspects of the tips data set.
So for the tips data, we have tips and we want something like sns.boxplot. We'll have a look at how savefig works as well. For a box plot, you put data equal to tips. The exercise asks us to compare the distribution of tips across the different services, so x is the tip and y is going to be equal to time.
Okay, we've got this next figure here, visualizing the distribution of tips across the different services. We could break that down further by going with hue equals smoker. That splits each service by whether someone is or is not a smoker, so we can see where the tips sit for each group.
Okay, now let's have a look at the violin plot. We have a visualization looking like this. The question asks us to split by whether the diner was a smoker or not, so we set split equals True. Good. There's probably not much of a discernible difference between smokers and non-smokers in how much they tip. That's to be expected.
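The two box plots described so far can be sketched roughly as follows. This is a minimal sketch, not the lecturer's exact code: the small DataFrame is a made-up stand-in that only borrows the tip, time and smoker column names from the tips data set, and the Agg backend is chosen just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (made-up values).
tips = pd.DataFrame({
    "tip": [1.0, 1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0, 3.2, 1.7, 5.0],
    "time": ["Lunch", "Dinner"] * 6,
    "smoker": ["Yes", "Yes", "No", "No"] * 3,
})

fig, (ax_plain, ax_hue) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of tips for each service (Lunch/Dinner).
sns.boxplot(data=tips, x="tip", y="time", ax=ax_plain)

# Same plot, with each service broken down by smoker status.
sns.boxplot(data=tips, x="tip", y="time", hue="smoker", ax=ax_hue)
```

With the real data, the stand-in frame would be replaced by the tips data set itself.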
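A split violin plot along these lines might look like the sketch below. The data is again a made-up stand-in using the tips column names, so the shapes it draws say nothing about real tipping behaviour.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (made-up values).
tips = pd.DataFrame({
    "tip": [1.0, 1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0, 3.2, 1.7, 5.0],
    "time": ["Lunch", "Dinner"] * 6,
    "smoker": ["Yes", "Yes", "No", "No"] * 3,
})

fig, ax = plt.subplots()

# split=True draws the two hue levels as the two halves of one violin,
# so smokers and non-smokers can be compared side by side.
sns.violinplot(data=tips, x="time", y="tip", hue="smoker", split=True, ax=ax)
```

Splitting works here because smoker has exactly two levels, one for each half of the violin.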
Now, I'm not really sure if this is real data or not, but you can see that we have these overlays. We can also have a look at the color maps through the palette argument. There are lots of color palettes in Seaborn. We've got a summer palette like this. We also have a winter palette, like that, and spring as well. And of course, as you can imagine, there is also autumn. There are many, many different color palettes. There's also greys, and there you go, it's in black and white. We can define our own palettes too. There's a wealth of color palettes in the Seaborn documentation: "Choosing color palettes" is a whole documentation page about the various palettes that we have.
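As a rough illustration of the palettes mentioned here, the sketch below asks Seaborn for a few colors from each named palette and builds one custom palette. The names are Matplotlib colormap names that Seaborn accepts; the two hex colors in the custom palette are arbitrary choices for illustration.

```python
import seaborn as sns

# Named palettes from the lecture; each name is a Matplotlib colormap
# that Seaborn can sample a fixed number of colors from.
for name in ["summer", "winter", "spring", "autumn", "Greys"]:
    colors = sns.color_palette(name, n_colors=4)
    print(name, colors.as_hex())

# A custom palette is just a list of colors (arbitrary example values).
custom = sns.color_palette(["#1b9e77", "#d95f02"])
```

Any of these can then be handed to a plotting function through its palette argument, for example palette="summer" on a violin plot that uses hue.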
These plots are very good for visualizing the distributions of our data. They give us an idea of outliers, where the bulk of the data is, and so on. We can specify further parameters to get a breakdown of some of these aspects. Here we want inner: inner equals box gives us a box plot in the center of the violin plot. We also have things like count plots, for when you have data that is fairly categorical and you want an idea of how many elements you have in each category.
We can call sns.countplot, and this gives us a bar chart, essentially. We pass in data equals df, and say we want to know how many of each gender we have, so x is gender here. We can see that in this sample we have more females than males. Again, all of these can be split by other variables, so we can have hue as final judgment.
Now we'll bring back catplot, because it's useful for generating a graphic like this. I'm going to have quite an extreme graphic coming out here. Say we want to plot x as gender, with hue given by age, so we can look at age as well. I can use catplot as a sort of generalized plotting function by passing kind equals count. This is going to give me a count across our various ages.
Now this is obviously not the easiest graph to infer information from, but it shows the same thing as creating a count plot directly. We can also overlay a count plot onto what's called a facet grid, where we can visualize the respective distributions. So that's visualizing our categorical data.
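A count plot of this kind could be sketched as below. The DataFrame and its column names (gender, final_judgement) are hypothetical stand-ins, since the course data set is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey-style data; column names and values are assumptions.
df = pd.DataFrame({
    "gender": ["female", "female", "male", "female", "male", "female"],
    "final_judgement": ["yes", "no", "no", "yes", "yes", "no"],
})

fig, ax = plt.subplots()

# One bar per gender, split further by the hue variable.
sns.countplot(data=df, x="gender", hue="final_judgement", ax=ax)
```

The bar heights are simply the number of rows in each gender and hue combination.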
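The same counts can be produced through catplot, which returns a facet grid. The sketch below assumes a hypothetical DataFrame with gender, age and final_judgement columns; the second call uses the col argument to put one count plot in each facet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd
import seaborn as sns

# Hypothetical survey-style data; column names and values are assumptions.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male",
               "female", "female", "male", "female"],
    "age": [25, 40, 25, 60, 40, 25, 60, 40],
    "final_judgement": ["yes", "no", "no", "yes", "yes", "no", "yes", "no"],
})

# catplot as a general-purpose function: kind="count" makes a count plot.
g = sns.catplot(data=df, x="gender", hue="age", kind="count")

# The result is a FacetGrid, so the counts can be spread over facets.
g_cols = sns.catplot(data=df, x="gender", col="final_judgement", kind="count")
```

With many distinct ages, the hue version quickly becomes as busy as the graphic described in the lecture.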
Instead of having to faff around with legend generation, going through for loops and things like that, when we want a scatterplot with nice labels, Seaborn has scatterplot. I'm going to take my figure code, and then we're going to generate a scatterplot where we color by gender, change the marker size by how old someone is, and change the marker shape by how much they believe in the final judgment. And again, we can do this all on a single line. X is going to be given by weight and y by height. This is pretty much identical to the scatterplot we generated before; it's visualizing the same kind of data. But instead of having to label each thing as I plot it, I can just say, okay, color, through hue, is going to be given by gender. Now the marker size can be given by age.
So there's a lot going on here. We're just seeing that we can change our various parameters. The respective marker sizes for different people's ages have automatically been added to the legend, so now it's looking good. Every other categorical piece of data we want to visualize will automatically be added too.
So I can say, okay, the style of marker can be given by belief in the final judgment. Now, I'm not saying this is a good graphic, but I think it's a good graphic for demonstrating the versatility of a Seaborn scatterplot. So we can see this sort of wonderful chaos. The legend generation is possibly the most impressive part of it.
So if we wanted to save this figure, we could call fig.savefig and specify a file name. What should we call this? We can call it gender_age_judge, and I can specify the extension I want to save it with, so we'll use jpeg for now. We can see we've got gender_age_judge.jpeg, which is just a saved image of the same figure.
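The single-call scatterplot described here might look like the following sketch. The DataFrame is hypothetical: the column names (weight, height, gender, age, final_judgement) and values are assumptions standing in for the course data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "weight": [55, 82, 61, 90, 68, 74],
    "height": [160, 181, 165, 188, 170, 176],
    "gender": ["female", "male", "female", "male", "female", "male"],
    "age": [25, 40, 31, 60, 47, 22],
    "final_judgement": ["yes", "no", "no", "yes", "yes", "no"],
})

fig, ax = plt.subplots()

# hue sets the color, size the marker size, style the marker shape.
# Seaborn builds the combined legend for all three automatically.
sns.scatterplot(
    data=df, x="weight", y="height",
    hue="gender", size="age", style="final_judgement", ax=ax,
)
```

No loop over groups or manual labelling is needed; each extra semantic variable is added to the legend by the one call.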
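Saving the figure could be sketched like this. The data is a hypothetical stand-in, and the file is written to the system temporary directory purely so the sketch does not clutter the working directory; the file name follows the one used in the lecture.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "weight": [55, 82, 61, 90],
    "height": [160, 181, 165, 188],
    "gender": ["female", "male", "female", "male"],
})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="weight", y="height", hue="gender", ax=ax)

# The extension in the file name decides the image format.
out_path = os.path.join(tempfile.gettempdir(), "gender_age_judge.jpeg")
fig.savefig(out_path)
```

Changing the extension, for example to .png or .pdf, is enough to save in a different format.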
So now let's have a look at an example from the Seaborn documentation. What this shows is the clarity ranking and depth of diamonds, comparing how many carats there are to the sale price. It's more a piece of art, really, than a visualization. We can infer, I suppose, a sort of general trend. But this shows you, again, more parameters that we can choose. This is a way of generating figures and axes. Despine, that's something we've done already: all of our figures have been despined. Then a clarity ranking is specified.
This just tells Seaborn what order the clarity levels should go in. We see x is carat and y is price. We see that hue is clarity and size is equal to depth, all things that we've done already. The only new thing is that they have a very exact color palette in mind. I don't know what this code means, but it's some sort of blue color scheme that they have decided to use. They're saying that they want the colors in the order of the clarity ranking. They're specifying the range they want the marker sizes to go between, and then various things like line width.
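A sketch in the spirit of that documentation example is shown below. The handful of rows is a made-up stand-in for the diamonds data set, and the palette string is assumed to match the one in the gallery example; strings beginning with ch: are Seaborn's shorthand for a cubehelix palette, which is where the blue scheme comes from.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Order in which the clarity levels should appear, worst to best.
clarity_ranking = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Hypothetical stand-in for the diamonds data set (made-up values).
diamonds = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 2.0],
    "price": [400, 1100, 2300, 3600, 5200, 7400, 11000, 16000],
    "clarity": clarity_ranking,
    "depth": [61.5, 59.8, 62.4, 63.3, 60.2, 61.9, 62.8, 60.9],
})

fig, ax = plt.subplots(figsize=(6.5, 6.5))
sns.despine(fig, left=True, bottom=True)  # remove the axis spines

sns.scatterplot(
    data=diamonds, x="carat", y="price",
    hue="clarity", size="depth",
    palette="ch:r=-.2,d=.3_r",   # cubehelix shorthand (assumed from the docs)
    hue_order=clarity_ranking,   # colors follow the clarity ranking
    sizes=(1, 8),                # smallest and largest marker size
    linewidth=0,                 # no outline around the markers
    ax=ax,
)
```

With the full data set in place of the stand-in rows, the same call produces the dense, cloud-like figure discussed in the lecture.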