This module looks at how to plot data in R. It builds on the knowledge you have learned in the previous modules to show you how to use R to interpret your data.
Learning Objectives
The objectives of this module are to provide you with an understanding of:
- How to create a basic plot in R
- Various types of plots in R
- The different graphical parameters of plots in R
- How to layer a plot in R
Intended Audience
Aimed at all who wish to learn the R programming language.
Pre-requisites
No prior knowledge of R is assumed. Delegates should already be familiar with basic programming concepts such as variables, scope, and functions. Experience of another scripting language such as Python or Perl would be an advantage. Understanding mathematical concepts will be beneficial.
Feedback
We welcome all feedback and suggestions - please contact us at qa.elearningadmin@qa.com to let us know what you think.
[Instructor] We can continue exploring the graphical parameters of the plot function by looking at the other plot types that are available. We can type in question mark plot to open the help window, and if I scroll down with my mouse, I can note that there are many different types available. Here I'd like to start by looking at type c, and then we'll work our way through a few other interesting types of plot. So what is c? Type c provides a dashed line, formed by removing the points that would otherwise overlay the line. This means that the gaps in the line correspond to the data points, whilst the line segments are the connections between the data points. In order to understand that, let's create some temporary dummy data.
Here we have a very simple temp x, ranging from one to 10, and we have a temp y, which holds the squared values of temp x. If I produce a very simple plot comparing x against y, we can see on the plot tab in the bottom right pane that we get a scatter plot. Now, instead of the default scatter plot, I would like to create a type c plot, so I add in the argument type equals c. I'm utilising two other parameters within the plot function, one for the main title and another for the subtitle. And this is what we mean when we say that the lines are the connections between the data points and the gaps in the lines correspond to the data points.
Okay, we can also make a type S plot. This is a stairs plot. We call these stair steps, and what we're doing is interpolating step lines between the points, rather than drawing straight lines between them as we did here. We can utilise the same data set, and instead of c I will now use a capital S, and we'll see that the plot changes. What's happening here is that we're moving first vertically, then horizontally, in a step fashion.
I can create more complicated plots, for example of the sin function. If I wanted to, I could create that using a type of s, but in this case a lowercase s. I can show you that we've now created a step function along a curve. With a lowercase s we move horizontally, then vertically. With a capital S we move vertically, then horizontally. So it's a subtle difference between the two, and which one you choose depends on your use case.
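The plot types described above can be sketched roughly as follows. The variable names temp_x, temp_y and sin_x, the titles, and the range used for the sin curve are assumptions for illustration; the transcript does not show the exact code that was typed.

```r
# Dummy data: x from 1 to 10 and y as the squares of x
# (temp_x and temp_y are assumed names)
temp_x <- 1:10
temp_y <- temp_x^2

# Default plot type: a scatter plot of points
plot(temp_x, temp_y)

# type = "c": line segments only, leaving gaps where the points would be
plot(temp_x, temp_y, type = "c",
     main = "Type c plot",
     sub = "Gaps in the line mark the data points")

# type = "S" (capital): stair steps that move vertically first, then horizontally
plot(temp_x, temp_y, type = "S", main = "Type S plot")

# type = "s" (lowercase): stair steps that move horizontally first, then vertically
sin_x <- seq(0, 2 * pi, length.out = 50)  # assumed range for the sin curve
plot(sin_x, sin(sin_x), type = "s", main = "Type s plot of sin(x)")
```

The type argument is case-sensitive, so "s" and "S" give different step plots.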
The b in type b stands for both, meaning both points and lines. If I run a plot of our temp data with type b as an argument, we end up seeing both dots and lines. I've chosen to add in another parameter called pch, which we'll come to shortly, and col, which sets the color.
Another interesting parameter is the scaling argument, asp. Let me create a new plot, a histogram-like plot of my sin function using type h. If I'd like to scale this, I can run it again, but this time adding in asp equals two. I could change this to four, to eight, to 100 and so on, but you get the point: we're affecting the limits of the plot and scaling the graph to a sensible level. So let's change that to 3/2, or 1.5, before we move on.
Now you might be wondering what happened with all our points, and how we know which different point symbols are available. This is where the pch argument of the plot function comes in. There is a help page for the points function, and if I scroll down to the details section in the help window, I can see the commonly used values described there, ranging from zero to 18, with a dot being 46, and so on. We can get a longer list of graphical parameters if we open the help for the par function, where again you can learn more about graphical devices, plots and interactions. But for now let's try to understand pch, the plotting characters that are available.
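As a rough sketch of the calls described above, assuming the same dummy data as before (temp_x, temp_y and sin_x are illustrative names, and the pch and col values are arbitrary choices, since the transcript does not state which were used):

```r
temp_x <- 1:10
temp_y <- temp_x^2

# type = "b": both points and the lines connecting them
# pch = 19 (solid circle) and col = "blue" are assumed values
plot(temp_x, temp_y, type = "b", pch = 19, col = "blue")

# type = "h": histogram-like vertical lines for the sin function
sin_x <- seq(0, 2 * pi, length.out = 50)
plot(sin_x, sin(sin_x), type = "h")

# asp sets the y/x aspect ratio, which changes the axis limits that are drawn
plot(sin_x, sin(sin_x), type = "h", asp = 2)
plot(sin_x, sin(sin_x), type = "h", asp = 1.5)

# Help pages mentioned in the transcript (run interactively):
# ?points   # plotting symbols, see the 'pch' values section
# ?par      # the full list of graphical parameters
```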
Let me call the colors function. The screen will get filled with a lot of different colors, if I ran it correctly. But let's not waste time looking at all of these in detail; let's just see how many there are. There are 657, and as there are hundreds, let's just take a look at ten of them.
Okay, now I'd like to understand pch, the plotting character of my plot function. Let me create a plot of a very simple sequence, ranging from one to 25 with a step of one. I'm putting that in as a single argument to my plot function, so it is plotted against an index ranging from one up to the number of items in my sequence, which here is 25. I can now add in pch, the plotting character, and ask it to plot the characters that exist in the graphical system, ranging from one to 25. If I click on the zoom, in case people can't see that, you can see that various different symbols are available depending on which pch we choose.
I can make these slightly larger by using the cex argument. That is me tripling the size of each of my points. So the cex argument is about magnification or expansion, and bg is the background color, or fill, used for the open plot symbols. In this case there are a few open symbols at the top right, so it's picking colors four to seven from the background argument.
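A minimal sketch of these steps is below. The name all_colours is an assumption; the values cex = 3 and bg = 4:7 follow the description of tripling the point size and using colors four to seven.

```r
# All built-in color names; printing the whole vector fills the console
all_colours <- colors()   # all_colours is an assumed name
length(all_colours)       # how many named colors there are
head(all_colours, 10)     # just the first ten

# A single vector argument is plotted against its index (1 to 25 here)
plot(seq(1, 25, by = 1))

# One plotting character per point: symbols 1 to 25
plot(seq(1, 25, by = 1), pch = 1:25)

# cex = 3 triples the symbol size; bg sets the fill of the fillable
# symbols (pch 21 to 25), recycling colors 4 to 7
plot(seq(1, 25, by = 1), pch = 1:25, cex = 3, bg = 4:7)
```

Only symbols 21 to 25 take a fill, which is why the bg colors show up on just the last few points at the top right.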
I can plot the cos function using character two, and if we scroll up to our first plot as a reminder, number two is a triangle. I could, for example, have chosen number seven instead, counting along one, two, three, four, five, six, seven, which is a square with a cross. Now you can see that as the point symbol I have chosen. If I wanted to, I could have put anything in here outside of the predefined pretty pictures. I could use the dollar symbol, I could use L, I could use whatever I want. The plot function is very flexible.
In order to understand what the plotting character options are, let's plot some more character values. Sticking with seven, the square with a cross, I can now expand that and say that I'd actually like to see the first 50. And we see a problem on the screen: there are some gaps, because there is no implementation for some pch values. This produces a warning, which we can ignore. Instead of seeing this, let's run it for just one to 19 and ignore the warnings for now.
If I decide that this plot is a rectangle and less appealing than it should be, I can change it to a square plot using the par function, setting the plot region to be square. If I rerun my one to 19 plot, I can see a square, which is a lot more appealing and makes more sense aesthetically.
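These steps might look roughly like the sketch below. The name cos_x and its range are assumptions. The call for the first 50 values is wrapped in suppressWarnings() here only so the sketch runs quietly; without the wrapper, R warns about the unimplemented pch values 26 to 31.

```r
cos_x <- seq(0, 2 * pi, length.out = 30)  # assumed range for the cos curve

plot(cos_x, cos(cos_x), pch = 2)    # 2 is a triangle
plot(cos_x, cos(cos_x), pch = 7)    # 7 is a square with a cross
plot(cos_x, cos(cos_x), pch = "$")  # any single character also works
plot(cos_x, cos(cos_x), pch = "L")

# The first 50 pch values: 26 to 31 are unused, which leaves gaps
suppressWarnings(plot(1:50, pch = 1:50))

# Just the symbols 1 to 19
plot(1:19, pch = 1:19)

# pty = "s" makes the plot region square; keep the old setting to restore it
old_par <- par(pty = "s")
plot(1:19, pch = 1:19)
par(old_par)
```

par() returns the previous values of the parameters it changes, so passing old_par back restores the default plot region afterwards.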
I might feel like I'm not gaining enough clarity on the screen, so I can split this out into one to 10 and 10 to 19, for example. Here I'm just using the plot function with the x and y arguments as placeholders for where I would like to see my different pch points or character values. We can set the aspect ratio within the plot call as a parameter, rather than worrying about updating the overall structure of all my different plots. And here we can see that for the pch values beyond those that gave the warnings, from 32 up to 50, there are various different characters and some numbers at the top.
I can change the color using the col argument. Here I'm just gonna pick, for argument's sake, number two and see that there is a subtle change, then maybe choose one. As you can see, the screen is not updating that much, but it's more just to show you that there is a difference in color, and if I choose red, you'll see that the text color has changed. So you can play with the col argument and choose whichever of the 657 colors you'd like.
We can continue exploring our pch argument, so if I just clear the screen and add in a note, we're gonna see some other ASCII characters. If I plot all of this at once it looks quite cluttered, so let's break it up first. I can then click on the zoom, and that shows us the characters from 50 up to 75, which you can use as a reference for all of the different values. I can then show you from 76 to 100, and if we zoom in on that we can see that the rest of the uppercase alphabet is displayed here, before we move on to the lowercase letters. Earlier I showed you L by passing the letter L in quotation marks as the parameter; you could instead use 76, which, as we can see, is the predefined pch value for that character.
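The character ranges above can be explored with a sketch like the following. Using the same values for x, y and pch is an assumption about how the placeholders were set up, and code_L is an illustrative name.

```r
# x and y act as placeholders; pch picks the symbol drawn at each position
plot(1:10, 1:10, pch = 1:10, asp = 1)
plot(10:19, 10:19, pch = 10:19, asp = 1)

# pch 32 to 127 are ASCII characters; 48 to 50 are the digits 0, 1 and 2
plot(32:50, 32:50, pch = 32:50, asp = 1)

# col accepts a palette index or a color name
plot(32:50, 32:50, pch = 32:50, asp = 1, col = 2)
plot(32:50, 32:50, pch = 32:50, asp = 1, col = "red")

# Further ASCII ranges, split up to avoid clutter
plot(50:75, 50:75, pch = 50:75, asp = 1)
plot(76:100, 76:100, pch = 76:100, asp = 1)

# 76 is the ASCII code for "L", so these two plots use the same character
code_L <- utf8ToInt("L")
plot(1:10, pch = code_L)
plot(1:10, pch = "L")
```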
Kunal has worked with data for most of his career, ranging from diffusion markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and formed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.