Fundamentals of R
This module looks at how to plot data in R. It builds on what you have learned in the previous modules to show you how to use R to interpret your data.
The objectives of this module are to provide you with an understanding of:
- How to create a basic plot in R
- Various types of plots in R
- The different graphical parameters of plots in R
- How to layer a plot in R
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed. Delegates should already be familiar with basic programming concepts such as variables, scope, and functions. Experience of another scripting language such as Python or Perl would be an advantage. Understanding mathematical concepts will be beneficial.
We welcome all feedback and suggestions - please contact us at email@example.com to let us know what you think.
[Instructor] We can create plots in R using the plot function. This is a generic function for plotting R objects, and it draws to a graphical window, or graphical device, opening one if none is already active. Say, for example, I have created two variables, x and y, with y being double the value of x, and I'd like to plot them. I can simply run the plot command on these two vectors, and we can see that a scatter plot has been created in the graphical device window of the Plots tab in the bottom right pane. The plot function is very versatile. It can create many different visualisations.
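As a minimal sketch of the step just described: the transcript only says that y is double x, so the actual values used below (one to ten) are an assumption made for illustration.

```r
# Assumed values: the recording does not state which numbers were used
x <- 1:10
y <- 2 * x   # y is double the value of x

# Two numeric vectors produce a scatter plot of y against x
plot(x, y)
```

In RStudio the result appears in the Plots tab; in a plain R session a separate graphics window opens instead.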
By default, plot accepts coordinates and draws a scatter plot, as we saw earlier. The reason we use the keyword generic is to indicate that there are many methods that can be used in the background. When an object is passed into the plot function, R determines which method to call based on the class of that object. We can see a list of these methods by calling the methods function, and we can see that various different types of input can be passed into the plot function.
For example, I could pass in two vectors and I would see a scatter plot of one against the other, and as you can see in the bottom right, a new plot has been created. I could pass in just one vector and we would see a different type of plot, showing the magnitude versus the index: the X axis is now an index running from one to, in this case, the number of points that we have.
I can also pass a data frame into the plot function, and I can see that this has created a quick look at the values in the data frame. What it's doing in the background is calling the plot.data.frame method, and implied inside of that is that it is converting iris into a numeric matrix. iris is a long data frame consisting of many rows of measurements for different types of flower. Its factor column, Species, is converted to integer codes, so it appears in the panels alongside the numeric columns. The method then produces a scatter plot for every pair of columns, and in order to do so it calls the pairs function. We can look at the help page for that by calling help on pairs, which brings up the documentation for the pairs function in the Help tab.
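The dispatch behaviour described above can be explored with a short sketch. The vector v holds hypothetical values chosen for illustration; iris ships with R, and methods, data.matrix, and pairs are standard functions.

```r
# List the methods registered for the generic plot function
plot_methods <- methods(plot)
print(plot_methods)

# Hypothetical values, used only to illustrate the one-vector case
v <- c(3, 1, 4, 1, 5, 9, 2, 6)

# One vector: the values are plotted against their index (1 to 8 here)
plot(v)

# Two vectors: the same picture with the index supplied explicitly
plot(seq_along(v), v)

# A data frame with more than two columns is handed to pairs()
plot(iris)

# Roughly what happens behind the scenes: convert to a numeric matrix
# (the factor column Species becomes integer codes), then call pairs()
iris_matrix <- data.matrix(iris)
pairs(iris_matrix)

# In an interactive session, open the documentation with:
# help("pairs")
```

Printing plot_methods shows entries such as plot.data.frame and plot.function, which are the methods chosen for data frames and functions respectively.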
Okay. I could also, if I wanted to, pass in a function. So if I clear out my history by clicking on the paintbrush in the Plots tab, and click yes to confirm, I can run the plot function on a function. Here I'm using the qnorm function. What the plot function will do is supply a series of points that form the X axis, and it will draw the value of the function, as we can see, on the Y axis. Now what is qnorm? Just as a hint for anyone who doesn't know, it's the inverse cumulative distribution function, usually called the quantile function, and the distribution we're talking about is the normal distribution.
Okay, I could also use the plot function to pass in, say for example, a formula. Let me clear my Plots tab again using the paintbrush. Are you sure you want to clear all the plots in the history? Yes. I can update my x and y to be different numbers, now ranging from two to 20 and from three to 30. I can use the plot function as I normally would, passing in two arguments, one for x and one for y, and see a scatter plot created. If I clear this out, just to prove that I'm creating a new plot, I can create the same plot by passing just one argument into the plot function, using the tilde notation. I have created the same plot both times, but the plot function accepted either one argument or two.
I could create something a little more interesting by updating x to be a series of numbers ranging from zero to two pi, and I can try creating the sine plot. Here I will use two arguments, one for x and one for y, but I no longer create a y variable; I'm merely passing sin(x) as my second argument. When I hit enter here, we'll see the sine curve being created.
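The three calls walked through here might look like the sketch below. The exact sequences are assumptions: the transcript gives only the ranges (two to 20, three to 30, zero to two pi), not the step sizes or number of points.

```r
# A function: plot() evaluates qnorm over a grid of points
# (by default from 0 to 1) and draws the resulting curve
plot(qnorm)

# Assumed sequences covering the ranges mentioned in the recording
x <- seq(2, 20, by = 2)
y <- seq(3, 30, by = 3)

# Two arguments: x coordinates, then y coordinates
plot(x, y)

# One argument: a formula, read as "y as a function of x"
plot(y ~ x)

# Sine curve with two arguments; no separate y variable is created
x <- seq(0, 2 * pi, length.out = 100)   # assumed number of points
plot(x, sin(x))
```

The two-argument and formula calls draw the same points; the only visible difference is how the call is written.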
And I can recreate that. If I clear the plot and remove it from my plot history, I can draw the same curve with a single argument, using the tilde notation.
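A sketch of the single-argument version, again assuming 100 evenly spaced points between zero and two pi:

```r
# Assumed grid: the recording does not state how many points were used
x <- seq(0, 2 * pi, length.out = 100)

# A formula is a single argument: sin(x) on the Y axis, x on the X axis
plot(sin(x) ~ x)
```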
And finally, I can also pass a two-column matrix into my plot function. So let me clear my history from my plots and let me create a two-column matrix. My x now runs from minus two pi to plus two pi, and my y is the sine of x. If I scroll up and show you what I created, I've created a combined version, and if I click on my data for x and y, here is the cbind of my two columns. So I'm passing in something that looks like this: a matrix with two columns of values. If I close this down and press Ctrl+L to clear the console, I want to see what happens when I put two columns into a matrix and then pass that matrix into plot. So here is the plot function, and here is the two-column matrix that I'd like to input. How will plot handle this? It interprets each row as a pair of coordinates, and it understands implicitly that I have given it two columns: an x and a y. So the plot function is quite versatile.
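The matrix case can be sketched as follows. The range of minus two pi to plus two pi follows the narration, while the number of points and the name xy are assumptions for illustration.

```r
# Assumed grid from -2*pi to 2*pi; the point count is illustrative
x <- seq(-2 * pi, 2 * pi, length.out = 200)
y <- sin(x)

# Bind the two vectors column-wise into a two-column matrix
xy <- cbind(x, y)   # xy is a hypothetical name
head(xy)

# plot() treats the first column as x and the second as y
plot(xy)
```

Because cbind keeps the variable names as column names, the axes are labelled x and y automatically.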
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and has developed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.