Course Description
This module will introduce you to the R programming language and the RStudio Integrated Development Environment. You’ll also look at some useful tools available in RStudio.
Learning Objectives
The objectives of this module are to provide you with an understanding of:
- How to download and install the R programming language
- How to download and install the RStudio IDE
- The different panes in RStudio
- How plots are created in RStudio
- How to add comments in RStudio
- Useful keyboard shortcuts in RStudio
Intended Audience
Aimed at anyone who wishes to learn the R programming language.
Pre-requisites
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.
Experience of another scripting language such as Python or Perl would be an advantage.
Having an understanding of mathematical concepts will be beneficial.
Feedback
We welcome all feedback and suggestions - please contact us at qa.elearningadmin@qa.com to let us know what you think.
To show how to view a dataset in RStudio, we will use a fictitious dataset of happiness records. The dataset is stored in a .RData file, and the script (source file) we are working from is open in the source window right now. Both files are stored in a folder called fundamentals of R, and I would like to set my working directory to this folder. I can do this from the Session menu by choosing Set Working Directory and then To Source File Location. I can then load the happiness.RData file by running the command load("happiness.RData"), using the shortcut Control+Enter to send the command from the source window to the console. This command instructs R to load the data, the happiness counts, into the global environment. As you can see, we now have a dataset called happiness, with 82 observations and three variables. As you work in R, you will create and load many objects, either as part of your analysis or as a by-product of the computations you run. Can I view the happiness data that has been loaded into my workspace? In my console window, which I'll drag up, I can type the name of the dataset, happiness. I can press Tab to autocomplete the name, and once I press Enter, the entire dataset is printed and I can scroll up and down using the scroll bar on the right. The rows run from row 1 all the way down to row 82.
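The steps above (set the working directory, then load a .RData file into the global environment) can be sketched in plain R code. This is a minimal sketch under one assumption: the course's happiness.RData file is not reproduced here. So the block builds a small hypothetical stand-in data frame with placeholder values, saves it to a temporary folder, and loads it back. The temporary folder plays the role of the fundamentals of R folder, and `course_dir` is an illustrative name only.

```r
# Hypothetical stand-in for the course's happiness data: the real file is
# not available here, so these are placeholder values, not the course data.
happiness <- data.frame(
  year  = 2001:2005,
  boys  = c(10, 12, 11, 14, 13),
  girls = c(9, 13, 10, 12, 15)
)

# Write the stand-in to a .RData file in a temporary folder
# (plays the role of the "fundamentals of R" folder).
course_dir <- tempdir()
save(happiness, file = file.path(course_dir, "happiness.RData"))

# Equivalent of Session > Set Working Directory: setwd() returns the
# previous working directory so it can be restored afterwards.
old_wd <- setwd(course_dir)

# Drop the in-memory copy, then load it back from disk into the
# global environment. load() returns the names of the objects it created.
rm(happiness)
loaded <- load("happiness.RData")
print(loaded)          # [1] "happiness"
print(dim(happiness))  # [1] 5 3

# Restore the original working directory.
setwd(old_wd)
```

Note that `load()` restores objects under the names they were saved with, which is why the object is called happiness again without any assignment.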
What are the four columns that I can see on the screen at the moment? If I scroll up to the top, I can talk you through each column. The first column contains an index, or row number, for each row. The second column is the year, the third is the number of boys who were, in this case, happy, and the fourth is the number of girls who were happy. Is the first column, containing the numbers 1 through to 82, part of the dataset? The answer is no. R prints these numbers to help us make visual comparisons by row. You can think of them as the row numbers you see on the left side of a spreadsheet. How can I view the dimensions of the data that I've brought in? I can type dim(happiness), where dim stands for dimensions. Selecting the line and running it with Control+Enter sends it to the console, and as you can see, the result is 82 and 3: 82 rows and three columns.
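A short sketch of the dimension checks described above. Because the real happiness file is not reproduced here, the block again assumes a small hypothetical stand-in data frame with placeholder values. With the course data, `dim()` would report 82 rows and 3 columns. The example also shows that the printed row numbers are row names rather than a fourth column.

```r
# Hypothetical stand-in with placeholder values (not the course data).
happiness <- data.frame(
  year  = 2001:2005,
  boys  = c(10, 12, 11, 14, 13),
  girls = c(9, 13, 10, 12, 15)
)

dim(happiness)       # [1] 5 3  (rows first, then columns)
nrow(happiness)      # [1] 5
ncol(happiness)      # [1] 3  (the printed row numbers are not counted)

# The numbers R prints on the left are row names, not a column of data.
rownames(happiness)  # [1] "1" "2" "3" "4" "5"
```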
Can I see the names of the columns without printing the whole dataset? I can use the function names(). I select line 10 of my script and press Control+Enter to send it to the console, and I can see that the columns in my data frame are named year, boys and girls. I can also use RStudio's built-in data viewer by clicking on the happiness dataset. To show you where my mouse is: it's in the Environment pane, and I'm clicking on happiness. This opens an alternative display of the entire dataset in a separate tab in the upper-left pane, the data viewer. It tells me how many rows are being shown and the total number of entries in the dataset, and I can close it by clicking the X on its tab. Clicking the dataset simply runs the View() command for us, View(happiness), so we can take note of that command there.
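The column-name lookup and the data viewer described above can be sketched as follows. This is a minimal sketch that uses a hypothetical stand-in data frame with placeholder values. The `View()` call is wrapped in `interactive()` because the viewer only makes sense in an interactive session such as RStudio. `str()` is included as an extra, commonly used way to see names and types without printing every row.

```r
# Hypothetical stand-in with placeholder values (not the course data).
happiness <- data.frame(
  year  = 2001:2005,
  boys  = c(10, 12, 11, 14, 13),
  girls = c(9, 13, 10, 12, 15)
)

names(happiness)     # "year" "boys" "girls"
colnames(happiness)  # same result for a data frame

# Compact overview: each column's name, type and first few values.
str(happiness)

# What clicking the dataset in the Environment pane runs; only opened
# when R is running interactively (e.g. inside RStudio).
if (interactive()) View(happiness)
```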
Can I access just one column of my data frame? I can type happiness followed by the dollar notation to bring up any of the columns I'd like to see. If I wanted to see just the number of boys who were happy, I can type "happiness$boys". This outputs the number of happy boys for each year. If I wanted to do the same for the girls, I would type happiness$girls instead, replacing the keyword boys with girls. Notice that this printed output differs from the original dataset: we no longer have a structured table with columns listed side by side. Instead we have a vector, and the number in square brackets at the start of each printed line gives the index of the first value on that line, in this case for the number of girls who were happy. If I scroll up, I can show you that the first entry for the 1930s in the number of boys who were happy was 5218. And if I click on the dataset in the global environment to bring it up on the screen, we can see 5218 as the first entry in that column.
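A sketch of single-column access, again using a hypothetical stand-in data frame with placeholder values. It shows `$` notation and the equivalent `[[ ]]` indexing with a column name as a string. It also shows that an extracted column is a plain vector, so `[1]` returns its first value, matching the check of the first boys entry in the walkthrough. The long vector at the end is only there to show the bracketed indices R prints at the start of each line.

```r
# Hypothetical stand-in with placeholder values (not the course data).
happiness <- data.frame(
  year  = 2001:2005,
  boys  = c(10, 12, 11, 14, 13),
  girls = c(9, 13, 10, 12, 15)
)

happiness$boys            # [1] 10 12 11 14 13
happiness[["girls"]]      # same as happiness$girls, column named by a string
happiness$boys[1]         # [1] 10  (first entry of the boys column)
is.vector(happiness$boys) # [1] TRUE  (a column taken with $ is a plain vector)

# Printing a longer vector: each printed line starts with [i], the index
# of the first value on that line (how many values fit depends on width).
long_vec <- 1:40
print(long_vec)
```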
Kunal has worked with data for most of his career, on work ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and has designed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publication of statistical findings on emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.