Fundamentals of R
This module looks at more operators, and introduces conditional statements in R.
The objectives of this module are to provide you with an understanding of:
- How to compare the values of two expressions
- How to compare the values of two Boolean expressions
- How to compare values of vectors
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.
Experience of another scripting language such as Python or Perl would be an advantage.
Understanding mathematical concepts will be beneficial.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org to let us know what you think.
- [Instructor] Operators in R come in two forms: firstly relational, and secondly logical. Let's start by looking at the relational operators in R. They are essentially binary operators which are used to compare values, most often numeric ones. The list includes equality, inequality, greater than, and less than. I can demonstrate this by looking at examples of each. So, for example, does 8 equal 8? Yes it does, and R returns TRUE. We can ask, is 7 greater than 3? The answer is obviously TRUE. We can use the inequality operator, is not equal to: 5 does not equal 4, so this should return TRUE. Asking the same question with the equality operator, is 5 equal to 4? The answer is FALSE. We can also use strings, such as that.
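The comparisons described above can be sketched in R as follows. This is a minimal illustration, not the exact code shown on screen: the variable names are hypothetical, and the string comparison is an assumed example, since the transcript does not say which strings were used.

```r
# Relational operators compare two values and return TRUE or FALSE.
eight_equals_eight <- 8 == 8    # equality: TRUE
seven_gt_three     <- 7 > 3     # greater than: TRUE
five_not_four      <- 5 != 4    # inequality: TRUE
five_equals_four   <- 5 == 4    # equality: FALSE

# Assumed example: strings can be tested for equality too.
same_string <- "R" == "R"       # TRUE

print(c(eight_equals_eight, seven_gt_three, five_not_four,
        five_equals_four, same_string))
```

Note that equality uses a double equals sign; a single equals sign is assignment in R.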
We can compare Booleans, or logicals, using TRUEs and FALSEs. We can ask the question, is 2 less than or equal to 9? We can use many characters that form a string, such as asking whether "apple" is less than "banana". The reason for this being TRUE is that "a" is less than "b", and the reason for that is that R compares strings in alphabetical order. Given that TRUE is numerically interpreted as 1, and FALSE is numerically interpreted as 0, TRUE will always be larger than FALSE. We can create some data in order to help us understand how to work with vectors. Say, for example, I had a class of 200 students and I was one of them. My result was 89, whilst the other 199 students had the following results. I have manually typed them in, one after the other, and stored them in a vector. I then concatenate the two by using the c function, and I can say that my whole class has these results here. So there are 200 results here, one of which is mine. I can double check that I've created my data correctly by asking for the length. I can also show you what I have created. Have I created a list? The answer is no. Have I created a vector? The answer is yes.
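A minimal sketch of these steps follows. The grades are hypothetical stand-ins: the course uses 199 classmates' results that are not reproduced in the transcript, so a short made-up vector is used here, and all variable names are assumptions.

```r
# Comparing numbers, strings and logicals.
two_le_nine     <- 2 <= 9               # TRUE
apple_lt_banana <- "apple" < "banana"   # TRUE: "a" sorts before "b"
true_gt_false   <- TRUE > FALSE         # TRUE, since TRUE is 1 and FALSE is 0

# Hypothetical data: my result plus nine classmates (the course uses 199).
my_result     <- 89
other_results <- c(72, 91, 65, 80, 76, 58, 94, 70, 83)

# Concatenate my result and the others into one vector with c().
class_results <- c(my_result, other_results)

n_students <- length(class_results)     # 10 in this sketch
made_list  <- is.list(class_results)    # FALSE
made_vec   <- is.vector(class_results)  # TRUE

print(n_students)
```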
Now, having created this data, I would like to use some of the relational operators to help me understand what I have in my data. Is my grade equal to 76? I can use the equality operator to compare my result, the 89 at the top of the screen, with 76. I can ask, using the inequality operator, is my result not equal to 76? The answer is TRUE. I can ask which grades in my class are above, say, a pass mark of 75 by using the greater than symbol. Here I am using the vector of my class of 200 students, so I will receive back a Boolean vector of TRUEs and FALSEs. In order to find out how many passes there were, I can count up the TRUEs by hand, one, two, three, four, five, skipping the FALSEs, six, seven, eight, all the way through the 200 students. Or I can remember that TRUE has a numerical interpretation of 1, and ask for the sum of the preceding line, which is 98. So there are 98 TRUEs in this logical vector, produced by the greater than operator. In order to find out what proportion of these grades were below the pass mark, I could use the sum function again to find that 102 students had failed, and then divide that by the length, the total number of students in the class, 200. I could also have calculated this in one succinct line using the mean function. But this would have meant that I would have had to understand that the proportion and the mean are equivalent in this instance, because the mean of a vector of 1s and 0s is the share of 1s.
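The vector comparisons above can be sketched as follows. The ten grades are hypothetical, so the counts differ from the 98 and 102 quoted for the real class of 200, and the variable names are assumptions.

```r
pass_mark <- 75
my_result <- 89
# Hypothetical class of ten; the first entry is my own result.
class_results <- c(my_result, 72, 91, 65, 80, 76, 58, 94, 70, 83)

is_76     <- my_result == 76    # FALSE
is_not_76 <- my_result != 76    # TRUE

# Comparing a vector with a single number gives one TRUE/FALSE per student.
passed <- class_results > pass_mark

# TRUE counts as 1, so sum() counts the passes.
n_passed <- sum(passed)                     # 6 with this data

# Proportion below the pass mark, the long way and in one line.
n_failed           <- sum(class_results < pass_mark)    # 4 with this data
prop_failed        <- n_failed / length(class_results)  # 0.4
prop_failed_by_mean <- mean(class_results < pass_mark)  # also 0.4

print(passed)
```

The last two lines agree because the mean of a logical vector is the number of TRUEs divided by its length.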
Finally, we can turn our hands to the second type of operators, the logical operators. Here is the list of how they are represented in R. The AND operator is an ampersand, the OR operator is a vertical line, and the NOT operator is an exclamation mark. We can ask how well I did. So we can say, was my result greater than the pass mark and less than, say, 90%? We can ask who was actually quite far from the pass mark. Who were the extreme points in my class, the ones that might need help? That is, who scored less than the pass mark minus 10, or greater than the pass mark plus 10? I can see that we have a series of TRUEs and FALSEs, which is useful if I would like to know on a student-by-student basis. But let's say I would prefer to understand what proportion of my class was far from the pass mark. I could use the sum of this to see that there were 80 students outside the near-to-the-pass-mark range, and as a proportion of 200 that is 0.4. I can say this succinctly in one line using the mean function. So I could say that 40% of students were far from the pass mark, which would mean that 60%, the majority, were close to the pass mark.
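A minimal sketch of the logical operators on a class vector follows. The ten grades are hypothetical and the variable names are assumptions; the real class of 200 gives 80 students far from the pass mark, whereas this small example gives 4 out of 10.

```r
pass_mark <- 75
my_result <- 89
# Hypothetical class of ten; the first entry is my own result.
class_results <- c(my_result, 72, 91, 65, 80, 76, 58, 94, 70, 83)

# AND (&): did I pass, but score below 90?
did_well <- my_result > pass_mark & my_result < 90   # TRUE

# OR (|): who is more than 10 marks away from the pass mark?
far_from_pass <- class_results < pass_mark - 10 |
                 class_results > pass_mark + 10

n_far    <- sum(far_from_pass)     # 4 with this data
prop_far <- mean(far_from_pass)    # 0.4

# NOT (!): flip the vector to find who was close to the pass mark.
prop_near <- mean(!far_from_pass)  # 0.6

print(far_from_pass)
```

The single `&` and `|` work element by element across a vector, which is why they suit the whole-class comparison here.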
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and developed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.