This module introduces you to some of the basic data structures that can be used in R.
The objectives of this module are to provide you with an understanding of:
- What a vector is in R.
- How to create a sequence.
- How to create a vector using a repetition.
- How to pull elements out of vectors.
- Vectorised operations.
- Logical comparisons.
- Strings in R.
- Undefined situations in mathematics.
- 0, NA, NaN, and NULL.
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.
Experience of another scripting language such as Python or Perl would be an advantage.
Having an understanding of mathematical concepts will be beneficial.
We welcome all feedback and suggestions - please contact us at email@example.com to let us know what you think.
Conditional statements in R can either be accurate, and called true, or they can be inaccurate and known as false. We can abbreviate true to T, and we can abbreviate false to F. In order to understand logical operators, I'd like to define a variable, for example age as being 25. I'd like to compare this age with the number 21, and this will return a vector of the same length as age. Here the length is one, as shown in the global environment explorer in the top right pane. This statement here will return a single false, one value for the one element in age, because the condition is not met.
We can compare the magnitude of any two objects, such as one is less than two. Another operator is less than or equal to, as in one is less than or equal to two. Another is greater than: three is greater than four returns false. We can also use the greater than or equal symbol.
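The comparisons described above can be sketched in R as follows. This is a minimal illustration rather than the code shown on screen: the narration does not name the operator used in the first comparison, so `<` is an assumption chosen to match the false result, and the variable names other than `age` are hypothetical.

```r
# Value taken from the narration: age is 25.
age <- 25

# Assumption: the narration does not name the operator, so "<" is a guess
# that matches the FALSE result described.
age_under_21 <- age < 21
age_under_21          # FALSE
length(age_under_21)  # 1, the same length as age

# Magnitude comparisons between two values.
less_than        <- 1 < 2    # TRUE
less_or_equal    <- 1 <= 2   # TRUE
greater_than     <- 3 > 4    # FALSE
greater_or_equal <- 3 >= 4   # FALSE

# T and F are shorthand for TRUE and FALSE.
abbreviations_match <- identical(T, TRUE) && identical(F, FALSE)
```

T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words, so the spelled-out forms are the safer choice in scripts.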
If we want to know when things are exactly equal, we can use the double equals sign. And if we want to test for inequality, we can use the negation symbol followed by an equals sign, which tells us whether three is not equal to four.
We can store the result of a logical expression. Let's say, for example, I assign the fact that age is greater than 21 to the variable is_adult. I can now show you in the global environment explorer that is_adult is stored as true, or I can show you in the console window. We can assign Booleans as you would any value. So I can type in the name of the variable, use the assignment operator and then type in the Boolean that I would like to store. I can verify that this has been stored by looking in the console or in the global environment explorer. I can repeat the same logic for a different variable known as is_happy: I use the assignment operator and set it to false. Then I can show you that again in the console, to prove that that's what I've stored.
Now we can combine these using the intersection, meaning the AND operation. So we're verifying whether we are open and an adult. In this case, we have stored adult as being true and open as being true, so the result of this will be true.
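A short sketch of the equality tests, Boolean assignment and AND operation just described. The names `is_adult` and `is_happy` come from the narration; `is_open` is an assumed name for the variable the narration calls "open", and the remaining names are hypothetical.

```r
age <- 25

# Equality and inequality.
three_equals_four     <- 3 == 4   # FALSE
three_not_equals_four <- 3 != 4   # TRUE

# Store the result of a comparison in a variable.
is_adult <- age > 21              # TRUE

# Booleans are assigned like any other value.
is_open  <- TRUE    # assumed name; the narration only says "open"
is_happy <- FALSE

# Intersection (AND): TRUE only when both sides are TRUE.
open_and_adult <- is_open & is_adult   # TRUE
```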
We could use a union across two logical Booleans with the OR statement, for is_happy being false and is_adult being true. This logical comparison will return true, because either the first entry or the second entry is true. If I wanted to have sad, meaning not happy, I could take is_happy, my variable that holds false, and use the negation operator on it to bring up true on the screen.
Logical vectors can be coerced into numeric values or numeric vectors. False becomes zero and true becomes one. So this formula here has meaning numerically: true represents the number one, multiplied by three, which equals three, plus zero, totalling three. We could have true plus one, which would be one plus one, which would be two. We could have false times two, which would be zero times two, and that would return zero.
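The OR, negation and coercion examples above might look like this in R. The exact formula shown on screen is not in the transcript, so `TRUE * 3 + FALSE` is an assumed reconstruction of "one multiplied by three, plus zero"; `is_sad` and the other result names are hypothetical.

```r
is_happy <- FALSE
is_adult <- TRUE

# Union (OR): TRUE when at least one side is TRUE.
happy_or_adult <- is_happy | is_adult   # TRUE

# Negation flips a logical value.
is_sad <- !is_happy                     # TRUE

# In arithmetic, TRUE is coerced to 1 and FALSE to 0.
three <- TRUE * 3 + FALSE   # 1 * 3 + 0 = 3
two   <- TRUE + 1           # 1 + 1 = 2
zero  <- FALSE * 2          # 0 * 2 = 0
```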
We can compare logical expressions utilising R's vectorised framework. So if I put in an example with a vector of trues against a vector of falses, comparing them with the ampersand symbol, we get a result for every element because we are using a vectorised comparison. This means each element in the first vector is compared with the element in the same position in the second vector. So we pick the first true and compare it with the first false using the ampersand, then the second true with the second false. In both instances the answer is false, because true and false is false: we need both of them to be true. In my next example, which I was rushing ahead to, I have the OR statement. One of the two items in each pair is true, and hence the answer comes out as true for both instances.
The following returns a result that may not make sense. If you were to do this with pen and paper, it might not make sense. However, R returns an output along with a warning, indicating that the length of the longer object, which is three, is not a multiple of the length of the shorter object, which is two. So here we have a bit of vector recycling, but it doesn't fit cleanly, and R is telling you that it has done something on your behalf that you should be aware of. When you're using these logical operators to compare vectors of different sizes, make sure that the longer length is a multiple of the shorter length.
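A sketch of the vectorised comparisons and the mismatched-length case described above. The vectors are hypothetical stand-ins for the ones shown on screen; only their lengths (two and two, then three and two) follow the narration.

```r
trues  <- c(TRUE, TRUE)
falses <- c(FALSE, FALSE)

# Element-wise AND: position 1 with position 1, position 2 with position 2.
and_result <- trues & falses   # FALSE FALSE

# Element-wise OR.
or_result <- trues | falses    # TRUE TRUE

# Lengths 3 and 2: the shorter vector is recycled to FALSE, TRUE, FALSE,
# and R warns that 3 is not a multiple of 2.
mismatched <- c(TRUE, TRUE, TRUE) & c(FALSE, TRUE)   # FALSE TRUE FALSE
```

R still returns a result in the last case, so the warning is the only sign that the recycling did not line up.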
Let me continue with recycled elements. Here I have my team, and a vector of Boolean entries recording whether each person is sneezing. For feverish, I only have data for half of the team, but I know that they're all based in pairs: person one connects to person four, person two connects to person five, person three connects to person six. So it might be useful for this vector to be recycled as I compare sneezing with feverish to work out who has both. Using the ampersand symbol, I'm asking for the intersection of the two, and I see that the first entry is true, because we have true and true, and the last entry is false, because we have false and false. I can repeat the same thing with the union, or OR statement. That tells me that in every pair but the last at least one value is true, giving true, and in the last entry I have false or false, which makes my last entry false.
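The recycling in the sneezing and feverish example can be sketched as below. The transcript does not give the actual values, so both vectors are hypothetical, chosen only so that the first and last entries behave as the narration describes. The `rep()` call is added here to make the recycling explicit and is not part of the narration.

```r
# Hypothetical data: six team members, but fever readings for only three.
sneezing <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
feverish <- c(TRUE, TRUE, FALSE)

# feverish is recycled, so person 4 reuses person 1's value, and so on.
has_both   <- sneezing & feverish   # TRUE FALSE FALSE TRUE TRUE FALSE
has_either <- sneezing | feverish   # TRUE TRUE TRUE TRUE TRUE FALSE

# The same result with the recycling written out explicitly.
feverish_full <- rep(feverish, length.out = length(sneezing))
same_result   <- identical(has_both, sneezing & feverish_full)   # TRUE
```

No warning is raised here because six is a multiple of three.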
As an aside, I'd like to highlight the difference between the single ampersand symbol here and the double ampersand, which is the non-vectorised statement. So let me lift this example from here down to here. If I add in a double ampersand symbol, we see a different output, because in this instance we are using the non-vectorised version of the logical comparison and only comparing the first element. So it compares the first true with the first false and asks, "Do we have an intersection?" The answer is no. Note that from R 4.3.0 onwards, the double ampersand raises an error when either side has more than one element, so the first elements have to be selected explicitly.
Booleans can also filter a vector. I've created a prices vector here. If I create a Boolean vector of the same length as prices and put it inside square brackets after prices, I'm asking R to return those items that are true. So I'm using T, or true, to mean keep: I would like to keep the first, the third and the fourth entry of prices. I could do the same thing by creating the Boolean vector with a logical operator such as greater than, applying that to the prices vector and asking for only the last two to be returned. So I return the last two entries, which are 14 and 15.
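A sketch of the non-vectorised comparison and of Boolean filtering. Two assumptions are worth flagging: the first elements are selected explicitly so the code also runs on R 4.3.0 and later, where `&&` rejects longer vectors, and the `prices` values are hypothetical apart from the last two entries, 14 and 15, which come from the narration.

```r
trues  <- c(TRUE, TRUE)
falses <- c(FALSE, FALSE)

# Non-vectorised AND on the first elements only: a single value comes back.
first_only <- trues[1] && falses[1]   # FALSE

# Hypothetical prices; only the final 14 and 15 are from the narration.
prices <- c(10, 12, 14, 15)

# A logical index of the same length: TRUE means keep.
keep   <- c(TRUE, FALSE, TRUE, TRUE)
picked <- prices[keep]                # 10 14 15

# Build the logical index with a comparison instead.
expensive <- prices[prices > 12]      # 14 15
```

The threshold of 12 is also an assumption, picked so that only the last two entries pass the test.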
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and designed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.