Repetitively Apply Actions on Vectors in R
Fundamentals of R
This module looks at iteration in R, such as for loops and how to apply functions repeatedly.
The objectives of this module are to provide you with an understanding of:
- When to use a for loop in R
- How to nest a for loop
- Built-in functions being vectorized
- How to apply functions
- How to use the family of apply functions
Aimed at anyone who wishes to learn the R programming language.
No prior knowledge of R is assumed. You should already be familiar with basic programming concepts such as variables, scope, and functions. Experience of another scripting language such as Python or Perl would be an advantage. An understanding of mathematical concepts would be beneficial.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org to let us know what you think.
If I would like to loop over a vector, say for example the vector ranging from 1 to 10, I could try the apply function. However, this would return an error, because apply needs an input X that has dimensions: it expects an object with a dim attribute, such as a matrix or array, and a plain vector has none. If I do have a vector, then I can use other members of the apply family of functions, including lapply, sapply, or vapply.
Let's start with understanding what the lapply function is all about. I can use the help window, turning our eyes to the right pane, and see in the Help tab that lapply is part of base R and that the lapply inputs are very similar to the apply inputs. It takes an input X, which can be a vector or a list, and a function that will be applied, and it returns a list. Each element of the list is the result of applying the function to the corresponding element of X. As a side note, the dot-dot-dot (...) can be used if we have a function that requires additional parameters.
Here, I might try to use the lapply function on my vector with the sum function. However, the output might not be as expected, because lapply applies the function to each element of my vector one at a time, so I get a list of ten single values rather than one total.
To show a simpler use of the lapply function, let's create a vector. This vector here is the vector of outbreaks and years, and I can use the outbreaks vector as my input and, as my function, ask for the number of characters. This will be applied to each element one by one, and a list is returned. To use my outbreaks vector with a bit more clarity, I can split out the outbreaks and remove the colon. As a side note, if I'd like to see this on the screen, I could simply print what is returned, but when you're dealing with a list it can be better to ask for its structure, which puts less information on the screen.
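The steps just described can be sketched roughly as follows. This is only an illustration: the recording does not show the actual data, so the `outbreaks` values (strings of the form "NAME:YEAR") are hypothetical, and the use of `nchar` and `strsplit` is an assumption about which functions appear on screen.

```r
x <- 1:10

# apply() needs an object with a dim attribute, so a plain vector fails.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(apply(x, 1, sum), error = function(e) conditionMessage(e))
print(res)

# lapply() visits each element in turn, so sum() only ever sees one number:
# the result is a list of ten single values, not one grand total.
sums <- lapply(x, sum)
str(sums)

# Hypothetical data standing in for the vector used in the recording.
outbreaks <- c("EBOLA:2014", "ZIKA:2015", "MERS:2012")

# Number of characters of each element, returned as a list.
char_counts <- lapply(outbreaks, nchar)
str(char_counts)

# Split each string on the colon; strsplit() returns a list with one
# character vector (name, year) per input string.
outbreaks_split <- strsplit(outbreaks, split = ":")
str(outbreaks_split)
```

Using `str()` rather than printing keeps the output compact, which is the point made about inspecting lists.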
I can use the lapply function to create the lowercase version of the outbreaks, using the split outbreaks as my input and tolower as the function. I can see what I have created by looking at the structure of the lowercase outbreaks, and as you can see, everything has been lowercased.
Moving our eyes to the top of the screen, I'm going to create a user-defined function that grabs the first part of its input x, which I will use on my outbreaks vector. I can create the first part by using this function, called select first, on the lowercase outbreaks, thus grabbing the names into one new list, which is the output from the lapply function. I can do the same for the second part, and here I just show how to use the lapply function without naming the parameters. We can see what we have created by running and looking at the two outputs: here we have a list for the names, and here we have a list for the years.
We can now turn our eyes to the help window to understand what the sapply function is made up of. We can see that it is a user-friendly version of the lapply function we just learned about, which returns a simplified object where possible. The reason to note this is that, if the data is simplifiable, a vector or array will be returned. Take our vector and the sum function, for example: if I now choose to use the sapply function, I see a much simpler output, because simplification was possible. I can repeat the same for the outbreaks and the number of characters and show you that sapply returns a tidy, simple vector.
Another example of using the sapply function would be creating multiple sequences, with each element of this vector as an input. Now each sequence is of a different length. Therefore, when I run this, it's not simplifiable into a vector.
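A rough sketch of this part of the walkthrough is given below. The `outbreaks` values are hypothetical (the recording does not show them), the helper names `select_first` and `select_second` are guesses at how the on-screen functions are spelled, and `seq_len` is assumed as the sequence-generating function.

```r
# Hypothetical data standing in for the vector used in the recording.
outbreaks <- c("EBOLA:2014", "ZIKA:2015", "MERS:2012")
outbreaks_split <- strsplit(outbreaks, split = ":")

# Lowercase every element of the split list.
outbreaks_lower <- lapply(X = outbreaks_split, FUN = tolower)
str(outbreaks_lower)

# User-defined helpers: pick the first or second part of each element.
select_first <- function(x) x[1]
select_second <- function(x) x[2]

# Named arguments, then the same style of call with positional arguments.
outbreak_names <- lapply(X = outbreaks_lower, FUN = select_first)
outbreak_years <- lapply(outbreaks_lower, select_second)
str(outbreak_names)
str(outbreak_years)

# sapply() simplifies when it can: one value per element gives a vector.
simple_sums <- sapply(1:10, sum)
name_lengths <- sapply(outbreaks, nchar)
print(simple_sums)
print(name_lengths)

# Each sequence has a different length, so no simplification is possible
# and sapply() falls back to returning a list.
seqs <- sapply(1:3, seq_len)
str(seqs)
```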
Here, we see a list data structure being returned, as a list supports elements of different lengths.
Moving on to the last part of this section, we can call the help for vapply and, turning our eyes down to the Help tab, see that we have a similar function to sapply. But what we have here is a specification of what our output should be like, so it's a more robust version of the sapply that we just saw. Let's say, for example, that having looked at a function, I'd like to see one number returned for each item in the input. So this is a tidier version of the lapply function, with a specification of the output you require.
I can show a more complicated version of this, where we have our vector and we'd like to run this anonymous function, and we erroneously tell R that we expect a template of numeric one to be returned. What we see is that the anonymous function returns more than one numeric value, while the template is forcing us to expect only one numeric value. The length is three. So the way to fix this is to note what the error is telling us and set the template to match. Now, this may not be so meaningful, because we're using a vector as an input and it's returning this function applied to each element, but you get the picture of what the additional argument does. It essentially gives you robustness over the sapply function.
Where is it useful? It is useful when, for example, you expected only one result per element, as in the case here, but the anonymous function that you have defined returns more than one result, which is a surprise. The error makes you aware of that and helps explain it to you. So the failure is useful in the case where your template differs from your function's outputs.
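To make the vapply behaviour concrete, here is a minimal sketch. The function `three_vals` is a hypothetical stand-in for the anonymous function in the recording, which is not shown; all that is known from the narration is that it returns three numeric values per element.

```r
x <- 1:10

# One value per element, matching the numeric(1) template.
squares <- vapply(x, function(i) i^2, FUN.VALUE = numeric(1))
print(squares)

# Hypothetical function returning three values per element.
three_vals <- function(i) c(i, i^2, i^3)

# Wrong template: we promise one number per element but the function
# returns three, so vapply() stops with an error. tryCatch() captures
# the message so the script can carry on.
msg <- tryCatch(
  vapply(x, three_vals, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Corrected template: three numbers per element. The result is a matrix
# with one column per input element.
powers <- vapply(x, three_vals, FUN.VALUE = numeric(3))
print(dim(powers))
```

With sapply the mismatched case would silently return a matrix instead of a vector; vapply turns that surprise into an explicit error.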
And finally, the question might arise in your minds as to when to use each of the three functions. sapply is probably the simplest to use: it's a user-friendly version of the lapply function. I would use lapply where I do not want my results to be simplified to a vector, so where I'm happy to receive a list. And I would, in an ideal world, use the vapply function where I know the type of result that I'm expecting and want to specify it.
Kunal has worked with data for most of his career, ranging from diffusion markov chain processes to migrating reporting platforms.
Kunal has helped clients with early stage engagement and formed multi week training programme curriculum.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.