Fundamentals of R
This module looks at functions, how to create functions, and how they can be used in R.
The objectives of this module are to provide you with an understanding of:
- How to define a function in R
- How to use built-in functions in R
- What a return value is
- That functions can be stacked, and that they do not require an input
- That arguments can be named or matched by position
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed
Delegates should already be familiar with basic programming concepts such as variables, scope and functions
Experience of another scripting language such as Python or Perl would be an advantage
Understanding mathematical concepts will be beneficial
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org to let us know what you think.
- [Instructor] Within base R, there are many built-in functions that we can use. Here, I'll use the mean function as an example. To understand the mean function, we can call the help function to bring up its documentation. In the Help tab of the right pane of the RStudio interface, we see the R documentation for the mean function, which lives in the base package and is titled Arithmetic Mean. From the description, we can see that this is a generic function for the trimmed arithmetic mean. If I go back to Home in the Help tab, I could also have brought this up by typing a question mark followed by mean. This gives the same result as calling the help function on mean. In both instances, we are taken to the same R documentation page.
To understand the arguments of the mean function, we can use args. This tells us that the function requires, at a minimum, the data x as an input, as shown under Usage in the help page. So, say our example is a vector of three items: we can pass that into the mean function and ask for the mean of two, three, four. We can double-check that the answer of three makes sense by typing in the calculation manually: two plus three plus four, all divided by three.
We can explore the usage of the mean function by creating some data first, say, the number of chocolates I've eaten in a week, from Monday through to the weekend, when I tend to eat more, and then asking for the average number of chocolates eaten. We can use a named argument, where I explicitly set the parameter x to the vector named chocs, or we can match the argument by position.
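As a rough sketch of the steps walked through so far, the calls might look like the following. The values in chocs are hypothetical, since the transcript does not show the actual numbers typed on screen; everything else uses functions that ship with base R.

```r
# Bring up the documentation for mean(); both forms show the same help page.
help(mean)
?mean

# Inspect the arguments: mean() needs at least the data x.
args(mean)

# Mean of a three-item vector, checked against the manual calculation.
mean(c(2, 3, 4))   # 3
(2 + 3 + 4) / 3    # 3

# Hypothetical data (assumed values): chocolates eaten Monday to Sunday,
# with more eaten at the weekend.
chocs <- c(2, 3, 2, 4, 3, 6, 7)

# Named argument versus positional matching: both give the same result.
by_name     <- mean(x = chocs)
by_position <- mean(chocs)
identical(by_name, by_position)  # TRUE
```

Naming the argument is more explicit, while positional matching is shorter; for a function whose first parameter is the data, either reads fine.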
Here, we see that chocs has been passed as the first parameter of the mean function. If I had another set of data called sweets, I could take the mean of the sum of chocs and sweets by putting both chocs and sweets inside the first parameter. I could do that by position, or I could be explicit and name the first parameter x.
The mean function can handle missing values. If I scroll down in the help, the argument na.rm is FALSE by default. Further down, under Arguments, the third item describes it as a logical value indicating whether NA values should be stripped before the computation proceeds. All this is stating is whether or not we should remove NA values from our calculation.
Now, let's say I didn't have data for my weekend and I wanted to know my average. By default, where na.rm is FALSE, meaning that I choose not to remove the missing values, the average of numerics and NAs returns NA. That's perhaps not what we want. We'd like to know the average without these items, so we set na.rm to TRUE, and that returns 2.8. If I add the five remaining numbers up and divide by five, I get the same answer: by removing the NAs, we cut the vector down to the five items, which are summed and then averaged over five.
So, you've seen that we can use the help and the function definitions within the R documentation for many built-in functions. In this case, the mean function is defined in the base package. We saw how the parameters are defined and what the arguments actually mean. You can continue reading the help page to learn more about the mean function specifically, or explore any other function you like in the Help tab.
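The combined-data and missing-value behaviour described above can be sketched as follows. All data values here are hypothetical: the weekday values are chosen only so that their mean matches the 2.8 mentioned in the transcript, and reading "the sum of chocs and sweets" as the element-wise chocs + sweets is an assumption.

```r
# Hypothetical data (assumed values), Monday to Sunday.
chocs  <- c(2, 3, 2, 4, 3, 6, 7)
sweets <- c(1, 0, 2, 1, 1, 3, 2)

# Mean of the element-wise sum, matched by position and then by name.
mean(chocs + sweets)
mean(x = chocs + sweets)

# Same week, but with the weekend values missing.
chocs_missing <- c(2, 3, 2, 4, 3, NA, NA)

# na.rm defaults to FALSE, so any NA makes the result NA.
mean(chocs_missing)                # NA

# Removing the NAs averages the five remaining values.
mean(chocs_missing, na.rm = TRUE)  # 2.8

# Manual check over the five weekday values.
(2 + 3 + 2 + 4 + 3) / 5            # 2.8
```

Leaving na.rm at FALSE is a deliberate default: an NA result signals that some data is missing, so removal has to be asked for explicitly.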
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and developed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.