Fundamentals of R
This module looks at more complex data structures, building on what was covered in the beginner data structures module.
The objectives of this module are to provide you with an understanding of:
- Different data types
- How to coerce elements, and force coercion
- How to construct a matrix
- How to construct an array
- How to construct a list
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.
Experience of another scripting language such as Python or Perl would be an advantage.
An understanding of mathematical concepts will be beneficial.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org to let us know what you think.
- [Instructor] How can we create lists in R? We can start by thinking about coercion. To coerce something into a list, I first create a vector: for example x, which goes from one to 10. I can ask for the length of this and see that it is 10, and I can ask for its class to understand what I have created. I can coerce this into a list using the as.list function. Now, if I ask for the class, I can see that it is a list, and I can ask for the length to note that it has not changed.
To understand the structure of a list, I can use our usual index notation for element access, but if I pull out any of these elements with single square brackets, each one is in its own right a list. So to access the actual elements, I need to use double square brackets. And what is the class of the actual elements? That's still integer. So the underlying data has not changed; it's just a case of how we access this information now.
I can also create a list using the list constructor. A list can contain any piece of information, under any name and of any data type. The first entry here is a character string, the second is a vector, and the third is a piece of a data frame. As I print this to the screen, I see each element on a new line, with a gap between elements and the name associated with each piece of information in our list. I can ask for the class of this and note that it is a list. I can ask for the length and see that the number of elements in the list is three.
I can access inside my list using index access. And again, a list is recursive, so any part of the list taken with single brackets is itself a list. For example, if I ask for the length of this, I see a length of one, because at this stage I haven't entered into the elements completely.
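The steps described so far can be sketched in R roughly as follows. This is a minimal illustration, not the instructor's exact script: the entry names and values in `List_KH` are assumptions, and the built-in `mtcars` dataset stands in for the "piece of a data frame" mentioned in the transcript.

```r
# Coerce an integer vector to a list; the length is unchanged.
x <- 1:10
length(x)            # 10
class(x)             # "integer"

x_list <- as.list(x)
class(x_list)        # "list"
length(x_list)       # 10

# Single brackets return a sub-list; double brackets return the element itself.
class(x_list[1])     # "list"
class(x_list[[1]])   # "integer"

# Build a list with the constructor; entries may have different types.
# The names and values below are hypothetical stand-ins.
List_KH <- list(name    = "Helen",
                numbers = 1:10,
                data    = head(mtcars, 3))

class(List_KH)        # "list"
length(List_KH)       # 3
length(List_KH[2])    # 1  (a list holding one entry)
length(List_KH[[2]])  # 10 (the integer vector itself)
```

The last two lines show the distinction the transcript draws: single brackets stay at the list level, while double brackets reach the stored element.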
Elements are accessed by double brackets, so if I now ask for the length with double square brackets, I see an answer of 10. What is the class of the second entry in my List_KH? It is integer. I can manipulate this as I choose. If I use double-bracket indexing with a one, I'm accessing the first element, the name, which is a character string, and I can then add on anything else I'd like, using any other function as I would with a normal variable.
A very useful function to remember is the internal structure, or diagnostic, function str, which lets us see the structure of the list. It tells me that I have a list of three items, names each one in turn, and gives me some insight into the data type and data structure of each.
Overwriting an element is as simple as assigning a new item to it. This is my new item, a matrix, which I would like to write over the third entry of my list. At the moment the list contains a random name and, in the third entry, a data frame. If I run this, I have updated my List_KH so that it no longer contains the data frame.
At this stage, I'd like to talk about indexing operators and the fact that they can be stacked. Here we have list member access with double brackets, so I'm inside the matrix now. If I add a one in single brackets at the end, I'm using vector indexing. And if I add a comma before that one, I'm asking for the first column.
Just as I wanted to know how to overwrite an element in a list, how do I remove an entry from a list completely? I can remove an entry by assigning it the value NULL, and that removes it. As we saw in our list, entries have names, which can be anything.
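A short sketch of the operations just described: inspecting with `str`, overwriting an entry, stacking index operators, and removing an entry. The list contents and the matrix values are assumptions chosen for illustration; only the name `List_KH` comes from the transcript.

```r
# Hypothetical list with a character string, an integer vector and a data frame.
List_KH <- list(name    = "Helen",
                numbers = 1:10,
                data    = head(mtcars, 3))

length(List_KH[[2]])   # 10
class(List_KH[[2]])    # "integer"

# An element reached with [[ ]] behaves like any ordinary variable.
greeting <- paste(List_KH[[1]], "is in the list")

# Compact summary of the list: number of entries, names and types.
str(List_KH)

# Overwrite the third entry with a matrix.
new_item <- matrix(1:6, nrow = 2)
List_KH[[3]] <- new_item

# Index operators can be stacked: [[3]] enters the matrix,
# then [1] or [, 1] index within it.
first_value  <- List_KH[[3]][1]     # vector-style indexing
first_column <- List_KH[[3]][, 1]   # first column of the matrix

# Assigning NULL removes the entry entirely.
List_KH[[3]] <- NULL
length(List_KH)        # 2
```

Note that removing an entry with `NULL` shortens the list, so any later entries would shift down by one position.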
Can we access the names of these list entries, and the underlying data through them? Think back to what we did with a vector: we stored values under names and could access them using the names inside the vector. Let me create similar data, noting that in a list we can have mixed data types. I'd like to extract just one element, and I can do this in a few different ways. I can use the literal name. I can use the string name. Or, if I knew that Helen was in the first part of this list, I could use the numeric index to access this element. Can I access multiple elements? In the same way that I can access the first element, I can access multiple elements with a vector.
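The three access styles, and selection of several entries at once, might look like this. The list `people` and its contents are hypothetical; only the entry name Helen is taken from the transcript.

```r
# A named list with mixed data types (values are illustrative assumptions).
people <- list(Helen = 34L, Sam = "London", Priya = TRUE)

names(people)                      # the entry names

# Three ways to extract a single element.
by_literal <- people$Helen         # literal name
by_string  <- people[["Helen"]]    # string name
by_number  <- people[[1]]          # numeric position

# Several entries at once: single brackets with a vector of
# positions or names; the result is itself a list.
subset_by_index <- people[c(1, 3)]
subset_by_name  <- people[c("Helen", "Priya")]
```

Selecting multiple entries uses single brackets because the result has to remain a list; double brackets are for reaching one element.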
About the Author
Kunal has worked with data for most of his career, on projects ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and developed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.