Fundamentals of R
This module looks at more complex data structures, building on what was covered in the intermediate data structures module.
The objectives of this module are to provide you with an understanding of:
- How to construct a factor in R
- How to construct a data frame in R
- How to modify a data frame
- How to subset a data frame
- How data frames automatically factorise data in R
Aimed at all who wish to learn the R programming language.
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.
Experience of another scripting language such as Python or Perl would be an advantage.
An understanding of mathematical concepts will be beneficial.
We welcome all feedback and suggestions - please contact us at email@example.com to let us know what you think.
- [Narrator] To modify a data frame, let us first recreate the data frame we had before. Here I've defined super as a data frame, which looks like a table on our screen. I'd like to add a row about me, so I can create a second data frame with the same columns that I defined for the data frame above: an age, a name, say Kunal, and a location, say London. Then I close the brackets. That is now a data frame about me.
I can use the rbind function, short for row bind, to create a new data structure that appends the data about me to super. As you can see at the bottom here, in row number four, we have added the information about me. Note that if I clear the screen and print super, it has not changed; this is still the original super. If I instead assign the result of rbind of super and me back to super, I can see that super has now been updated.
rbind is useful because it can also work with a plain vector. Say, for example, I recorded a series of measurements of my emotions, and on each day that I recorded an emotion I also weighed myself. That gives me a series of observations; let's pull them up onto the screen. As we can see, when I'm happy I weigh 89, when I'm sad I weigh less, and when I'm angry it's in the middle. This is made-up data, just to show you what a data frame could look like and the benefit of the rbind function when working with a vector.
Now let's say I want to add a new observation to my observations. I use the rbind function as I did when binding me into the super data frame. Here I call rbind with the existing observations data frame and assign the result back to observations, so I'm going to overwrite it. I can create a one-row matrix using the rbind function, so that when I am feeling silly my weight is equal to 123.4, for example; again, this is dummy data. Then I close the brackets to close off the rbind. Once I press Enter, I have used the rbind function to update the observations data frame and added the fifth row at the bottom here. Inside the call, I have created a one-row matrix from a vector. That is the answer to the question of whether this is useful for a data frame: data frames are usually built from vectors.
The cbind function helps in the same way and is very flexible. Say, for example, I have a vector called rating. I'd like to pull super back up onto the screen and cbind, that is column bind, this data onto my data frame directly. I type cbind and open the brackets, so I'm using this function, and I'm going to cbind my super with my rating. As you can see, I have combined this column with the existing data frame. As a side note before we finish, we could instead have used the dollar notation to create the new column on the existing super and assign the rating vector to it in one go. So if I press Enter now and pull up super, super has been updated with this final column here.
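The steps in the narration can be sketched in R roughly as follows. The transcript does not show the actual contents of super, observations, or rating, so every value below (including the age given for Kunal and the fourth emotion) is a made-up placeholder. Two choices are specific to this sketch and may differ from the video: stringsAsFactors = FALSE is set explicitly so the code behaves the same before and after R 4.0.0, and the new observation is added as a one-row data frame rather than a bare vector so that the weight column stays numeric.

```r
# Placeholder version of the `super` data frame (three rows, made-up values)
super <- data.frame(
  age = c(30, 35, 40),
  name = c("Clark", "Bruce", "Diana"),
  location = c("Metropolis", "Gotham", "Themyscira"),
  stringsAsFactors = FALSE
)

# A one-row data frame with the same columns; the age is an assumption
me <- data.frame(age = 33, name = "Kunal", location = "London",
                 stringsAsFactors = FALSE)

# rbind returns a new data frame and leaves `super` untouched
combined <- rbind(super, me)
nrow(combined)  # 4
nrow(super)     # 3

# Assign the result back to actually update `super`
super <- rbind(super, me)

# Made-up observations: an emotion and the weight recorded that day
observations <- data.frame(
  emotion = c("happy", "sad", "angry", "calm"),
  weight = c(89, 80, 85, 87),
  stringsAsFactors = FALSE
)

# Append a fifth observation and overwrite `observations`
observations <- rbind(
  observations,
  data.frame(emotion = "silly", weight = 123.4, stringsAsFactors = FALSE)
)

# Column-bind a vector onto the data frame; it needs one value per row
rating <- c(5, 4, 3, 2)
with_rating <- cbind(super, rating)

# Equivalent update of `super` in place using the dollar notation
super$rating <- rating
```

Printing super after the last line shows the new rating column on the right, matching the result of the cbind call.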
About the Author
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and developed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.