Fundamentals of R
This module looks at how to manage data in R through reading, writing, saving and loading objects.
The objectives of this module are to provide you with an understanding of:
- How to bring data from a file into R
- Saving and loading objects in R
- Interacting with the clipboard
- How to connect to files in R
- How to read from a file in R
- How to write to a file in R
Aimed at anyone who wishes to learn the R programming language.
No prior knowledge of R is assumed.
Delegates should already be familiar with basic programming concepts such as variables, scope and functions.
Experience of another scripting language, such as Python or Perl, would be an advantage.
An understanding of mathematical concepts will be beneficial.
We welcome all feedback and suggestions. Please contact us at email@example.com to let us know what you think.
- [Narrator] We can read files into R using the read.table or the read.csv function. To demonstrate, let us first create a file, for example table_KH.txt. I'll use some vectors to do so, and in the global environment you can see what has been created. I will then collapse these into a single item spread over multiple lines: each vector has been collapsed into a single string. I can then create a connection object, write this data into the connection with writeLines, and close the connection. I can check the contents of the file by using the writeLines function again to print the output. If I hit refresh in my file explorer, I can see that the file has been created, and as you can see, the file now exists.

Let us now move on to bringing this information into R using the read.table function. The file that has been created looks like this. The read.table function has multiple parameters, and I'd like to take them one by one. The first argument is file, so we pass in the file name. The second is header: we have to say whether or not the file contains a header. In our case the file does contain a header, so we tell the read.table function this. How have the columns been separated? Using a comma. Are there any missing values in the data that we want to bring in? There is one question mark, as you can see next to the name Tom, where my mouse is now in the global environment explorer. I would like R to recognize question marks as NAs, which is what the na.strings argument does. In versions of R before 4.0.0, the read.table function by default brings in every string column it sees as a factor. I would like to ensure that this doesn't happen, so I set the final argument, stringsAsFactors, to FALSE to suppress interpreting every string column as a categorical variable. When I hit enter, we have now brought in this variable, namely people, and if I type in people, I can see that it now exists.
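The file-creation and read.table steps described above can be sketched roughly as follows. This is a minimal example under stated assumptions: the transcript does not show the vectors that were used, so the three columns (name, age, job) and their values are hypothetical, and the file is written to tempdir() rather than the working directory so the example leaves no files behind.

```r
# Hypothetical data: the exact vectors from the video are not shown,
# so these values are illustrative only ("?" marks Tom's missing age).
col_names <- c("name", "age", "job")
row1      <- c("Anna", "34",  "author")
row2      <- c("Tom",  "?",   "architect")
row3      <- c("Lee",  "29",  "author")

# Collapse each vector into a single comma-separated string (one per line)
lines <- c(paste(col_names, collapse = ","),
           paste(row1, collapse = ","),
           paste(row2, collapse = ","),
           paste(row3, collapse = ","))

# Assumption: write to a temporary directory instead of the working directory
path <- file.path(tempdir(), "table_KH.txt")

# Open a write connection, write the lines into it, then close it
con <- file(path, open = "w")
writeLines(lines, con)
close(con)

# Print the file contents to the console to check what was written
writeLines(readLines(path))

# Read the file back in, spelling out each argument discussed above
people <- read.table(file = path,
                     header = TRUE,            # first line holds column names
                     sep = ",",                # columns are comma separated
                     na.strings = "?",         # treat "?" as a missing value
                     stringsAsFactors = FALSE) # keep strings as character

people       # Tom's age is read in as NA
str(people)  # name and job are character columns, age is integer
```

Passing stringsAsFactors = FALSE explicitly gives the same result whichever R version runs the script, since the default changed in R 4.0.0.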
I can press enter to print it to the screen, and I can see that age has been brought in as NA for Tom, even though we created the dataset with a question mark. We can ask a simple question: has job, which does look like a factor (author, architect, author), been brought in as a factor? No, none of my strings have been brought in as factors, which makes sense for name but possibly doesn't make sense for job. So, to help future analysis, we might deliberately convert the columns that need to be factors. As we can see, there has been no change to how the output looks; however, job is now a factor.

To conclude this topic, I'd like to note that there is a read.csv function, which is a wrapper around the read.table function. It reduces the number of arguments we need to pass by setting defaults for some of them: sep defaults to a comma and header defaults to TRUE. So if I now wanted to bring in a CSV, I could use the read.csv function with the three arguments file, na.strings and stringsAsFactors as before. I don't need to state that the file is comma separated or that it has a header; this is taken for granted. When we run this and type people, we can see that the same file has been brought in in the same manner as before. I can use the wrapper, read.csv, as the easier option because I know my text file is comma separated, or the read.table function, where I have more parameters to set and more control over what is brought in.
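A sketch of the last two steps, reading the file with read.csv and converting job to a factor, might look like this. It is self-contained, so it rewrites the same hypothetical table_KH.txt contents (illustrative values, not the data from the video) before reading it both ways. The all.equal() comparison is only a convenient check that the two calls give the same data frame for this simple file; it is not part of the original walkthrough.

```r
# Hypothetical file contents; "?" marks Tom's missing age
path <- file.path(tempdir(), "table_KH.txt")
writeLines(c("name,age,job",
             "Anna,34,author",
             "Tom,?,architect",
             "Lee,29,author"), path)

# read.table with every argument spelled out
people_tbl <- read.table(path, header = TRUE, sep = ",",
                         na.strings = "?", stringsAsFactors = FALSE)

# read.csv defaults to header = TRUE and sep = ",", so only the
# remaining arguments are needed
people_csv <- read.csv(path, na.strings = "?", stringsAsFactors = FALSE)

# Check that both calls produced the same data frame
same <- isTRUE(all.equal(people_tbl, people_csv))
print(same)

# Convert only the column that should be categorical
people_csv$job <- factor(people_csv$job)
str(people_csv)              # printed values look the same, but job is a factor
print(levels(people_csv$job))
```

The read.csv defaults can still be overridden (for example header = FALSE) if a particular file needs it.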
About the Author
Kunal has worked with data for most of his career, on work ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and designed the curriculum for multi-week training programmes.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings on emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.