Fundamentals of R
This module looks at how to control data in R, through reading, writing, and loading objects.
The objectives of this module are to provide you with an understanding of:
- How to bring in data from a file in R
- Saving and loading objects in R
- Interacting with the clipboard
- How to connect to files in R
- How to read from a file in R
- How to write to a file in R
Aimed at anyone who wishes to learn the R programming language.
No prior knowledge of R is assumed. You should already be familiar with basic programming concepts such as variables, scope, and functions. Experience of another scripting language such as Python or Perl would be an advantage. An understanding of mathematical concepts would be beneficial.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org to let us know what you think.
We can write out to a file in R using the write.table function or the write.csv function. Before doing that, I would like to bring some data into R, using the table_KH file that I can show you here. From the File Explorer, I've opened up the file, which is a comma-separated file. I can bring this in using the read.csv function, representing question marks in the text file as NAs. So anything in the text file that is a question mark will be brought in as an NA. I have actively chosen not to convert strings into factors, so they all come in as normal text. We can bring this up on the screen to see what we have brought in. This resembles what was in the Notepad window before, except that NA has now replaced the question mark.
Okay, now moving on to how we can write this out. From the RStudio session, I would like to write this into a text file called, for example, people.txt. So I will use the write.table function, which has many inputs, and I will take you through them one by one. The first is the data that you want to output from R into the text file. Then you need to specify the file name; here I'm calling it people.txt, which differs from the original file. In the File Explorer you can see table_KH.txt where my mouse is now, which is where we read the data in from. I would then like to use a tab separator rather than the original comma separator; that's an active choice I've made. I'd like to convert NAs back into question marks, so when the data leaves the RStudio session and enters the people.txt file, each NA is converted back into a question mark. And I think that column names are important and should be included, so I will set that to true. So those are the parameters that I'd like to go through for the write.table function.
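The read-then-write steps described above could look roughly like the sketch below. The contents of table_KH.txt are not shown in this transcript, so the code first creates a small hypothetical stand-in file in a temporary directory; the column names and values are assumptions made purely for illustration.

```r
# Hypothetical stand-in for table_KH.txt (the real file's contents are not shown).
src <- file.path(tempdir(), "table_KH.txt")
writeLines(c("name,age,city",
             "Ann,34,Leeds",
             "Bob,?,York",
             "Cat,29,?"), src)

# Read the comma-separated file: "?" becomes NA, strings stay as character (not factors).
people <- read.csv(src, na.strings = "?", stringsAsFactors = FALSE)
print(people)

# Write the data back out: tab-separated, NA written as "?", column names included.
out <- file.path(tempdir(), "people.txt")
write.table(people, file = out, sep = "\t", na = "?", col.names = TRUE)
```

Writing to tempdir() rather than the working directory is only a choice for this sketch; in the session described here the file name would simply be "people.txt".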
And as we can see by clicking in the File Explorer tab and hitting refresh, people.txt has now been created. I can show on the screen what has been created. I can also double-check it using a fairly simple function, which opens a connection to that file and reads its lines. I would then like to output those lines to the screen. This is my way of checking that the file has been created correctly without leaving the RStudio session. Alternatively, if you feel more comfortable, you can look at the file in Notepad. Either approach is fine.
There are a couple of things to note here: the columns have been shifted along. What I have highlighted is the number one, and if I pull up the table_KH.txt file, I can see that one, two and three did not exist in the original version of this file. The row names have been included by default. So the people data frame contains row indices, which have now been written to the output underneath the name column, which is not what we want. To fix this, we can write the file out again. If I use write.table again, it will simply overwrite what I created before, so I can lift the same piece of code that I had before. Up here I'm mirroring exactly the same piece of code. But I'm also going to specify that row names should be omitted, because I don't believe that the indices R provides need to exist in my people.txt file; they are merely there to help me work with the data in R. So I close that bracket, and now I have updated the people.txt file. I will use the same sense check that I had up here, and I can see that the file has been amended and the indices no longer exist in the text output. I can also show you this in a Notepad window, where the row indices have disappeared.
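A minimal sketch of this check and fix is shown below. It assumes the "simple function" is readLines applied to a file connection, which is one common way to do it and may not be exactly what was typed in the session. The small people data frame here is hypothetical.

```r
# Hypothetical data frame standing in for the people data.
people <- data.frame(name = c("Ann", "Bob"), age = c(34L, NA),
                     stringsAsFactors = FALSE)
out <- file.path(tempdir(), "people.txt")

# Default behaviour: row names are written, so each data row has one more
# field than the header and the columns appear shifted along.
write.table(people, file = out, sep = "\t", na = "?", col.names = TRUE)
con <- file(out, "r")          # open a connection to the file
with_rows <- readLines(con)    # read its lines back into R
close(con)
print(with_rows)

# Same call, overwriting the file, but this time omitting the row names.
write.table(people, file = out, sep = "\t", na = "?", col.names = TRUE,
            row.names = FALSE)
without_rows <- readLines(out) # readLines also accepts a file path directly
print(without_rows)
```

Comparing the two printed results shows the extra leading field in each data row of the first version and its absence in the second.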
Finally, I'd like to note that if you want a comma-separated file, you can use write.csv, which is a wrapper around the write.table function. All it does is let us pass a reduced number of parameters to write.csv, compared with write.table. You can see that we now have defaults such as the comma separator: the separator argument defaults to a comma, as opposed to the tab in our previous choice. If I click in my File Explorer and hit refresh, I can see that people.csv has been created. And I can again use the simple check to look at that file, reading it as lines and writing those out to the screen here. As you can see, a comma-separated file has been created.
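As a rough illustration of the write.csv version, the sketch below writes a hypothetical people data frame to people.csv and reads it back as lines. The na and row.names arguments are carried over from the earlier write.table call as an assumption; the transcript does not say which arguments were passed to write.csv.

```r
# Hypothetical data frame standing in for the people data.
people <- data.frame(name = c("Ann", "Bob"), age = c(34L, NA),
                     stringsAsFactors = FALSE)
csv_out <- file.path(tempdir(), "people.csv")

# write.csv fixes the separator to a comma, so no sep argument is needed.
write.csv(people, file = csv_out, na = "?", row.names = FALSE)

# Same sense check as before: read the file back as lines and print them.
csv_lines <- readLines(csv_out)
print(csv_lines)
```

Because write.csv is designed to produce a valid CSV file, it ignores attempts to change sep and issues a warning; write.table remains the function to use when a different separator is needed.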
Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.
Kunal has helped clients with early-stage engagement and has designed multi-week training programme curricula.
Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation.
Data Scientist at a credit management company; applied statistical analysis to distressed portfolios.
Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform.
Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data.
Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.