
Factors in a Data Frame in R

Developed with
QA


The course is part of this learning path

Fundamentals of R
Overview
Difficulty: Intermediate
Duration: 36m

Description

Course Description 

This module looks at more complex data structures, building on what was covered in the intermediate data structures module.  

Learning Objectives 

The objectives of this module are to provide you with an understanding of: 

  • How to construct a factor in R  
  • How to construct a data frame in R  
  • How to modify a data frame  
  • How to subset a data frame  
  • How data frames automatically factorise character data in R  

Intended Audience 

Aimed at all who wish to learn the R programming language. 

Pre-requisites 

No prior knowledge of R is assumed. 

Delegates should already be familiar with basic programming concepts such as variables, scope, and functions. 

Experience with another scripting language such as Python or Perl would be an advantage. 

An understanding of mathematical concepts will be beneficial. 

 Feedback 

We welcome all feedback and suggestions - please contact us at qa.elearningadmin@qa.com to let us know what you think. 

Transcript

- [Instructor] When we create a data frame, character columns are automatically converted into factors. This happens because of the stringsAsFactors parameter of the data frame constructor, which by default is set to TRUE [in versions of R before 4.0.0; from 4.0.0 onward the default is FALSE]. If I create a data frame as such and call on my different string columns, I can see that, without my having asked for them to be factors, they are factors. We automatically factorize character data.

Let us compare this to a list. Rather than creating a data frame, I'm now creating a list, and then I can ask for the data to be pulled back onto the screen. As we can see, we just have a series of characters. And what type of information have we stored? Not a factor; we have just stored a character vector.

What about data that are not meant to be factors? Take names, for example. If I ask for people, the names here, Sherlock and Watson, shouldn't be factors, because that doesn't make sense as far as our data is concerned. They should be stored as strings. If I use the stringsAsFactors parameter of my data frame constructor, I can set it to FALSE and force all strings to no longer be stored as factors. We don't automatically factorize now. So if I pull names to the screen, we can see that they are treated correctly as a character vector.

However, some columns that should be factors, such as cities, are now not stored as factors, so how can we add this back? We can define our data frame in the same way as earlier, but wrap the factor constructor around whichever variables need to be factorized. I can then pull onto the screen the classes of the two columns I have created and show that the names are stored appropriately as characters and the cities are stored appropriately as a factor.
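
The on-screen code is not part of this transcript, so the following is only a minimal sketch of the steps the instructor describes. The variable names (people_names, people_cities, df_factors and so on) and the city values are assumptions made for illustration; only Sherlock, Watson, the names and cities columns, stringsAsFactors and factor() come from the transcript. stringsAsFactors is passed explicitly throughout so that the result does not depend on which default your version of R uses.

```r
# Hypothetical example data: the transcript mentions only "Sherlock", "Watson"
# and a cities column; the city values here are assumed for illustration.
people_names  <- c("Sherlock", "Watson")
people_cities <- c("London", "London")

# 1. Data frame with automatic conversion of character columns to factors.
df_factors <- data.frame(names = people_names,
                         cities = people_cities,
                         stringsAsFactors = TRUE)
class(df_factors$names)   # "factor"
class(df_factors$cities)  # "factor"

# 2. A list holding the same data keeps plain character vectors.
people_list <- list(names = people_names, cities = people_cities)
class(people_list$names)  # "character"

# 3. Data frame with automatic conversion switched off:
#    every string column stays a character vector.
df_strings <- data.frame(names = people_names,
                         cities = people_cities,
                         stringsAsFactors = FALSE)
class(df_strings$names)   # "character"
class(df_strings$cities)  # "character"

# 4. Keep conversion off, but wrap factor() around the one column
#    that should be categorical.
df_mixed <- data.frame(names = people_names,
                       cities = factor(people_cities),
                       stringsAsFactors = FALSE)
class(df_mixed$names)     # "character"
class(df_mixed$cities)    # "factor"
```

Wrapping individual columns in factor() keeps the choice explicit for each column, rather than relying on a single switch that applies to every string column in the data frame.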

About the Author

Kunal has worked with data for most of his career, on projects ranging from diffusion Markov chain processes to migrating reporting platforms.  

Kunal has helped clients with early-stage engagement and has designed curricula for multi-week training programmes. 

Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation. 

Data Scientist at a credit management company; applied statistical analysis to distressed portfolios. 

Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform. 

Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data. 

Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk. 
