
Missing data in R

Developed with QA


The course is part of this learning path

Fundamentals of R
Overview
Difficulty: Intermediate
Duration: 38m
Students: 68
Rating: 5/5

Description

Course Description 

This module introduces you to some of the basic data structures that can be used in R.

Learning Objectives 

The objectives of this module are to provide you with an understanding of: 

  • What a vector is in R 
  • How to create a sequence  
  • How to create a vector using a repetition  
  • How to pull elements out of vectors  
  • Vectorised operations  
  • Logical comparisons  
  • Strings in R  
  • Undefined situations in mathematics  
  • 0, NA, NaN, and NULL

Intended Audience 

Aimed at all who wish to learn the R programming language. 

Pre-requisites 

No prior knowledge of R is assumed.

Delegates should already be familiar with basic programming concepts such as variables, scope, and functions.

Experience with another scripting language such as Python or Perl would be an advantage.

An understanding of mathematical concepts will be beneficial.

Feedback 

We welcome all feedback and suggestions - please contact us at qa.elearningadmin@qa.com to let us know what you think. 

Transcript

- Imagine you have just conducted an experiment in the room you're in, whether that's your study, your classroom, or your bedroom. You're measuring the temperature, you have a week's worth of data, and you store it in a vector called temp. There are a few interesting readings here. One is an anomaly of 70, which is perhaps in Fahrenheit, while 16 is probably in centigrade. Zero is a genuine value: it would be the freezing point if we were measuring our temperature in Celsius. There are also three interesting readings, NA, NaN, and NULL, which we should take a look at.

NA indicates Not Available, where the data is not available. It's useful as a placeholder and it's usually an indicator of a missing value.

NaN stands for Not a Number, and it would come up in the case where perhaps the thermometer was broken. The expectation is that a numerical calculation should result in a number, so this is more of an indicator that we have an error.

The last point I'd like to make is that NULL represents Not Yet Calculated. It's an item that does not appear within the data structure: if I look at the output for temp, I don't see the value included in the output. In this case, for our thermometer experiment, it's usually an indication that something has not yet been properly initialized.
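The thermometer example can be sketched in R roughly as follows. The exact readings and their order are assumptions made up for illustration (the transcript only mentions 70, 16, zero, NA, NaN, and NULL), so treat this as a minimal sketch rather than the vector used in the course.

```r
# Hypothetical readings: the values and their order are assumptions.
temp <- c(16, 70, 0, NA, NaN, NULL)

print(temp)                 # NULL does not appear in the printed vector
n_readings <- length(temp)  # six values were supplied, but c() drops NULL

missing_flags <- is.na(temp)   # TRUE for both NA and NaN
nan_flags     <- is.nan(temp)  # TRUE only for NaN

is.null(temp)   # FALSE: the vector itself exists
is.null(NULL)   # TRUE

0 / 0           # an undefined calculation gives NaN

# Zero is a real reading, so it is kept when missing values are dropped.
clean_mean <- mean(temp, na.rm = TRUE)  # averages 16, 70 and 0 only
```

Note that is.na() treats NaN as missing as well, which is why is.nan() is needed to tell the two apart, and that na.rm = TRUE removes both before averaging.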

About the Author
Students: 434
Labs: 1
Courses: 11
Learning paths: 1

Kunal has worked with data for most of his career, ranging from diffusion Markov chain processes to migrating reporting platforms.

Kunal has helped clients with early-stage engagement and has put together multi-week training programme curricula.

Kunal has a passion for statistics and data; he has delivered training relating to Hypothesis Testing, Exploring Data, Machine Learning Algorithms, and the Theory of Visualisation. 

Data Scientist at a credit management company; applied statistical analysis to distressed portfolios. 

Business Data Analyst at an investment bank; project to overhaul the legacy reporting and analytics platform. 

Statistician within the Government Statistical Service; quantitative analysis and publishing statistical findings of emerging levels of council tax data. 

Structured Credit Product Control at an investment bank; developing, maintaining, and deploying a PnL platform for the CVA Hedging trading desk.