Big Data and AI | SDL4 A3.1 |
Data is information and, just as there are lots of different types of information, there are different types of data. In these videos, you'll learn more about types of data and the ways in which you can store them.
When talking about data and how it comes in the form of variables, it helps to understand the varieties of variable, or varieties of data, that you might have. So let's talk about the kinds of variable, or data types, mathematically speaking. First, there are the numbers we call real numbers. These are quantities, or can be understood as quantities or measurements. To have a real-valued variable, or a variable of real type, is to have an ordinary numerical variable that behaves like an ordinary numerical measure.
I'll give you some examples. Here, x would be a real number: an age with a value of 18.5, say, or a profit of £1,000.51, or a temperature of 23.5 degrees Celsius. We might label these x_age and x_profit, if you'd like. These are the sort of numbers that can be partial: fractions are real numbers, and they can have decimal points and things of that kind. Now, in contrast to real numbers, you can have discrete numbers, or categorical variables. A real number we consider to be a quantity or a measurement, and a real-valued variable is sometimes called a continuous variable. The reason is this: suppose I had a line with some measurements on it, running from eighteen years old to nineteen years old. The variable has a continuous domain, that is, a continuous set of numbers that it could possibly take. So an age could be anywhere between eighteen and nineteen.
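To make the idea of a continuous domain concrete, here is a minimal Python sketch. It is an illustration only, not something shown in the video: the variable names and the `midpoint` helper are assumptions, and Python floats only approximate real numbers. The point is that between any two ages there is always another valid age.

```python
# Real-valued (continuous) variables, stored as floats.
# The names below are illustrative; the values are the ones used in the talk.
x_age = 18.5            # years
x_profit = 1000.51      # pounds
x_temperature = 23.5    # degrees Celsius


def midpoint(a: float, b: float) -> float:
    """Return the value halfway between a and b."""
    return (a + b) / 2


# Between eighteen and nineteen there is always another meaningful value.
low, high = 18.0, 19.0
mid = midpoint(low, high)       # halfway between 18 and 19
quarter = midpoint(low, mid)    # halfway again, still a valid age

print(low, quarter, mid, high)
```

Repeating the halving step keeps producing new values inside the interval, which is what it means for the variable to be continuous.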
Let's talk about qualitative variables, that is, variables which are about characteristics. These we would call not continuous but discrete. A discrete number is the value of a discrete variable. Such a variable is sometimes called a categorical variable, categorical because we interpret the numbers as categories. Here we could have x as a location, say Leeds or London or Manchester, and we could assign a number to each of these potential categories: zero, one, two. It isn't the case that 0.5 means anything. You can't be in both Leeds and London at the same time. So there is no meaning to partial or intermediate values between points. You're either a zero or a one or a two. That is what makes it categorical: you're either this or that or that. Now, within the categorical section, we might make two specialisations. A variable could be nominal, like this one, which means it cannot be ranked: there is no sense in which zero is better, higher or lower than one. It is just a different symbol, so the values of a nominal categorical variable are just symbols: zero, one, two. An ordinal variable, however, as you can tell from 'ord', is rankable. It's orderable. It can be ranked, here meaning ordered.
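The nominal and ordinal distinction can be sketched with Python's standard `enum` module. This is an assumed illustration rather than anything from the video: the class names `Location` and `Stars` and the helpers `can_rank` and `is_valid_location` are hypothetical. A plain `Enum` does not support `<`, which matches a nominal variable, while an `IntEnum` does, which matches an ordinal one.

```python
from enum import Enum, IntEnum


class Location(Enum):
    """Nominal: the codes 0, 1, 2 are only labels, with no ranking."""
    LEEDS = 0
    LONDON = 1
    MANCHESTER = 2


class Stars(IntEnum):
    """Ordinal: the codes carry an order, so three stars beats one star."""
    ONE = 1
    TWO = 2
    THREE = 3


def can_rank(a, b) -> bool:
    """Return True if a and b can be compared with '<'."""
    try:
        a < b
        return True
    except TypeError:
        # Plain Enum members do not define an ordering.
        return False


def is_valid_location(code) -> bool:
    """Return True if the code corresponds to one of the categories."""
    try:
        Location(code)
        return True
    except ValueError:
        return False


print(can_rank(Stars.ONE, Stars.THREE))             # ordinal values can be ranked
print(can_rank(Location.LEEDS, Location.LONDON))    # nominal values cannot
print(is_valid_location(0.5))                       # 0.5 is not a category
```

Using `IntEnum` for the ratings is a design choice for this sketch: it makes the ordering explicit in the type, so code that tries to rank locations fails loudly instead of silently comparing arbitrary codes.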
Take the rating of a film, for example. You can rate a film one star or two stars or three stars, or something like that. It is the case that a three-star rating, for a film or, let's say, a Michelin-starred restaurant, is better than a one-star rating. And it turns out that some techniques relevant to solving a problem may only be possible with data that is ordered, and others only with data that is unordered. So questions about the kinds of variable you have, the kinds of data you have, are important ones for establishing which techniques become relevant when you're trying to solve problems within machine learning, AI or statistics.
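As one hedged illustration of how the type of variable decides the technique, the sketch below picks a summary statistic by variable type: a mean for continuous data, a median for ordinal data, and a mode for nominal data. This pairing is a common statistical convention and is not taken from the video; the function name `summarise` and the sample values are assumptions made for the example.

```python
import statistics


def summarise(values, kind: str):
    """Pick a summary statistic that makes sense for the variable type."""
    if kind == "continuous":
        # Averages are meaningful because in-between values exist.
        return statistics.mean(values)
    if kind == "ordinal":
        # A median needs only an ordering; median_low returns an actual category.
        return statistics.median_low(values)
    if kind == "nominal":
        # With no order, the most common category is what can be reported.
        return statistics.mode(values)
    raise ValueError(f"unknown variable type: {kind}")


ages = [18.0, 19.0]                          # continuous
ratings = [1, 3, 3, 2, 1]                    # ordinal star ratings
locations = ["Leeds", "London", "Leeds"]     # nominal

print(summarise(ages, "continuous"))
print(summarise(ratings, "ordinal"))
print(summarise(locations, "nominal"))
```

Asking for the mean of the location codes would run without error, but the result would have no interpretation, which is why the type of the variable has to be settled first.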