Variables & Data
1h 23m

Machine learning is a big topic. Before you can start to use it, you need to understand what it is and what it is and isn’t capable of. This course is part one of the module on machine learning. It starts with the basics, introducing you to AI and its history. We’ll discuss the ethics of AI and look at examples of AI that exist today. We’ll cover data, statistics and variables, before moving on to notation and supervised learning.

Part two of this two-part series covers unsupervised learning, the theoretical basis for machine learning, models and linear regression, the semantic gap, and how we approximate the truth.

If you have any feedback relating to this course, please contact us at


So when talking about data, and how it comes in terms of variables, it helps to understand the varieties of variables, or varieties of data, that you might have. So let's talk about the kinds of variable, almost the data types, mathematically speaking. First, there are real numbers, and these can be understood as quantities or measurements. To have a real-valued variable, or a variable of a real type, is to have an ordinary numerical variable, one that behaves like an ordinary numerical measurement. I'll give you some examples. So here, x would be a real number, and examples would be an age of 18.5, a profit of £1000.51 (we might call these x_age and x_profit, if you like), or a temperature of 23.5 degrees Celsius. These are numbers that can be partial, in the sense that fractions are real numbers: they can have decimal points and things of that kind.

Because a real number is a quantity or a measurement, a real-valued variable is sometimes called a continuous variable. The reason for that is this: suppose I had a line with some measurements on it, running from 18 years old to 19 years old. Age has a continuous domain, so it could be anywhere between 18 and 19, not just at the marks on the line. Now, in contrast to real numbers, you can have discrete numbers, or categorical variables. Qualitative variables are variables about characteristics, and we would call these not continuous but discrete. The value of a discrete variable is a discrete number, and such a variable is sometimes called a categorical variable: categorical, because we interpret the numbers as categories.
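As a minimal sketch in Python, real-valued variables can be stored as floats. The names x_age, x_profit and x_temperature mirror the examples above; the `midpoint` helper is hypothetical, added only to show that a value between two measurements is still a meaningful measurement:

```python
# Continuous (real-valued) variables: quantities or measurements,
# so fractional values and values in between are meaningful.
x_age = 18.5          # years
x_profit = 1000.51    # pounds sterling
x_temperature = 23.5  # degrees Celsius


def midpoint(a: float, b: float) -> float:
    """Return the measurement halfway between a and b (hypothetical helper)."""
    return (a + b) / 2


# Any value between 18 and 19 years old is a valid age,
# so the midpoint of two ages is itself an age.
halfway_age = midpoint(18.0, 19.0)
print(halfway_age)                # 18.5
print(18.0 < halfway_age < 19.0)  # True
```

Python's `float` only approximates the real numbers, which is usually acceptable for measurements like these.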
So here, x could be a location, say, with a value of Leeds, London or Manchester. We could assign a number to each of these categories: zero, one, two. But 0.5 doesn't mean anything; you can't be in both Leeds and London at the same time. There is no meaning to partial values or intermediate points between categories. You're either a zero, a one or a two, and that's what makes the variable categorical: you're either this, or that, or that.

Within categorical variables, we might make two specializations. A variable could be nominal, like this location example, which means it cannot be ranked: there is no sense in which zero is better, higher or lower than one; it is just a different symbol. So a nominal categorical variable is just a symbol: zero, one, two. An ordinal variable, however, is rankable, meaning it can be ordered. Take ratings, for example: you might rate a film or a restaurant one star, two stars or three stars, and three stars is better than one star.

It turns out that some techniques relevant to solving a problem may only be possible with ordered data, and others only with unordered data. So questions about the kinds of variable, and the kinds of data, you have are important when establishing which techniques are relevant to the problem you're trying to solve, whether in machine learning or in AI more generally.
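To make the nominal/ordinal distinction concrete, here is a minimal Python sketch. The data values, and the choice of the mode and the median as example summaries, are illustrative assumptions rather than anything from the course: the mode only needs categories to be told apart, so it suits nominal data, while the median needs an order, so it only makes sense for ordinal (or real-valued) data.

```python
from collections import Counter
from statistics import median_low

# Nominal: the codes 0, 1, 2 are arbitrary labels with no order.
LOCATION_CODES = {"Leeds": 0, "London": 1, "Manchester": 2}

# Illustrative data (assumed for this sketch).
locations = ["Leeds", "Leeds", "London", "Manchester"]
ratings = [3, 1, 2, 3, 3]  # ordinal star ratings: 3 stars beats 1 star

# Nominal data: counting categories (the mode) is meaningful.
mode_location, mode_count = Counter(locations).most_common(1)[0]
print(mode_location, mode_count)  # Leeds 2

# Averaging the codes is not: 0.75 is not a location.
codes = [LOCATION_CODES[place] for place in locations]
mean_code = sum(codes) / len(codes)
print(mean_code)  # 0.75

# Ordinal data: order-based summaries such as the median make sense.
# median_low always returns one of the observed ratings, never e.g. 2.5.
median_rating = median_low(ratings)
print(median_rating)    # 3
print(sorted(ratings))  # [1, 2, 3, 3, 3]
```

Reassigning the arbitrary location codes (say Leeds = 2, Manchester = 0) would change `mean_code`, which is another sign that arithmetic on nominal codes carries no meaning.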

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.