Supervised learning problems - Part 1
1h 23m

Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. This course is part one of the module on machine learning. It starts with the basics, introducing you to AI and its history. We’ll discuss the ethics of AI and talk about examples of AI that exist today. We’ll cover data, statistics and variables, before moving on to notation and supervised learning.

Part two of this two-part series covers unsupervised learning, the theoretical basis for machine learning, models and linear regression, the semantic gap, and how we approximate the truth.

If you have any feedback relating to this course, please contact us at


Right, so let's look at some supervised learning problems. Within this field, recall, we have X and Y. Y is our target, and we usually only have one. So, right now we have two, but usually one. X is our features. Features: that's the key word, and we need to get our words right here. Features are what you know; Y is what you want to predict.

Now, when I use a Y without a hat, it means the historical stuff that we already know. That sounds a bit weird, doesn't it: there are things we know now that we won't know at some later point. So let's disentangle that a bit. In our historical data set we will have a column you could call Y and a column you could call X. But in the future, when we collect new data and look at new observations, we won't have Y. We won't know what you will rate the film before you've rated it, but we can look at the historical ratings you've given and predict what your rating will be.

So there are two notions of time in these problems. In machine learning in general there might be others, but in supervised learning there are certainly two. There's the time of training the algorithm, when you consider the historical data, the training data set; and there's the time of prediction. Some people call this deployment, and some might say testing time. There are different words here, but let's just call it prediction time.

At prediction time you only know X, and the whole point, of course, is to be able to compute your guess for Y. So you would have Y hat at prediction time; you wouldn't know what the actual, true result Y would be. That's the setup for supervised learning. And of course there's this F, which relates the two together and allows us to determine a guess from what we know.
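To make the two notions of time concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the numbers are invented, there is a single feature (a hypothetical critic score out of 5), and a least-squares straight line stands in for F. The lecture does not say which F to use.

```python
# --- Training time: historical data, where both X and Y are known. ---
# Invented numbers: x is a hypothetical critic score out of 5,
# y is the rating you actually gave the film, out of 10.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.0, 4.0, 6.0, 8.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(x_train, y_train)


def f(x):
    """The learned relationship: turns a known X into a guess for Y."""
    return slope * x + intercept


# --- Prediction time: only X is known, the true Y has not happened yet. ---
x_new = 5.0
y_hat = f(x_new)  # "Y hat": our guess, not the true rating
print(y_hat)
```

The point of the sketch is the split: `y_train` only exists at training time, while at prediction time the code has `x_new` and produces `y_hat`.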
Let's give you a couple of examples of different kinds of machine learning problem. Different color again; let's go for the first one, which will be a regression problem. Regression is defined by the thing we are trying to predict, Y, being a real number. So there's a bit of notation here: Y ∈ ℝ. We will probably take a more detailed look at notation in this course, and there are other courses you can use as resources if you struggle a little with the mathematics. But for now, let me just describe what this notation means. That little E symbol (∈) you can read as saying "in". In what? That capital R symbol (ℝ) stands for all of the ordinary real numbers. So the R means real. The real numbers are a little bit odd, actually, but we won't go into that too much. What it means is that Y is going to be minus 100.5, or 53.2, or some measurement, basically. A real number is the kind of number you might imagine a thermometer giving you. So a regression problem is one where the thing you're trying to predict is, intuitively, some kind of measurement of something. Profit is an example. Film rating out of ten could be an example: 5.3 is a prediction, 2.1 is a prediction. Those would be real numbers.

To contrast that, what other kinds of numbers are there? Well, there are discrete numbers. So there's regression, and then there's classification, which we'll come on to properly in just a moment. The contrast is that in classification the thing we're trying to predict is going to be in a set, for example {−1, +1}. So above, in regression, the thing we're interested in could be profit, or a real, genuine measure of the world like temperature or heart rate. It could even be a film rating.
As long as we're happy that the rating's going to be 5.3, 1.1 or something out of 10. Whereas below, well, this could be a film rating as well, but a different style of rating, where essentially we have like and dislike or something similar. And the way we convert those two terms into something computable, something numeric, is by saying like is minus one and dislike is plus one. So these are actually quite different kinds of problems. It's a very simple distinction at the beginning (is it a sort of measurement, or is it just an option?), but it actually changes the analysis quite a lot. Right, so in regression we have a real Y. And in classification we have a binary Y, or, since it doesn't have to be binary, some kind of option or label or discrete number.
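The difference between the two kinds of target can be sketched in a few lines of Python. The ratings below are the example values from the talk. The label list, the `LABEL_TO_NUMBER` mapping and the `is_binary_label_target` helper are hypothetical names introduced here for illustration. The mapping follows the encoding as spoken (like is minus one, dislike is plus one), and which label gets which number is an arbitrary convention.

```python
# Regression: the target is a real number (Y in R), e.g. a rating out of 10.
regression_targets = [5.3, 2.1, 1.1]

# Classification: the target is one of a fixed set of options.
# Encoding as described in the talk; which label gets -1 or +1 is arbitrary.
LABEL_TO_NUMBER = {"like": -1, "dislike": +1}

raw_labels = ["like", "dislike", "like"]  # invented example labels
classification_targets = [LABEL_TO_NUMBER[label] for label in raw_labels]


def is_binary_label_target(targets, allowed=frozenset({-1, 1})):
    """Return True if every target value is one of the allowed discrete labels."""
    return all(t in allowed for t in targets)


print(classification_targets)
print(is_binary_label_target(classification_targets))
print(is_binary_label_target(regression_targets))
```

A membership check like this only describes these example lists. In practice, whether a problem is regression or classification is decided by what Y means, not by inspecting the values.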

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.