Supervised learning problems - Part 3
1h 23m

Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn't capable of. This course is part one of the module on machine learning. It starts with the basics, introducing you to AI and its history. We'll discuss its ethics, and talk about examples of currently existing AI. We'll cover data, statistics and variables, before moving on to notation and supervised learning.

Part two of this two-part series can be found here, and covers unsupervised learning, the theoretical basis for machine learning, models and linear regression, the semantic gap, and how we approximate the truth.

If you have any feedback relating to this course, please contact us at


- Okay, let me give you a classification problem, so we do it again. Classification, if you remember, is where our Y just has some options. It isn't any sort of ordinary measurement, it's just got some options. Now if it's a binary classification problem, that means that Y has two options, which by convention we give the numbers plus one and minus one. They could be anything, but the convention of plus one and minus one lets us say that there's a negative option and there's a positive option, and that's quite useful sometimes. Some examples we've seen already: does a person like a film or dislike a film? Positive, negative. Should I go to the shop or not? Positive, negative. Is the tumour malignant or benign? Positive, negative. Many problems, in fact, can be understood as binary classification problems. Sometimes we'll need a multi-class problem, and that's where the Y, the thing we're trying to predict, remember, can have lots of different options: zero, one, two, three, and these could represent, say, different cities. You might ask, where should I go on the train? Leeds, London, Manchester, Liverpool, whatever: lots of different options, not just yes or no, positive or negative, left or right. So that's classification. But let's look at binary classification, to keep it nice and simple. Now with the binary case here, let's go for like and dislike on a film. So our Y is like or dislike, which is going to be plus one and minus one. And let's have a couple of Xs this time, two different features: someone's age, and the length of a film. We're going to visualize this setup and maybe show you how it could be solved. So you've got two axes, X1 and X2. Now notice that neither axis is Y. Why is that? Well, that's because, visually speaking, an axis is describing a measurement, basically.
What it's saying is that, you know, you could have any age. So X1 is age, so you can have an age between zero and 100, and you can be 18.1 or 18.2. Visually, an axis carries this sense that a measurement could sit at any position, and here Y only has two options. You can't give Y an axis in that way, because there is no 0.5 of liking; it doesn't make any sense to have a three for "like" or something. Either you like it or you dislike it, okay? So with age you've got zero to 100, and with film length you've got zero again, all the way up to, if you do it in hours, let's say five hours. Let's not do it longer. So if it's a five-hour axis, the middle of it is going to be 2.5 hours, and most films will be less than that. Right, okay, how do we put Y on this visual then? Y is going to be colour. We're going to have two colours, and the colour is going to be whether you like it or dislike it. So let's give an example: let's say my age is 50, I'm 50 years old, and the film was 2.5 hours long, and let's say I liked the film. So I'm going to put a big green point on there, and green is going to mean I liked it. Let's go for red for dislike. Okay, let's say there was a film in our database that was 2.5 hours long, and this time I'm not 50 but, let's say, 10 years old. So 2.5 hours in X2, but very low in X1, 10 years old, and it's in red, meaning that this person didn't like it. So let's look at our database here, the data set we're dealing with. The green one in our database is going to be a row, and, if you're using the order that I've used here, age comes first, which for green is 50; X2 is the length of the film, which was 2.5 hours; and then in the database the colour, obviously, would just be a number, in this case plus one.
And then for this red point here, we would have 10 years old, 2.5, minus one. So we're just visualizing that third column with a colour. Okay, let's put on more data: some green points, some red points. And, you know, there are going to be films over here as well, probably, but maybe not many. So there are some green points and there are some red points. Now, again, I use some machine learning, and pay attention here, this is a machine learning method, all right? It's just finding this relationship F. What is this F telling you? It's telling you what level of age and film length distinguishes people who like those kinds of films from people who don't like them. So what this line here says is: everyone above this line, or most people above this line, liked the films, and most people below this line didn't like the films, right? So what is this line allowing you to do? Again, it gives you something visual that you can use to predict. It says, well, into the future, in blue here, if I see a film and it's three hours long, and I know that you're 50 years old, then, because this point is above the line, I'm going to predict that you're going to like the film. So my prediction is that you will like the film. Okay, one more: if the film is, let's say, one hour long, and I know that you're 10 years old? Well, no, I predict you don't like it. Now, this is a very aggressive line; it's telling us that, for ordinary ranges, most people didn't like our films. So maybe that's to do with the data set, that we don't have good films in our data set. Let me draw another one. Maybe we learn this line.
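The two database rows and the decision line described above can be sketched in a few lines of code. This is a minimal illustration: the line's weights below are invented for the sake of the example, not learned from the data.

```python
# The rows of the database described above: each row is one example,
# [age in years, film length in hours], and y holds the like/dislike
# label (+1 = like, green; -1 = dislike, red).
X = [
    [50, 2.5],   # 50-year-old, 2.5-hour film -> liked it (green point)
    [10, 2.5],   # 10-year-old, 2.5-hour film -> disliked it (red point)
]
y = [+1, -1]

# A line in this plane can be written w1*x1 + w2*x2 + b = 0. Points on
# one side are predicted +1, points on the other side -1. These weights
# are made up for illustration, not the result of any training.
def predict(age, length_hours, w=(0.08, 1.0), b=-6.0):
    score = w[0] * age + w[1] * length_hours + b
    return +1 if score > 0 else -1

print(predict(50, 3.0))   # 50-year-old, 3-hour film -> 1 (predict 'like')
print(predict(10, 1.0))   # 10-year-old, 1-hour film -> -1 (predict 'dislike')
```

With these particular weights the line also classifies both database rows correctly, which is exactly what a training procedure would be trying to achieve.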
What that second line would tell us is that, actually, more people like things than we thought. People above this line, say someone who's 20 years old with a three-hour film, would like it on the second account, but wouldn't on the first. And which line we end up with depends on the data that we have, and on the particular technique that we use to solve the problem, right? So, let's just do a quick review. Where are we? We've talked about supervised learning. In supervised learning there's this training step, where we learn the relationship, where we learn this black line here, where we learn F. That's the training step, using training data, these points. Then there's the prediction phase, where we go, okay, now we use the line. So here I draw the green point, draw this little 27 up to the red line, and that tells me what my prediction will be: it's 27 degrees outside, you spend 70 pounds. So there are two steps: the training step, find the relationship; and the prediction step, use the relationship. Okay, so that's supervised learning. Then there are two core kinds of problem in supervised learning, and they make up 90% of a supervised learning class: regression and classification. Regression: are we predicting some ordinary sort of measurement, a number? Classification: are there just options for the thing we're predicting? In both cases we can understand this as finding a line that separates or describes the data. The line is the pattern in the data, and then we use this pattern to give us a prediction in the future.
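The two steps of the review above, train then predict, can be sketched end to end. As a hedged illustration I use a nearest-mean classifier here, one simple method whose decision boundary happens to be a straight line (the perpendicular bisector of the two class means); the data points are made up for the example, and this is not necessarily the method used in the lecture.

```python
import math

def train(X, y):
    """Training step: find the mean point of each class."""
    likes = [x for x, label in zip(X, y) if label == +1]
    dislikes = [x for x, label in zip(X, y) if label == -1]
    mean = lambda pts: [sum(p[i] for p in pts) / len(pts) for i in (0, 1)]
    return mean(likes), mean(dislikes)

def predict(model, x):
    """Prediction step: +1 if x is closer to the 'like' mean, else -1."""
    like_mean, dislike_mean = model
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    return +1 if dist(x, like_mean) < dist(x, dislike_mean) else -1

# Illustrative training data: [age in years, film length in hours].
X = [[50, 2.5], [45, 3.0], [10, 2.5], [12, 1.0]]
y = [+1, +1, -1, -1]

model = train(X, y)                # training step: learn the relationship
print(predict(model, [40, 2.0]))   # prediction step: use it -> 1 ('like')
```

The design choice here is only about the review's structure: whatever method you pick, you get the same two phases, a `train` call that turns labelled points into a model, and a `predict` call that applies the model to new points.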

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.