Practical Machine Learning
Supervised learning is a core part of machine learning, and something you’ll probably use quite a lot. In this module, you’ll learn more about exactly what it is and what it’s capable of, before we move into more detail about the nearest neighbours algorithm, hyperparameters, and distance functions and similarity measures. We’ll end the module with logistic regression, the method and workflow of machine learning and evaluation, and the train-test split.
- Let's talk about the nearest neighbors algorithm, then. This is a very simple algorithm, one of the simplest possible, and it can be used both for classifying points and for predicting real-valued measures for them: giving a regression value, a trend value. So let's have a look at it.
The approach is called the nearest neighbors algorithm. One of the things I'd like you to be thinking about as we go through this is how it compares to linear regression, which we've already used: that's where we draw a straight line through the data. That comparison can give us some insight into how machine learning problems can be tackled from lots of different directions. It may seem that all we ever do is draw a line and predict something. Well, there are other approaches, and here's one. So this is a very different approach.
Here's the intuition. I say: okay, I may not know your target. I may not know how much money I will make from you, I may not know your grade, I may not know what rating you would give. But what I'm going to do is keep my database of all the people I've seen, my training data set, around. Then I'm just going to look up, in my training data set, the person who is most similar to you, look at what their rating was, or how much money I made from them, or what their grade was, and I'm just going to predict that. Very simple: all I do is find the most similar person in my training data and use their value.
We can visualize this process, but let's look at it computationally first, rather than visually or geometrically. So computationally, what do we have? Well, we have something like a table, right?
A database, or training data. Let's just go for one feature, so we have two columns, x and y; this is our historical data set for training. And what do we have? Well, let's say x is someone's age, and y is how much money we made from them. For the ages, let's go for 18 years old, 30 years old, 45 years old, and so on. y is how much money we made from each particular person: 10 pounds from the first person, let's say 23 pounds from the second, and 50 pounds from the third. So maybe we're a supermarket, maybe we're a factory or a warehouse, some kind of electrical goods store, whatever it may be. We're in the retail area, and we're looking at age and profit, or revenue.
Now, what happens when someone comes along and we don't know their profitability score, how much money we're going to make from them? In that situation we have an x, so we have a known feature, but we don't have a y, and we need a guess for the y. Let's do this part in red. The table is what we've got from the past: it's the history, and we just store that. Now we're at prediction, or deployment: we're actually using the system to predict things.
What do we do? Well, we take in a feature, which in this case is age. Say one person comes along who is 19 years old, another who is 23, another who is 50, and so on. Those are different people. What do we predict for the 19-year-old? We go to our database and we say the most similar person, the person who's closest in age, is the 18-year-old. So we predict that we will make 10 pounds. Likewise for the 23-year-old. Now, the 23-year-old is an interesting case, right? Because 23 is five away from 18 and seven away from 30.
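The lookup described so far can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the three rows are the ages and amounts from the example, while the names `training_data` and `predict_nearest_neighbour` are made up here, and "most similar" is taken to mean the smallest absolute difference in age.

```python
# Training data from the example: (age, pounds made from that customer).
training_data = [(18, 10), (30, 23), (45, 50)]

def predict_nearest_neighbour(age, data=training_data):
    """Return the target of the stored row whose age is closest to `age`."""
    # Assumption: similarity is the absolute difference in age.
    # min() keeps the first row it meets on a tie, so ties go to the earlier row.
    nearest_age, nearest_target = min(data, key=lambda row: abs(row[0] - age))
    return nearest_target

# The three new customers from the example: we know x (age) but not y.
for new_age in (19, 23, 50):
    print(new_age, "->", predict_nearest_neighbour(new_age))
```

Note that nothing is fitted here: the "model" is just the stored table, and all the work happens at prediction time.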
So we say it's still closest to the 18-year-old, and that's still 10 pounds. The 50-year-old is going to be closest to the 45-year-old, so we're going to predict 50 pounds. Very, very simple, and also extremely dumb, right, in many ways. But we're doing machine learning, so we're not usually trying to solve scientific problems in particular; we're doing some kind of statistical estimation of what's going on.
So how good is this? Well, we can't say in general how good it is, but it probably isn't as good as linear regression when the problem can be solved with linear regression. If it's a nice straight-line problem, linear regression would maybe do better. Why? Well, here we're only considering one person. When we predict for the 19-year-old, all we consider is one person in our database.
As a little side point, let's do this in blue: compare that to linear regression. When we draw the line, we're not just drawing a line for one data point, or for two data points. If I only considered two data points, the line through them would be my line, which is a terrible line. In linear regression, when I fit a model, I consider the total loss. I minimize the total loss, not the loss for any single point, so I get the line that tries to go through the whole data set as best it can. In some sense, the information that I get from this guy over here is as important as the information I get from that girl over there. Both of those points are like little magnets, forces that drive my selection of the best line. So all of the data is contributing to coming up with a good solution, because I'm considering the total loss when I move my line up and down. Whereas here, I'm only considering one point. Is there a situation in which this could be better?
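To make the "total loss" point concrete, here is a small sketch that fits a straight line to the three example customers and compares it with the line drawn through only the first two points. It assumes squared error as the loss, which the lecture does not specify, and the helper names `total_squared_loss` and `fit_line` are invented for this illustration.

```python
ages = [18, 30, 45]
profits = [10, 23, 50]

def total_squared_loss(slope, intercept, xs=ages, ys=profits):
    """Sum of squared errors over every point (assumed loss, not from the lecture)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least-squares line: every point contributes to slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

best_slope, best_intercept = fit_line(ages, profits)

# A line that only looks at the first two customers and ignores the third.
two_slope = (profits[1] - profits[0]) / (ages[1] - ages[0])
two_intercept = profits[0] - two_slope * ages[0]

print("total loss, line fitted to all points:", total_squared_loss(best_slope, best_intercept))
print("total loss, line through two points:  ", total_squared_loss(two_slope, two_intercept))
```

The two-point line is perfect on the points it was drawn through but pays for it on the third; the fitted line spreads its error across all three, which is the sense in which every point pulls on the solution.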
Let me try and think about that for you. In this situation here, it's very likely that linear regression would be the better option, because there's actually a very strong positive correlation, and the straight line does a very good job of capturing that correlation.
What about another situation? What about a situation where young people have a certain kind of behavior, middle-aged people have a very different kind of behavior, and older people have a different kind again? So maybe for young people we lose money, for middle-aged people we tend to make money, and for older people we tend to see something very different. And, of course, maybe there's some scatter around each of these.
In other words, if you're this person here, and I don't know what your profitability will be, I just consider the closest person in my database. Let's zoom in on that closest person: I'd say that's not a bad prediction. So if this is a 15-year-old, and the closest person spent 10 pounds in my shop, then for the 15-year-old I will just predict 10 pounds again. There is this sort of relationship here which is actually negative for young people, and for the more middle-aged people maybe it's quite strongly positive. If I'm over here, at 45, and I put a point here, then the closest point, which could be this one, is maybe a good prediction. Ignore the green lines: they're just showing what's happening in each part of the data. And again, for the older sort of person, maybe 80: if there's an 80-year-old there, the closest person could be over here, I'd say.
Now, what would happen if I drew a straight line through all of this, if I tried to solve this with linear regression? Let's do it in black for linear regression. Maybe this line is my solution. I mean, that might not be so bad. What would I predict? I would over-predict in the young region: for a person down here, the actual answer should be over here, so I'm way off. For the middle age group I do very well, so I'm almost always correct. And the older group is sort of flat, so I'm under and over a bit. So it might be that if we have quite distinct regions of our feature space, of our input, then this neighboring approach might work better.
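The comparison can be tried on a toy version of the picture. Everything in this sketch is invented for illustration: `true_profit` is a made-up pattern with three regimes (falling for the young, rising steeply for the middle-aged, flat for the older group), not data from the lecture, and mean absolute error on unseen ages is just one reasonable way to score the two approaches.

```python
def true_profit(age):
    """Made-up pattern with three distinct regions; not data from the lecture."""
    if age < 30:
        return 40 - 1.5 * (age - 10)   # young: profit falls with age
    if age < 60:
        return 10 + 2 * (age - 30)     # middle-aged: profit rises steeply
    return 30                          # older: roughly flat

train_ages = list(range(10, 90, 4))    # ages we have history for
test_ages = list(range(11, 90, 4))     # new customers at ages not in the history
train_profits = [true_profit(a) for a in train_ages]

def nearest_neighbour(age):
    """Predict the stored profit of the closest training age."""
    i = min(range(len(train_ages)), key=lambda j: abs(train_ages[j] - age))
    return train_profits[i]

def fit_line(xs, ys):
    """Closed-form least-squares straight line through all training points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train_ages, train_profits)

def mean_abs_error(predict):
    """Average absolute error of a predictor over the unseen test ages."""
    return sum(abs(predict(a) - true_profit(a)) for a in test_ages) / len(test_ages)

nn_error = mean_abs_error(nearest_neighbour)
line_error = mean_abs_error(lambda a: slope * a + intercept)
print(f"nearest neighbour mean abs error: {nn_error:.2f}")
print(f"straight line mean abs error:     {line_error:.2f}")
```

On this made-up pattern the nearest neighbour predictor comes out with the lower error, because a single straight line cannot follow three different regimes at once. With a genuinely linear pattern and noisy data, the comparison could easily go the other way.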
About the Author
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.