Hyper Parameters - Part 3
Difficulty: Beginner
Duration: 1h 52m
Students: 852
Ratings: 3/5
Description

Supervised learning is a core part of machine learning, and something you’ll probably use quite a lot. This course is part two of the module on supervised learning. It takes a look at hyperparameters, distance functions, and similarity measures. We’ll wrap up the module with logistic regression, the method and workflow of machine learning and evaluation, and the train-test split.

Part one of supervised learning can be found here and introduces you to supervised learning and the nearest neighbors algorithm.

Transcript

In our example, suppose it's set up with the university, and there's this university area. So what are we seeing? We're seeing 25 to, let's say, 35. So we're seeing in this region... Actually, so, oh, this is days. So, okay, we can make them days as well. So, age is on the vertical and let's say 18, 18. Right, so let's say 25 to 35 days from purchase, if we wanted to. And maybe the reason 18 is the minimum is policy there, 'cause you can't claim before 18 days.

So, we can make the things have the same scale. Right, so, okay. So, within 25 to 35 days, and let's say that this region here is... what's that region in terms of age? Let's say, "Well, this is 18." Let's make this... Let's make this two, let's make this... At the same scale it's actually lower down. So this is halfway beginning thirteen. This is not quite halfway, there's halfway. So there's thirty, so it would be maybe 25 to 28, say.

So, within this age range... Within these number of days, we see some fraud. Ah, we see some fraud. Now, let's think about it: is the reason we're seeing fraud in this data set likely to hold up in the future? If it's likely to hold up, you know, if the cause of this phenomenon is, you know, not incidental, and it's going to be there when we come to predict the fraud, it will be good to capture this irregularity. So, this, not irregularity necessarily, but this extra irregular boundary, which kinda captures a very specific phenomenon. So you could think of this very smooth, red boundary as capturing the general cause of fraud, or general features that typically relate to fraud, if not necessarily the cause. But here we're maybe capturing some other interesting pattern.

So, why could that be? So, 25 to 35 days from purchase, and 25 to 28. So we need to do a little investigation here. So, these are gonna be maybe graduate students, these could be graduate students. And why would graduate students... So why would this whole region here, why would there be no fraud down here? Well, there's no fraud down here because undergraduate students don't wait that long. They don't wait 25 to 35 days. So there's no fraud down there. So we know why there's no fraud there. Why would graduate students... Uh, so graduate students... You know, they're committing fraud here, say. Some, not so much as they get older. So we've got this patch of fraud... And then we've got here this outreach, which is maybe about... So the question is, why does this green region exist as well?

So why the undergr- Why are there... So undergraduates don't wait that long. So we know why there's that gap there. But graduate students do wait that long. That's fine, so maybe, okay. So that's a plausible explanation. So maybe, as people get a little bit older, they're, um... You know, uh, cleverer, smarter. And so they go, "Well, I'll probably actually wait before sending in my claim on my policy." And so there's people who don't wait over here and there's more people who do wait. So maybe in fact this region here should be red. And this region down here should be green, because the undergraduates don't wait that long. And this region should be red, because post-graduates sometimes do wait that long. Alright, so that could be a plausible explanation as to where this is coming from.

And therefore maybe a lower k here is right. And, you know, conversely, it's a little easier to think about why we would wanna drop this region here. The reason we might wanna drop that region is just because, in our data set, there happen to have been, let's say, five, six people who committed fraud here for reasons that are unlikely to hold up in the future. Such as, you know, maybe... We collected it at a specific time of the year. And it just so happens that at that time of the year we get this phenomenon, but when we come to predict, it won't be that time of the year, it'll be any time of the year, so we shouldn't capture that kind of pattern. You know, any reason to do with variables that we aren't capturing is random. So days and age here are the variables we're capturing. So if there's some cause of, or some connection to, a variable which isn't days or age, it might be that we shouldn't include it, because the pattern caused by that variable is not something we can see.
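To make the "won't hold up in the future" worry concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data points, the labels, and the helper names `knn_predict` and `count_errors` are hypothetical and are not taken from the course material. The training data contains a small patch of fraud, while the later data from the same region contains none, as it might if the patch came from the time of year the data was collected.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def count_errors(train, test, k):
    """Number of test points whose predicted label differs from the true one."""
    return sum(knn_predict(train, point, k) != label for point, label in test)

# Hypothetical training data: each point is (days from purchase, age).
legit = [((d, a), "legit") for d in range(20, 45, 5) for a in (19, 22, 30, 33, 36)]
patch = [((d, a), "fraud") for d, a in [(28, 26), (30, 25), (32, 27), (29, 27), (31, 26)]]
train = legit + patch

# Hypothetical later data from the same region, where the fraud did not recur.
later = [((29, 26), "legit"), ((30, 26), "legit"), ((31, 27), "legit"), ((30, 27), "legit")]

for k in (1, 11):
    print(f"k = {k}: {count_errors(train, later, k)} errors on the later data")
```

With these made-up numbers, k = 1 gets all four later points wrong and k = 11 gets none wrong. If the patch were genuine and did recur, the comparison would go the other way, which is why the choice depends on the problem.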

So, whether it's there or not there, we wouldn't know at prediction time. So, it might be... You know, when we come to predict something, this phenomenon might not exist, even if it is quite genuine. So, okay... So, a higher "K" would kind of dispense with these possibly very local patterns, very specific patterns, or patterns that are unlikely to emerge in the future or replay in the future. And so a higher "K" would kind of be insensitive to that. Right, so the topic here that we're considering... I'm gonna consider this in more detail later. So over here it's called "regularization". Regularization... And that just means making the boundary more regular. And it has to do with fitting and over-fitting, with error, and with many statistical concerns that we'll think through separately. But what I wanna highlight here is just the nature of this hyper-parameter here.
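As a rough illustration of how a higher "K" becomes insensitive to a local patch, the sketch below counts the votes for a single query point at several values of k. The data, the labels, and the helper name `knn_votes` are hypothetical, chosen only so the effect is easy to see: five fraud cases sit close together, surrounded by a wider spread of legitimate ones.

```python
from collections import Counter
from math import dist

def knn_votes(train, query, k):
    """Count the labels of the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest)

# Hypothetical data: each point is (days from purchase, age).
legit = [((d, a), "legit") for d in range(20, 45, 5) for a in (19, 22, 30, 33, 36)]
patch = [((d, a), "fraud") for d, a in [(28, 26), (30, 25), (32, 27), (29, 27), (31, 26)]]
train = legit + patch

query = (30, 26)  # a point in the middle of the local fraud patch

for k in (1, 5, 21):
    votes = knn_votes(train, query, k)
    winner = votes.most_common(1)[0][0]
    print(f"k = {k:2d}: fraud = {votes['fraud']}, legit = {votes['legit']} -> {winner}")
```

At k = 1 and k = 5 every neighbour comes from the patch, so the prediction is fraud. At k = 21 the same five fraud votes are outnumbered by sixteen legitimate ones, and the patch disappears from the prediction.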

So, okay, in the case of "K" nearest neighbors, we have a hyper-parameter "K", and how are we going to set that? Well, there's several possible motivations you might have in setting this. But you can see that they are highly problem-specific. So we have to go through and think, "Well, what is the solution for 3? What's the solution for 5 or 4 or 21? What do we see in the actual output, in the predicted output?" And, "Is that right?" Do we think this is going to hold up or not? So, setting hyper-parameters, and in fact any choice on this side of the problem, any choice of algorithm, any choice of approach that leads to a model, is really set by the problem. I think that's the takeaway here. So you can think of this... You know, that's where data happens, right? So this is the art of machine learning, to do that correctly, or well, anyway. Can't do it correctly, but to do it well is the whole art of the process. So this is, you know, this is... The problem is done when you have that. So, um, right, so K nearest neighbors gives us some insight into those kinds of concerns.
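One way to "go through and think" about each candidate k is to look at the whole predicted region rather than a single point. The sketch below is only an illustration under assumed data: the points, the early-claim fraud band, and the helper names `knn_predict` and `prediction_map` are all hypothetical. It prints a coarse text map of days (across) against age (down, oldest first) for a few values of k, so the boundary can be inspected by eye.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def prediction_map(train, k, days, ages):
    """One text row per age (oldest first); 'F' = predicted fraud, '.' = predicted legit."""
    return [
        "".join("F" if knn_predict(train, (d, a), k) == "fraud" else "." for d in days)
        for a in sorted(ages, reverse=True)
    ]

# Hypothetical data: each point is (days from purchase, age).
# A broad band of early-claim fraud, a small local patch, and legitimate claims elsewhere.
early_fraud = [((d, a), "fraud") for d in (18, 20, 22) for a in range(18, 40, 3)]
patch_fraud = [((d, a), "fraud") for d, a in [(28, 26), (30, 25), (32, 27), (29, 27), (31, 26)]]
legit = [((d, a), "legit") for d in range(26, 46, 4) for a in (19, 22, 30, 33, 36, 39)]
train = early_fraud + patch_fraud + legit

days = range(18, 44, 2)
ages = range(18, 40, 2)

# Odd values of k avoid tied votes when there are only two labels.
for k in (3, 5, 21):
    print(f"k = {k}")
    for row in prediction_map(train, k, days, ages):
        print("  " + row)
```

In this made-up data the centre of the small patch, (30, 26), is predicted as fraud at k = 3 and k = 5 but not at k = 21, while the broad early-claim band is still predicted as fraud at k = 21. Whether the map with the patch or the map without it is the better one is exactly the problem-specific judgment described above; the code only makes the candidates visible.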