Module 3 - Supervised Learning - Part Two
Hyper Parameters - Part 1
1h 52m

Supervised learning is a core part of machine learning, and something you’ll probably use quite a lot. This course is part two of the module on supervised learning. It takes a look at hyperparameters, distance functions, and similarity measures. We’ll wrap up the module with logistic regression, the method and workflow of machine learning and evaluation, and the train-test split.

Part one of supervised learning can be found here and introduces you to supervised learning and the nearest neighbors algorithm.

If you have any feedback relating to this course, please contact us at


So with k-nearest neighbors, we have our first algorithm which has what's known technically as a hyperparameter, which is the parameter k. So let's write that as a title: "Hyperparameters." And here we have k-nearest neighbors, and the hyperparameter here, of course, is k. So what is a hyperparameter? Well, let's just be clear what a parameter is first. A parameter is a term in the model which is set by the algorithm, which is provided by the algorithm. So, here is our model.

So if we go back to a linear model, say, for example, we have x, and to keep things simple, let's say we've got a for our slope and b for our intercept, and then the model would be ax + b. And these things here, a and b, will be the parameters. The machine learning approach to solving this problem is to imagine we have some algorithm. It could be linear regression, it could be other sorts of regression, but this would be a regression problem. The algorithm takes in our data set and basically just outputs a good value for a and b. And then we plug those back into our model when we need to predict.
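As a rough illustration of an algorithm that takes in a data set and outputs values for a and b, here is a minimal sketch using ordinary least squares in plain Python. The function name `fit_line` and the data are made up for this example (the points are chosen to lie exactly on a line with slope 3 and intercept 4), so treat it as one possible regression algorithm rather than the one used in the course.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns the parameters (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: makes the line pass through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data set lying exactly on y = 3x + 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [4.0, 7.0, 10.0, 13.0, 16.0]

# The algorithm sets the parameters; we do not choose them ourselves.
a, b = fit_line(xs, ys)


def model(x_new):
    """The model a*x + b with the learned parameters plugged back in."""
    return a * x_new + b


prediction = model(10.0)
```

Here a and b come out of the algorithm, and the practitioner only supplies the data and the choice of algorithm.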

So, those are going to come down here. And then suppose we get, for example, three and four; then we would have 3x + 4 as our solution. We can even permit them at this point, if you want to. And that would be the solution we could use to predict. So, what about k? k is not a parameter. It's not something which fits into the model. It's a hyperparameter. It's something that fits into, you could possibly consider it, the algorithm or the approach. And the problem when something fits into the algorithm, or into whatever approach you're using (the algorithm plus all the little bits around it), is that it's up to you to decide it.

So, you choose the algorithm, you choose the data set, you choose everything at this meta level. So this level here, this side, is chosen by you, the practitioner. The model is what the machine gives you, but you are choosing everything else. So we could understand it as, I suppose, the algorithm having some parameters of its own, which are x and y, and then maybe we even say that this algorithm, which would be the k-nearest neighbors algorithm now, so let's say k-NN, has a choice of k, and everything on this side is chosen by you, the practitioner. Okay, so what are some issues surrounding that hyperparameter?

Let's try and think that through. So maybe we say these are all nearest neighbor approaches, and maybe we'll call them NNs. And then we have one where we've chosen three for k, one where we've chosen four, and one where we've chosen five. What each of these is going to do is give you a different model that you learn. Now, what is the model in k-nearest neighbors? Well, in every case, the parameters you're learning are really just the entire data set. So in every case you're not learning anything, really; you're just returning the entire data set again. But the operation you perform on that data set is different, right?
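To make the point about "learning" concrete, here is a small sketch in which training a nearest neighbor model does nothing except remember the data set together with the chosen k. The names `KNNModel` and `fit_knn` and the toy data are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KNNModel:
    """What a nearest neighbor 'model' holds: the data set plus the chosen k."""
    xs: tuple
    ys: tuple
    k: int


def fit_knn(xs, ys, k):
    """'Training' only stores the data; k is supplied by the practitioner."""
    return KNNModel(tuple(xs), tuple(ys), k)


# Hypothetical one-dimensional data set with two classes.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["red", "red", "red", "blue", "blue", "blue"]

# Three nearest neighbor approaches, differing only in the choice of k.
models = [fit_knn(xs, ys, k) for k in (3, 4, 5)]

# Every model remembers exactly the same data set.
same_data = all(m.xs == models[0].xs and m.ys == models[0].ys for m in models)
```

The three models store identical data and differ only in k, which is why any difference between them has to show up at prediction time.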

So you just remember the entire data set in the training step, and when you come to predict, all you're doing is comparing with the training set. Phrasing it can be a little notationally tricky. In Python, what we might say is just, you know, the mode. This is classification, so we're taking the most common label among the entries in our data set which are most similar. So we could say: the mode of the Ys of the similar Xs, with a three, you know, inside. So that's a sketch. Look at the Python demonstration for more detail there. In mathematics, I guess we could ask: what are the algorithms doing? What is the model that they're producing anyway?
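The "mode of the similar Xs" idea can be sketched in a few lines of Python. This is not the course's own demonstration: the function name `knn_predict`, the use of absolute difference as the distance on one-dimensional inputs, and the toy data are all assumptions made for illustration.

```python
from statistics import mode


def knn_predict(xs, ys, x_unknown, k):
    """Predict a label as the mode of the Ys of the k most similar Xs."""
    # Rank the remembered points by distance to the unknown x
    # (absolute difference is assumed as the distance here).
    ranked = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_unknown))
    # Keep the labels of the k nearest points.
    nearest_ys = [y for _, y in ranked[:k]]
    # Classification: take the most common label among them.
    return mode(nearest_ys)


# Hypothetical training set; prediction only ever compares against it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["red", "blue", "blue", "red", "red", "red"]

pred_k3 = knn_predict(xs, ys, 2.4, k=3)  # nearest labels: blue, blue, red
pred_k5 = knn_predict(xs, ys, 2.4, k=5)  # nearest labels: blue, blue, red, red, red
```

With this data the same unknown point is labelled "blue" when k is 3 and "red" when k is 5, even though the remembered data set has not changed.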

Well, we can use the word mode in the mathematics too, I think. So the model is the mode, and now we're considering a set of Ys. It will be the set of Ys where the known x is similar to the unknown x. And then here we could just put a three somewhere, you know, above the similarity sign, so it reads as the ones which are similar at three. That's a bit of an abuse of notation, but it's approximately helpful. So it's the Ys in our data set where x is similar, with that three on it, to the unknown x, or something. Right, and then what each of these algorithms is doing is changing this choice of three. We are getting different models because the model includes that choice of three.
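One way to write that notation down is sketched below. The symbols are assumptions rather than the notation used on the board: $x_{?}$ stands for the unknown x, $(x_i, y_i)$ are the points in the data set, and $x_i \overset{3}{\sim} x_{?}$ is read loosely as "$x_i$ is among the three points most similar to $x_{?}$". It needs the amsmath package for `\operatorname` and `\overset`.

```latex
\[
  \hat{y}(x_{?}) \;=\; \operatorname{mode}\,
  \bigl\{\, y_i \;:\; x_i \overset{3}{\sim} x_{?} \,\bigr\}
\]
```

Replacing the 3 with 4 or 5 gives a different function of $x_{?}$ over the same data set, which is the sense in which the model includes the choice of k.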



About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.