Module 3 - Supervised Learning - Part One
An Overview of Supervised Learning

Supervised learning is a core part of machine learning, and something you’ll probably use quite a lot. This course is part one of the module on supervised learning, in which you’ll learn exactly what supervised learning is and what it’s capable of, before moving into more detail on the nearest neighbors algorithm.

Part two of supervised learning can be found here and covers hyperparameters, distance functions, similarity measures, logistic regression, the method and workflow of machine learning and evaluation, and the train-test split.

If you have any feedback relating to this course, please contact us


Let's now tackle the topic of Supervised Machine Learning as a subject in itself. What we're going to do here is look at lots of different algorithms that solve the prediction problem. We're going to look at the workflow around the prediction problem, that is, what steps we take to get there, and then some issues surrounding model quality, model selection, and so on. So we'll take it in pieces. Let's just review Supervised Learning before we begin.

What is Supervised Learning? Well, it's where you've got your target, something you want to predict, and something you know, your features. Between your target and your features we've got a relationship, which allows us to make an estimate for the target given those features. And critically, as we're hopefully now aware, the relationship that we use, this fhat, the model, depends on some parameters. We'll summarize those parameters with w here; there could be one or more, but here, just w.

And what else is there to say here? Of course, yes, there's a loss. We also need to understand how good or bad our predictions will be, and that's the loss: the loss will compare the prediction to the target. We can spell that out in full: the loss takes in the weights of the model, and it also takes in the dataset, but we hold the dataset fixed when computing it, because we're asking how good the model is. So we hold the data fixed and we vary the model. Right, so let's just get the terminology for this. We've got the model, the loss, the target, and the features. We may also group the dataset into a D, perhaps D for training, and the training dataset would be all of the x's.
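To make the idea of the loss concrete, here is a minimal Python sketch. It assumes a straight-line model with weights w = (a, b) and a mean squared error loss; neither choice is fixed by the discussion above, and the names `predict` and `loss` are just illustrative.

```python
def predict(w, x):
    """Assumed linear model with weights w = (a, b): yhat = a*x + b."""
    a, b = w
    return a * x + b


def loss(w, xs, ys):
    """Mean squared error of the model with weights w on a fixed dataset."""
    errors = [(predict(w, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# The dataset stays fixed; only the weights change between calls.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

loss_good = loss((2.0, 0.0), xs, ys)  # these weights fit the data exactly
loss_bad = loss((1.0, 0.0), xs, ys)   # these weights underestimate every target
```

Calling `loss` with different weights on the same `xs` and `ys` is what "hold the dataset fixed and vary the model" amounts to in code.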

So, row one, or the first example I should say, which we mark with a subscript: one, two, three, however many x's there are, along with all of the y's. Or we could just summarize that as a capital X for many examples, and y, and that would be our dataset. So that's the general setup. Now, there are two principal kinds of Supervised Learning, and which one we have comes down to the kind of target we're dealing with. If y is a real number, then we have regression. If y is, let's say, plus one or minus one, we have binary classification. And y can also be one of multiple options, whatever those options are (we could have had negative numbers or something), let's say zero, one, two, up to n.
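In notation, the setup so far might be written as follows. The symbol m for the number of examples is an assumption introduced here; the other symbols follow the ones used in the talk.

```latex
\[
D_{\text{train}} = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\} = (X, y),
\qquad
\hat{y} = \hat{f}(x; w)
\]
\[
y \in \mathbb{R} \ \text{(regression)}, \qquad
y \in \{-1, +1\} \ \text{(binary classification)}, \qquad
y \in \{0, 1, 2, \dots, n\} \ \text{(one of several options)}
\]
```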

That would be multiclass, so multiclass classification. Okay, good. So we've looked at one algorithm to solve the Supervised Learning problem of arriving at an f, and that algorithm was linear regression. So we've got algorithms and modeling; algorithms and approaches, we'll say. Linear regression is where we choose a straight line for our model. We choose the weights using the loss as a guide: we set them by updating them using the derivative of the loss, the slope of the loss. So we improve the model, getting it to fit the data better, by computing the loss, which tells us how wrong we are overall.
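The update just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single feature, a mean squared error loss, and a fixed number of steps with a hand-picked `learning_rate`, none of which are specified in the talk.

```python
def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit yhat = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivatives of the mean squared error with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the slope of the loss so that the loss goes down.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
a, b = train_linear(xs, ys)
```

On this toy data the weights settle close to a = 2 and b = 1, the line the targets were generated from.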

And so we change the model until that loss is minimized, until we have minimal loss. So: update the weights until we reach the minimum total loss. That's one approach we can take. In general, there'll be lots of different approaches or algorithms. We can give the algorithm a script A, and what we'll do is feed data into A, so we feed in all of the x's and all of the y's, and out of A will come a model. That's general: regardless of the number of inputs, it will give you some model. And then we take this model and we use it to predict whatever we need, the estimate. So we use it to give us yhat. So there are kind of two times here, aren't there?

There's this first time here, which we can call training, and that's when we input our dataset into the algorithm. The result of training is to give us a model. And then there's a second time, where we input the unknown, the x, the features for the unknown person or example, and that gives us the prediction for their rating, height, age, price, profit, whatever it's going to be. So into the algorithm goes the data and out comes a model. Into the model goes the data for the unknown future case, so known features, unknown y, and out comes a guess for y, and this is what I may call prediction time or deployment.

Deployment means we're putting into practice the system that we have made. And the first time is train time. So if we were to look at some Python for this approach, we would see two defs. We'd see def train, which would take in our dataset and return some function, let me call this fhat. And we'd see def predict, which would take fhat and an x whose y is unknown, anywhere we don't know what the target is, and what we do there is return a prediction for it. So there are two steps here. You can call the first one capital A for the algorithm, and that's going to give you your fhat. And for the prediction, we take in our fhat and we return our yhat, so this is like fhat to yhat. So those are the two steps.
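The two defs might look something like the sketch below. The model used here, one that ignores x and always guesses the mean target, is a deliberately trivial placeholder chosen for this illustration; it is not an algorithm from the talk.

```python
def train(xs, ys):
    """Algorithm A: takes the dataset and returns a model fhat (a function)."""
    mean_y = sum(ys) / len(ys)

    def fhat(x):
        # Placeholder model: ignore x and always guess the mean target.
        return mean_y

    return fhat


def predict(fhat, x):
    """Prediction time: apply the trained model to features with unknown y."""
    return fhat(x)


# Train time: dataset in, model out.
fhat = train([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
# Prediction time: features in, yhat out.
yhat = predict(fhat, 4.0)
```

The point is the shape of the two functions rather than the model itself: `train` maps data to fhat, and `predict` maps fhat and x to yhat.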

So that's step one and step two. Before going through some other algorithms, what I'd like to do is build up this workflow picture just a little bit. We'll keep building up the workflow as we go, so we can see how the whole approach gets formed. For now, what I'm going to do is introduce the idea of a testing phase into this process as well. So we've got step one: we give an algorithm our dataset, and out of this comes our model. Now, what we mean by model is mostly just the weights because, if we have a linear model, say, we've got ax plus b, and the only things to find out there are the a and the b. So mostly when we say model we mean the weights, even though technically it means the full mathematical formula; this formula here is the model.

But the output of an algorithm isn't a formula, it's just a series of weights. So if we look at the Python here, def train, or def A, is just going to be returning an answer for a and b, whatever the a and the b it calculated were. We put dots there for whatever the actual approach is, and then we return a and b. So really the output is the weights, in terms of the computing bit, but we can think of it conceptually as returning a formula as well if we want to. So that's the first step, and that's what we call training. Now, here's a problem with just going straight to prediction: we haven't tried lots of different algorithms yet.
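As a sketch of a train function that returns only the weights, here is one possible version. The talk leaves the actual approach as dots; the closed-form least-squares calculation used to fill them in here is an assumption, and any other way of computing a and b would fit the same shape.

```python
def train(xs, ys):
    """Return the weights (a, b) of the line yhat = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumed approach: closed-form least squares for a single feature.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(weights, x):
    """Plug the stored weights back into the formula a*x + b."""
    a, b = weights
    return a * x + b


weights = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
yhat = predict(weights, 4.0)
```

Here `train` hands back two numbers, and it is `predict` that supplies the formula, which is the sense in which the weights stand in for the model.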

We could have done lots of different approaches. Why don't we try them? If this is algorithm one, then maybe there's algorithm two, approach three, approach four, and they're all going to give us different models: f1, f2, f3. Is it possible to do better than we have, using a different approach? This second step we call model selection. So we're going to try lots of different approaches. We can try the first approach and see how well we do, and then try lots of other approaches and see how well they do. What we're going to do with each of these is evaluate them. So, selecting models: how do we do that?

We just try lots of them, and then we evaluate this one, evaluate that one, evaluate that one, and we select the best. So let's just say this one comes out to be the best. Then in step three, we evaluate the best, so we have a measure of accuracy: how accurate is it? And step four is deploy. So there are a few extra bits to this process now, right? There's model selection, where we have lots of different approaches, then we have to evaluate the model we've selected, and then we deploy it. The point of making this sketch right now is to give you a sense, before we go into different kinds of approaches, of how they're going to fit into the process. We're going to have to take each of these pieces in turn and do a session on each of them individually, so we'll come back to the methodology and the workflow as a whole subject.
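The four steps can be sketched end to end as below. Several things here are assumptions made for the illustration rather than details from the talk: the two candidate approaches, the made-up data, the use of mean squared error as the measure, and the hand-made split of the data into a part for training, a part for selection and a part for the final evaluation.

```python
def train_mean(xs, ys):
    """Approach 1: always predict the mean target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def train_line(xs, ys):
    """Approach 2: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def evaluate(fhat, xs, ys):
    """Mean squared error of a model on a dataset (lower is better)."""
    return sum((fhat(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data, split by hand into three parts.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
select_x, select_y = [4.0, 5.0], [9.1, 10.9]
test_x, test_y = [6.0, 7.0], [13.0, 15.2]

# Step 1: train one model per approach.
approaches = {"mean": train_mean, "line": train_line}
models = {name: algo(train_x, train_y) for name, algo in approaches.items()}

# Step 2: model selection, keeping the approach with the lowest error.
scores = {name: evaluate(f, select_x, select_y) for name, f in models.items()}
best_name = min(scores, key=scores.get)
best_model = models[best_name]

# Step 3: evaluate the selected model on data it has not seen yet.
test_error = evaluate(best_model, test_x, test_y)

# Step 4: deploy, that is, use the model on new features with unknown y.
yhat = best_model(8.0)
```

On this data the straight line is selected, since the mean-only model is far off on the selection examples.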

We'll come back to model selection as a subject, come back to evaluation as a subject, and to the idea of testing and training and all these different stages of the project. But for now I just wanted to give you a sense of the picture that's forming in terms of the workflow, and to motivate the idea that we actually want to try lots of different approaches here. So let's consider what some of those might be. We're going to have a look at some approaches in regression and some approaches in classification. The first approach we're going to look at applies to both cases, and it's known as nearest neighbors; that's what the NN stands for.

We've already seen linear regression. We will also look at logistic regression, and we'll keep building as the course goes on. So that's just a little preview of some approaches. What we're going to do now is consider nearest neighbors first, then we'll look at logistic regression second. So in this session we'll look at both of these, and then we'll come back to this workflow and evaluation point and see where we are.


Nearest Neighbours Algorithm - Nearest Neighbours Algorithm for Classification - How the K Nearest Neighbours Algorithm Works - Hyper Parameters - Part 1 - Hyper Parameters - Part 2 - Hyper Parameters - Part 3 - Distance Functions and Similarity Measures - Logistic Regression - The Method and Workflow of Machine Learning, and Evaluation - The Train-Test Split


About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.