# Finding the Model with Linear Regression - Part 1

## The course is part of these learning paths

- Practical Machine Learning
- AWS Machine Learning – Specialty Certification Preparation
Overview

- Difficulty: Beginner
- Duration: 1h 30m
- Students: 276
- Ratings: 3.9/5

Description

Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. This course is part two of the module on machine learning. It covers unsupervised learning, the theoretical basis for machine learning, model and linear regression, the semantic gap, and how we approximate the truth.

Part one of this two-part series can be found here, and covers the history and ethics of AI, data, statistics and variables, notation, and supervised learning.

If you have any feedback relating to this course, please contact us at support@cloudacademy.com.

Transcript

So, recapping the setup for supervised machine learning. Recall that we have some variables of interest: we've got the thing we know and the thing we want to predict. And then, out there in the world, there's this sort of real relationship, which we call f. What we're trying to do is find some way of coming up with an estimate, whatever the estimate would be. Here we probably wouldn't use a straight line to estimate it, but we could just say, look, there's our estimate, and it does all right to some degree. So what we're trying to do is come up with some relationship, f hat, that in some sense is close to, or approximates, the real relationship here.

Okay. So, how do we do that? In this video we're gonna talk about how we're gonna find f hat. Later on we'll talk about the issues going on in the background that make it a bit difficult. Let's just take a simple case, though: we're gonna consider linear regression. So, what does that mean?

Well, remember that regression is a form of supervised machine learning, so it's the kind we're talking about. It's where we have y, and where y is a real number. In notation, we can say y is in the set of real numbers. We can read it that way, "in the set of real numbers", or just "is a real number". In programming, this tends to be about the type of y. Let's suppose I'm in Python and I ask Python what the type of y is. It would tell me that it's a floating-point number, so that would come out as float. That's a way of matching up some of the mathematics here with the computer-sciencey, programmey bit, and a way of understanding what that notation means. We'll do more of that a little later, once we've got the key ideas out.

So, regression means we wind up with a real number. Linear means a straight line.
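
The two ideas above, that y is a real number (a float in Python) and that "linear" means a straight line, can be sketched in a few lines. This is only an illustration: the value of `y`, the function name `f_hat`, and its `slope` and `intercept` parameters are hypothetical and do not come from the course.

```python
# Hypothetical target value (illustrative only, not from the course data).
y = 7.5

# A real-valued target shows up in Python as a float.
print(type(y))  # <class 'float'>


def f_hat(x, slope, intercept):
    """Straight-line estimate: slope * x + intercept."""
    return slope * x + intercept


# Any input on the line maps to a real-valued prediction.
y_hat = f_hat(20.0, slope=0.5, intercept=1.0)
print(y_hat)  # 11.0
```
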
So the relationship we're gonna find here is gonna be a straight-line relationship. How do we do that? Well, the first thing to point out is that we don't actually have the real function, so we can't just say, here's the truth, let's try lots of straight lines until we kind of hit the truth. That's not gonna work, because we don't know what the truth is. So, when I said earlier that the goal of supervised machine learning was to find the best approximation to the truth: well, we don't know the truth.

What do we know? We're in a really bad position, really, compared to where we might be. We actually only have a few points. We call these points a sample, and you can think of these points, this historical data set, as hopefully a sample from the truth. It's kind of pulled out of it. The notation we might use here, if you're interested, is this: if this is x and this is y, we might hope that our x and our y are in f. That's a reasonable, approximate sort of notation. You can think of it as saying that if f is how things really are, then we hope our data set is in it, in that way. We can probably be a little more formal in our notation at some point, but that will do for now.

What do we do then? Okay, so we just have this data set, which we hope is a sample pulled from, or drawn from, the truth in some sort of way. The reality itself is not there. Well, we're trying to find a way of fitting some line to this truth. Red line! There you go, that's the solution. But the point of this video is to explain how we get to that solution.

So, we don't have the true line. We have some points. And we don't know how far we really are from the truth. It's not even clear that there is a way of measuring that which would, in some sense, give us the right answer all the time, even if we had the real function. So we need something else here.
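
The situation described above, where only a handful of points drawn from an unseen truth is available, can be sketched as follows. Everything here is an assumption made for illustration: `true_f` stands in for the real relationship f (which in practice is never known), and its curved shape, the noise level, the seed, and the helper name `draw_sample` are arbitrary choices.

```python
import random


def true_f(x):
    """Stand-in for the real relationship f. In practice this is unknown;
    it is written out here only so the sketch can generate data."""
    return 0.1 * x * x + x + 1.0


def draw_sample(n, seed=0):
    """Return n (x, y) pairs: noisy observations pulled from true_f."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = true_f(x) + rng.gauss(0.0, 1.0)  # the noise level is an assumption
        sample.append((x, y))
    return sample


# This handful of points is all the learner ever gets to see.
sample = draw_sample(5)
for x, y in sample:
    print(f"x={x:.2f}  y={y:.2f}")
```

The learner works only with `sample`; nothing downstream is allowed to call `true_f`, which mirrors the point that the real function is missing.
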
We need another notion: loss. So, let me just tell you what loss means. Loss, or what we call the empirical loss, is basically just some measure that we choose: some formula that tells us how far the points on our red line are from what we see, our y. Take these little blue points here. The y is the height of this little blue point. Our prediction, the point on the line, is y hat. And this distance here is the loss. Loss doesn't have to be understood as a distance, but in linear regression that's the best way of understanding it.

In ordinary linear regression there's a very standard formula for the loss, known as the mean square error. I'll need to explain a little bit about that, so let's talk about error, then square, and then mean, in that order.

So, we define this loss, which tells us how far out our predictions on this red line are from the truth in the blue points. Then what we do is play around with our red line until we've minimized the loss. So the goal here becomes: minimize the loss you get when you compare your predictions with your observations. That's about it, actually. That's the new interpretation of the goal.

I'll put this as a little side point, because I don't want to confuse you. Before, we had: minimize the prediction minus the known truth, taking the absolute distance between the two. Well, we can't do that. We don't know the truth. So what we do is have this intermediate sort of formula, which you might think of as an error in the general sense, called a loss. And that's not gonna use this real function f, because you don't know what it is. It's just gonna compare two things. One is the estimate. How do we compute the estimate?
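
The idea of playing around with the line until the loss is minimized can be sketched in Python. This is a minimal illustration under stated assumptions: the sample values, the list of candidate lines, and the names `predict` and `mean_square_error` are all hypothetical, and picking the best line from a short fixed list is a stand-in for a real fitting procedure, which the transcript has not described at this point. The loss uses the usual definition of mean square error, the average of the squared differences between prediction and observation.

```python
def predict(x, slope, intercept):
    """Point on the candidate line for input x (the prediction, y hat)."""
    return slope * x + intercept


def mean_square_error(xs, ys, slope, intercept):
    """Average of (prediction - observation) squared over the sample."""
    squared_errors = [
        (predict(x, slope, intercept) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sample (made-up numbers).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Hypothetical candidate lines as (slope, intercept) pairs.
candidates = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]

# "Play around with the line": keep whichever candidate has the lowest loss.
best = min(candidates, key=lambda line: mean_square_error(xs, ys, *line))
print(best)  # (2.0, 1.0)
```

Note that the loss is computed only from the predictions and the observed `ys`; the true function never appears.
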
You'll recall the estimate comes from our little function, the one we learned, in red there. So our estimate is just gonna be some function of x. And in place of the f, we essentially just plug in a data point, y. So another way of writing the same formula is: minimize the loss, where we compute our estimates using the estimate function. That's kind of the same formula, and you can see here that the f is missing, which means the true function's missing. Instead you're gonna have these data points, hopefully coming from the true function. Right, okay. So that's the goal.

Let's go back and talk about how we're gonna solve it. There are a few moving pieces here, so let's define the loss function and then explain how it's going to be used. So what is this mean square error? Well, recall that the role of the loss function is to tell us how far away our predictions are from the observed truth. I keep saying truth here, but that probably isn't perfect truth; it's an approximation, right?

So, let's consider a point. First, look at this point here. Let's call that point A. And what's our prediction for A? Let's call this A hat, for our prediction at that point. There's really no special notation here. We can call it point A, or point 0 or point 1 or something. And then let's think about a different situation. Here we've got a point B, and B's below the line, where A's above it. And then B hat here is our prediction for B.

So you can see what happens over here in this region of x. Let's say x is someone's age and y is how long they're gonna spend on our website. For this sort of region in age, maybe that's 18 to 19, what do we tend to do in terms of the time? We tend to under-predict. We're going too low. All right, what about over here? Well, let's say this is 10 to 12 years old.
In this region, well, the truth is lower than our predictions. In terms of the time, we're going a bit too high. So here we're over-predicting, and in the other region we're under-predicting. Okay, fine. Now we just need some notion to give us a sense of what's going on there. So that's loss.

So, error. What's error gonna be? Well, for linear regression we're just gonna define it as, literally, the distance between the prediction and the observation. That's just gonna be one number minus another number: the prediction, which is the height on the red line, minus the observation, in blue. That's the error for a particular point.

Let's say A hat here, the time on the website, is five minutes. The truth that we have seen for this person is 10 minutes. I should probably put that in blue, and maybe make it a little more definite. The error here, we could call that e if we wanted to. The error for this point A: that's gonna be five minutes. Or is it minus five minutes? You can define this either way round. But if we're doing prediction minus observation, then the prediction's five and the observation's 10, so that would be minus five, right?

Okay. Let's have a look at the other side, B. Again, the same story over here. Let's say the prediction here, B hat, the height that was five for A, is three minutes. The truth here is two minutes. So the error for point B is gonna be a one-minute error.
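
The arithmetic for points A and B can be checked with a short Python sketch. The predictions and observations are the numbers from the worked example above; the variable names and the helper `direction` are hypothetical. The final squaring and averaging steps follow the usual definition of mean square error, which the transcript names ("error, then square, then mean") but has not yet worked through, so treat that part as a preview, not as the course's own calculation.

```python
# Numbers from the worked example above, in minutes on the website.
points = {
    "A": {"prediction": 5.0, "observation": 10.0},
    "B": {"prediction": 3.0, "observation": 2.0},
}

# Error: prediction minus observation (the sign convention used above).
errors = {name: p["prediction"] - p["observation"] for name, p in points.items()}
print(errors)  # {'A': -5.0, 'B': 1.0}


def direction(error):
    """Name the direction of a prediction error under this sign convention."""
    if error < 0:
        return "under-predicting"
    if error > 0:
        return "over-predicting"
    return "exact"


for name, e in errors.items():
    print(name, direction(e))

# Square each error, then take the mean.
squared = [e ** 2 for e in errors.values()]
mse = sum(squared) / len(squared)
print(mse)  # 13.0
```

Squaring removes the sign, so the under-prediction at A and the over-prediction at B both count against the line instead of cancelling each other out.
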

About the Author

Michael Burgess, Principal Technologist for Machine Learning

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.