# Multidimensional linear regression - Part 2

###### Practical Machine Learning

**Difficulty:** Beginner

**Duration:** 3h 17m

**Students:** 43

### Description

To design effective machine learning, you’ll need a firm grasp of the maths that supports it. In this module, we’ll introduce you to the mathematics of machine learning, before jumping into common functions and useful algebra, the quadratic model, and logarithms and exponents. After this, we’ll move on to linear regression, calculus, and notation, including how to provide a general analysis using notation. We’ll also cover how to use linear regression in multiple dimensions, interpret data structures from the geometrical perspective of linear regression, and discuss how you can use vector subtraction. We’ll finish the module off by discussing how you can use visualised vectors to solve problems in machine learning, and how you can use matrices and multidimensional linear regression.

### Transcript

- So let's visualize it. What do I mean by that? Visualizing linear regression in 2D, or 3D, 4D, however many dimensions you have. In one dimension, remember, we would have just a vertical axis and a horizontal axis, here we are, and we would have this historical data set. And to solve the linear regression problem, to obtain the model, would just be to draw a line, you know, a good line through this. And what does good mean? Good means minimizing the loss; we'll come on to that in just a second.

So this is the one-dimensional case, 'cause we have one x. Let's talk about the two-dimensional case, two-dimensional linear regression, where we have two x's, right, x one and x two, a vector of x's. So let's have a look at that. Here we have the same axes, horizontal and vertical, but we would also have another axis in the plane, which is x two. So I'm gonna draw it like that: here's x one, here is x two, and here's y. So these are the axes: two dimensions of x, three in total. y is the vertical axis, and now we have two axes which are perpendicular to y, so 90 degrees here.

Let's draw the solution, the model, on, and then we can see where the points might be as well. So if we draw the model on, let's see if we can do a plane. The idea here is that we have a plane, and the plane is gonna look like that in three dimensions. It's going across like that, you see. Now, if we try to interpret this plane, what are we seeing here? Well, this edge, that green line, is the relationship between x one and y for a particular value of x two. If I look at x two and I drag x two down to zero, then I get that green line. So if I drag x two down to zero, so I completely ignore x two, then this is the relationship that I observe between x one and y.
And likewise, this edge here is the relationship I observe at the most extreme value of x two there, you see. And now every line parallel to these along the plane is a relationship between x one and y for a different value of x two. As you can see, there's a relationship there, a relationship there. So this is like having many regression lines. You've got one line here, we call that line maybe a, then you've got line b, and c and d, but basically as many lines as there are entries in x two.

So if x two is a discrete number, say it's a film rating between zero and five, you've got a rating of one, two, three, four, five, or whatever; maybe make it out of 10: six, seven, eight, nine, 10. Okay, good. Then each of these is like a cut through this plane. And let's say this, x one, is age. For each rating, we would have a particular relationship between the age of the customer and, I don't know, how much they spent in the cinema, say.

Now, if we go back to the example we had above, which is grade, hours and GPA, this could be our hours, and this could be our GPA, and this would be our grade. So let me just put those on as well as other examples. If you have grade, x one could be GPA, and here we could have hours spent studying. Now of course, in the case of hours, the problem is that it's a real number. It's not discrete, it's not options one to 10. It's a real number. And so actually, there's like an infinite number of lines here.

And so if I have my model in red here, it has two possible inputs. It's a vector, right? So you could either write it that way, or you could write it just out in full like this. And what that means is that for every option in my x two, let's say I put in here eight point eight hours, then there's a whole line here.
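The idea that each value of x two picks out its own line through the plane can be sketched in a few lines of Python. This is only an illustration: the plane is assumed to have the form y = a·x1 + b·x2 + c, and the coefficients and the helper names `plane` and `slice_at` are made up here, not taken from the lecture.

```python
# Hypothetical plane coefficients, chosen only for illustration.
a, b, c = 0.5, 2.0, 1.0


def plane(x1, x2):
    """Model surface: y = a*x1 + b*x2 + c."""
    return a * x1 + b * x2 + c


def slice_at(x2):
    """Fix x2 and return (slope, intercept) of the resulting line in x1."""
    return a, b * x2 + c


# One line per discrete rating from 1 to 10: each is a cut through the plane.
lines = {rating: slice_at(rating) for rating in range(1, 11)}

for rating, (slope, intercept) in lines.items():
    print(f"rating={rating:2d}: y = {slope} * x1 + {intercept}")
```

In this sketch every cut has the same slope `a` and only the intercept moves with the rating, which is why the lines on the plane run parallel to one another.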
And for every other value I go for, different options, you know, nine point nine hours, maybe there'd be another line; it'd be a different line, right? And you can kinda see that just from the formulas. If I say it's going to be a x one plus, let's say, b x two plus c, then you can see, in this formula here, you're basically setting x two to be eight point eight. So that's going to be a x one plus b times eight point eight plus c. And in the second case it's a x one plus b times nine point nine plus c. These are different lines, right? This part of the puzzle, b times x two plus c, is different in each case, so we're always gonna have different lines.

Right, okay, so that's the sort of visual idea. In three dimensions, we have this plane that we're solving for; that's the solution. The points, of course, the historical data set, are difficult to draw in three dimensions, but they would be little stars. Some of the points would be on the solution, so those would be points where we actually solve it perfectly, without error. Those points have no error. But maybe some would be above the plane, some would be below the plane, above, below, above, below, and you can see that these would be the errors. These would be the errors that would have some loss associated with these points. Right, okay, so that's the general setup.

Now, I think it's appropriate at this point to go back and make the slight adjustment I mentioned earlier to this picture. If we compare this formula here with the one I gave below, we will see a peculiarity: there's a missing term. So what do I mean by that? If we just look at the formulas, if I say f of x, well, let's put the weights in as w, and it will be x with an arrow, for a vector, and w the weight vector.
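Going back to the picture of points sitting on, above, or below the plane: a minimal sketch of that idea follows. The plane coefficients and the four data points are invented for illustration, and the squared error is used as the per-point loss because that is the loss described in this lecture.

```python
# Hypothetical plane coefficients and data points, for illustration only.
a, b, c = 0.5, 2.0, 1.0

# Each point is (x1, x2, observed y).
points = [
    (1.0, 2.0, 5.5),
    (2.0, 1.0, 5.0),
    (3.0, 4.0, 9.5),
    (4.0, 3.0, 9.0),
]


def predict(x1, x2):
    """Height of the plane at (x1, x2)."""
    return a * x1 + b * x2 + c


def describe(x1, x2, y):
    """Return where the point sits relative to the plane, its error, and its squared loss."""
    error = y - predict(x1, x2)
    if error > 0:
        position = "above"
    elif error < 0:
        position = "below"
    else:
        position = "on"
    return position, error, error ** 2


results = [describe(*point) for point in points]

for point, (position, error, loss) in zip(points, results):
    print(f"{point}: {position} the plane, error={error}, loss={loss}")
```

Points that land exactly on the plane contribute zero loss; points above and below contribute the same loss when they are the same distance away, because the error is squared.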
And if we just say that it's w dot x, well, there's an issue there, because what that means is every entry in w multiplied by the corresponding entry in x, all added up. So if we have two x's, that would just be w one x one plus w two x two. And that would be that. Here's the problem. If we compare that to a x one plus b x two plus c, where is our c? Where's the intercept? We have a slope, or a weight, for the x one connection, we have a slope or a weight for the x two connection, but we have no intercept. So how can we solve that problem, or stick to this notation and also have a c?

Well, there are two things we can do. One is just to rephrase this model so that it is w dot x plus b, and in this case we call this the weight and this the bias. But, you know, for linear regression it would just be slope and intercept, meaning the same thing, basically. Or another little trick which is sometimes done is to imagine that there is, in fact, an extra x here, which we call x zero, say, and then we just set that to be one in the initial setup, so that when we do w times x, this little part here is always one. And so the third entry, or the first, or whatever entry it is that corresponds to the one, that entry of w is just an additive term. And that's the intercept.

Now, I think we'll come back to this trick when we talk about neural networks, 'cause that's often done for the formulas in there. For now, what we'll do is we'll actually say, okay, there are some weights and there's a bias, 'cause that might be a simpler way of keeping this picture. Right, so that would be a bias, and the bias would just be one number. So if there are two x's, then we need two weights. But we always need just one bias, which is just the point where the solution is with zero inputs. So that would just be one real number. Let's maybe keep the biases and weights together, and the features and targets together. So it's a shame that that hasn't worked.
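The two ways of getting an intercept described above, an explicit bias versus an extra input fixed at one, give the same prediction. A small sketch with made-up numbers (the values of `w`, `b` and `x`, and the helper `dot`, are assumptions for illustration):

```python
def dot(u, v):
    """Dot product: multiply corresponding entries and add them up."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical weights, bias, and one input vector.
w = [0.5, 2.0]
b = 1.0
x = [3.0, 4.0]

# Option 1: keep the bias as a separate additive term.
y_explicit = dot(w, x) + b

# Option 2: prepend x0 = 1 to the inputs and fold the bias into the weights.
x_aug = [1.0] + x
w_aug = [b] + w
y_folded = dot(w_aug, x_aug)

print(y_explicit, y_folded)
```

Here the constant one is placed first, but as the transcript notes, its position does not matter as long as the matching weight entry holds the intercept.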
But maybe what I'll do is I'll just put that back. And we highlight the weights and the biases as the model parameters. And the setup is the x and the y; the x and y are kind of our problem, and the w and the b are coming from our choice of solution in linear regression. Right, okay, so we've solved that problem of, you know, not having an intercept term in our model: we just added one in. Okay, good.

Now let's go through the solution. How do we actually solve this problem of finding the line? Well, if you remember, we define the loss, and then we use that as a guide as to what parameters to select. Okay, so let's talk about optimizing the loss. The model now is w dot x plus b. For the loss, we vary our weights and biases, or our slopes and intercepts. (My screen has a small issue; that doesn't matter.) So we're varying both of them: we're varying the weight and varying the bias, but we're holding our data set fixed when we do that. And the formula there is where we compute the prediction using this formula here, that gives us our predictions, that's weight times x plus a bias; then we take away the actual true value, and we square it. That gives us our loss at every point.

And then what do we do? Well, we take the derivative of that loss, so d loss d w, with respect to the weight w. And that's how we update w: we update w using the derivative. So we say the new value of w is gonna be the old value of w minus some percentage of that derivative. And likewise, in the case of the bias, we update the bias according to some percentage of how much the loss changes as we change the bias.
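The update rule described above can be sketched in plain Python. This is one possible reading rather than the lecturer's own code: the per-point squared loss is averaged over the data set, the "some percentage" is called `learning_rate`, and the data, step size, and number of steps are all made up for illustration.

```python
def predict(w, b, x):
    """Model: w dot x plus b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mean_loss(w, b, X, y):
    """Average squared difference between predictions and true values."""
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def gradient_step(w, b, X, y, learning_rate):
    """One update: new value = old value - learning_rate * derivative of the mean loss."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        error = predict(w, b, xi) - yi
        for j, xij in enumerate(xi):
            grad_w[j] += 2 * error * xij / n  # d(loss)/d(w_j)
        grad_b += 2 * error / n               # d(loss)/d(b)
    new_w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
    new_b = b - learning_rate * grad_b
    return new_w, new_b


# Made-up data generated exactly from y = 0.5*x1 + 2.0*x2 + 1.0.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.0, 1.0]]
y = [5.5, 4.0, 10.5, 9.0, 3.0]

w, b = [0.0, 0.0], 0.0
initial_loss = mean_loss(w, b, X, y)

# The data set stays fixed; only the weights and the bias are varied.
for _ in range(5000):
    w, b = gradient_step(w, b, X, y, learning_rate=0.01)

final_loss = mean_loss(w, b, X, y)
print(w, b, initial_loss, final_loss)
```

With this noise-free data the loop should move the weights and bias toward the values used to generate it; on real data the loss would level off above zero instead.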

# About the Author

**Students:** 176

**Courses:** 6

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.