Multidimensional Linear Regression - Part 3
Maths for Machine Learning
To design effective machine learning, you'll need a firm grasp of the mathematics that supports it. This course is part two of the module on maths for machine learning. It focuses on how to use linear regression in multiple dimensions, how to interpret data structures from the geometrical perspective of linear algebra, and how you can use vector subtraction. We'll finish the course off by discussing how you can use visualized vectors to solve problems in machine learning, and how you can use matrices and multidimensional linear regression.
Part one of this module can be found here and provides an intro to the mathematics of machine learning, and then explores common functions and useful algebra for machine learning, the quadratic model, logarithms and exponents, linear regression, calculus, and notation.
If you have any feedback relating to this course, please contact us at email@example.com.
Right, we're gonna have a more detailed section on this. I just wanna complete the picture and give you a visual intuition of multidimensional linear regression and how calculus fits into it. So this is the update rule we'll use, but what are we actually doing, visually speaking, when we solve the problem?
Recall that we've got three dimensions here, and we're now looking at finding a plane in three dimensions. When we vary B and W, you can think of these as little dials that we can twist around. So this is W1, this is W2. W1 is the steepness of the plane along X1, and W2 is the steepness along X2. By moving W we can rotate the plane this way or rotate it that way. And what's B? Well, B is this point here. We can move that up and down, and that moves the whole plane up and down. So we can rotate the plane through X1, sweep it through X1, and we can rotate it through X2. So if this is the solution, we can move it this way, we can move it that way, and we can shift it up and shift it down. Right. That's some intuition for what these parameters are doing.
Right, how is this process helping us? Well, suppose we have some points here: there are some points here, there are some points on the line, and some points below the line, and likewise here and here and here. When we look at the loss, we look at it with respect to any given parameter, so let's look at it with respect to the bias.
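To make the three dials concrete, here is a minimal sketch of a plane written as a weighted sum. The function name `predict`, the sample points and all the parameter values are made up for illustration; they are not taken from the lecture's diagram.

```python
def predict(x1, x2, w1, w2, b):
    """Height of the plane above the point (x1, x2): a weighted sum plus a bias."""
    return w1 * x1 + w2 * x2 + b

# Four hypothetical input points (x1, x2).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

base = [predict(x1, x2, w1=2.0, w2=0.5, b=1.0) for x1, x2 in points]
raised = [predict(x1, x2, w1=2.0, w2=0.5, b=2.0) for x1, x2 in points]   # only b changed
steeper = [predict(x1, x2, w1=3.0, w2=0.5, b=1.0) for x1, x2 in points]  # only w1 changed

print("base:   ", base)
print("raised: ", raised)
print("steeper:", steeper)
```

Changing only `b` lifts every prediction by the same amount, which is the whole plane moving up. Changing only `w1` leaves the points with `x1 = 0` where they were and raises the others, which is the plane tilting along X1.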
So if I put lots of points above here in both cases, we can see that by increasing the bias, my solution would be better. It'd go through more of the points. So this gap here, if I put the gap in some very light color, blue, this region is the region where my plane is too low. My plane is too low, and so I'm having large errors. Right, so this is quite a high error.
So then I plot loss versus W, or B; let's just do B for simplicity. "Loss versus B." Let's say I place B at random, and it happens to start at whatever number that would be, it doesn't really matter. Let's say I start at two. Then as I increase B from my starting position, my loss decreases. So if my loss is really high here, it'd be going down like that, you see?
Now, suppose I move my plane too far, so that it's up at the top there. I've gone too far, right? So now I have this error here to consider. Let me just draw this a bit bigger. Suppose I move past three and go to four. On the diagram, this is B being two and this is four. Then at four, I've come too far and I've started going back up in my error again. And so let's suppose the ideal point here is three. This is the point of minimum loss, right? So the shape of the loss is like that. And there's still some loss.
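The bowl shape of "loss versus B" can be reproduced with a small sketch. Everything here is an assumption chosen to mirror the picture: the loss is taken to be the mean squared error, the four data points are invented so that the best bias is 3, and the weights `W1` and `W2` are held fixed so that only `b` varies.

```python
# Hypothetical toy data: each row is (x1, x2, y). The points sit half a unit
# above or below the plane y = 2*x1 + 0.5*x2 + 3, so no plane fits them all.
DATA = [(0.0, 0.0, 3.5), (1.0, 0.0, 4.5), (0.0, 1.0, 4.0), (1.0, 1.0, 5.0)]
W1, W2 = 2.0, 0.5  # weights held fixed; only the bias b is varied


def loss(b):
    """Mean squared error of the plane y = W1*x1 + W2*x2 + b on DATA."""
    errors = [(W1 * x1 + W2 * x2 + b) - y for x1, x2, y in DATA]
    return sum(e * e for e in errors) / len(errors)


for b in [2.0, 2.5, 3.0, 3.5, 4.0]:
    print(f"b = {b:.1f}  loss = {loss(b):.4f}")
```

With this data the loss falls from 1.25 at b = 2 to 0.25 at b = 3 and climbs back to 1.25 at b = 4. The value 0.25 is the minimum loss: even the best bias leaves some error because the plane cannot pass through every point.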
Even at the best bias, the best intercept term, which still isn't perfect, there'll still be some loss, because the plane won't go through all the points. It's not gonna be a perfect model; it's still an estimation. So there'll be this loss here. We can call this the minimum loss.
What is our system here doing? Take this step here, for example, DL by DB, since that's how we're updating our bias. What that's doing is saying, "okay, start yourself at random." So let's just start where we started, at two. This is our random starting point. And then our next step is to take away some percentage, e.g. 1%, of the gradient. The derivative, slope, and gradient here are all basically the same term. And what's the gradient at this point? What's the slope, what's the derivative? Well, it's the tangent in the direction of increase. So DL by DB is the slope up at the point BR, B random. Here, in the diagram, we can say at two, because we know we're at two. So it's a slope up at two, that's the derivative, and it's pointing up. It's some number, so let's say the slope of that line is 1.5. Then, if we add 1.5 to B, we get worse, because we get higher. We wanna go lower. So what we're gonna do is take away 1.5. Or rather, we move by some percentage of 1.5. Let's say 10%, no, 10% is a bit high. Let's say 1% of 1.5. So 1.5 times 0.01 gives us a very small number: if it was 15, it would be 0.15, so for 1.5 it's 0.015. So it's gonna be a tiny, tiny amount.
What it's going to do is shift us in this direction. So as long as the gradient is positive, we keep moving down, because we're taking away the gradient. Suppose we overshoot and we end up on that side. Now the gradient is in this direction.
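A single application of the update rule can be sketched as follows. The helper name `update_bias` is hypothetical, and the slope of 1.5 and the 1% learning rate are simply the illustrative numbers used in the talk.

```python
def update_bias(b, gradient, learning_rate=0.01):
    """One step of the rule b = b - learning_rate * dL/db."""
    return b - learning_rate * gradient


step = 0.01 * 1.5                                # 1% of a slope of 1.5
moved_down = update_bias(2.0, gradient=1.5)      # positive slope: b decreases
moved_up = update_bias(2.0, gradient=-1.5)       # negative slope: b increases
unchanged = update_bias(3.0, gradient=0.0)       # flat slope: b stays put

print(step, moved_down, moved_up, unchanged)
```

Because the gradient is subtracted, the step always goes against the direction in which the loss increases: a positive slope pushes `b` down, a negative slope pushes it up, and a slope of zero leaves it alone.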
Right, the gradient's in that direction. So the gradient's in the direction of increasing B. So what we do is we decrease B. So this gradient here is saying, "go this way." This one is saying, "go that way." Since we're taking it away, we go in the opposite direction, and so we go back here and hopefully we land at this point in B, at three.
And why would we land there? Well, if we zoom into this point, a massive zoom in, and we land exactly at, or close to, three, then at this point the gradient is flat, the tangent is flat, the slope is flat. Let me just try to draw it straight. There we are. So at this special point, the minimum point, the line is flat, and so we don't make any further changes to B, because this slope is 0. So this term becomes 0, and now B equals B. We're not making any more changes to B. So at that point the derivative, or the slope, is 0. The slope of a flat line is 0.
Okay. So that's a visual intuition for what we're doing when we're moving this plane up and moving it down. We start at random and then we move up, up, up, up, get to a point where we've gone too far, so this is where we're over here in the loss. Then we come back down again, and then we sort of settle where the minimum loss is, because at this point we're not making any further updates to the bias. So it has this sense of move, move, move, move, oh, overshoot, okay, so move back, move back, okay, settle here. Make no further updates to the bias term. So this would be increasing loss, that would be increasing loss, so we move back, and we end up at this good position of the plane. Right, so there are some issues with this approach.
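The move, overshoot, move back, settle behaviour described above can be sketched as a short loop. This is an illustration under stated assumptions, not the course's implementation: the loss is assumed to be the mean squared error, the data points are invented so that the best bias is 3, the weights are held fixed, and the names `grad_b` and `descend` are hypothetical.

```python
# Hypothetical toy data: each row is (x1, x2, y), built so the best bias is 3.
DATA = [(0.0, 0.0, 3.5), (1.0, 0.0, 4.5), (0.0, 1.0, 4.0), (1.0, 1.0, 5.0)]
W1, W2 = 2.0, 0.5  # weights held fixed so that only the bias b is tuned


def grad_b(b):
    """dL/db for the mean squared error: the mean of 2 * (prediction - y)."""
    return sum(2.0 * ((W1 * x1 + W2 * x2 + b) - y) for x1, x2, y in DATA) / len(DATA)


def descend(b, learning_rate, steps):
    """Repeat b = b - learning_rate * dL/db and record every value of b."""
    history = [b]
    for _ in range(steps):
        b = b - learning_rate * grad_b(b)
        history.append(b)
    return history


small_steps = descend(b=2.0, learning_rate=0.01, steps=2000)  # creeps up toward 3
big_steps = descend(b=2.0, learning_rate=0.9, steps=50)       # jumps past 3 and back

print("small steps end at:", small_steps[-1])
print("big steps begin:   ", big_steps[:4])
print("big steps end at:  ", big_steps[-1])
```

Both runs start at b = 2 and settle near b = 3, where the slope is zero and the update stops changing anything. The small learning rate approaches from one side, while the deliberately large one overshoots to the far side of the minimum and comes back, with each swing smaller than the last.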
Well, I mean not issues, but there are some subtleties to this approach, which we'll discuss toward the end of the course in the optimization section. But this is a picture now of how all these things fit together. So, we've got calculus for dealing with losses and giving us a sense of hot and cold. How good is our model? Well, our model is cold, it's wrong, keep going. Okay, now we've gone too hot, okay, go back, too cold. So we want this model to sort of settle at this nice midpoint, and the loss is a kind of pushing and pulling force. And we use calculus to find how much to push and pull. And then there's this linear algebra, which gives us the terminology and ideas that we need for this multidimensional approach. We're also gonna look at some of these ideas in Python to give you that computational, practical sense of what we're doing.
But many of the key takeaways here are actually conceptual, they're not mathematical. I think that's the key point we should finish on. In the general practice of machine learning, not the academic practice, and maybe not that of the most highly professional expert, but the general practice, the expectation isn't that you will be doing what I've been doing here by hand, working out your derivatives and anything of that kind. That's done by the computer. So the point here is that you have enough sense of the key concepts: vectors and matrices, weighted sums, gradients and slopes and how to calculate them a little, and these algorithms that are going to help you solve problems. So it's the concepts that matter, not having them down so well that you can reproduce the actual techniques yourself. Now it is important, however, when getting a concept down, to give yourself enough actual detail behind the concept to see it work.
So, it is quite important to have some practical application. So do some maths, basically, early on in your learning journey so that you're getting an intuition for what the concept is. But actually, in the day-to-day practice of machine learning, it's not as if people are doing mathematics by hand. It's all done by the machine. This is the conceptual point: the more familiarity, the more intuition you have for these concepts, the better able you will be to critique your machine learning solution, or also just understand what you've done. The practice of machine learning is fundamentally a mathematical practice. What you are doing is solving a problem using mathematics, and what you need for that is the vocabulary and ideas of mathematics, but the solution is left to the machine. So you can kind of get away with abbreviating that, skipping over the detail there, and just having the vocabulary and concepts to give you some general idea of what's going on.
Linear Regression in Multiple Dimensions - Interpreting Data Structures from the Geometrical Perspective of Linear Algebra - Vector Subtraction - Using Visualized Vectors to Solve Problems in Machine Learning - Matrices - Multidimensional Linear Regression Part 1 - Multidimensional Linear Regression Part 2
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.