To design effective machine learning models, you'll need a firm grasp of the mathematics that supports them. This course is part one of the module on maths for machine learning. It will introduce you to the mathematics of machine learning before jumping into common functions and useful algebra, the quadratic model, and logarithms and exponents. After this, we'll move on to linear regression, calculus, and notation, including how to provide a general analysis using notation.
Part two of this module can be found here and covers linear regression in multiple dimensions, interpreting data structures from the geometrical perspective of linear regression, vector subtraction, visualized vectors, matrices, and multidimensional linear regression.
If you have any feedback relating to this course, please contact us at support@cloudacademy.com.
- Right, so let's have a look at this Excel file; it might give us a bit of visual intuition for these formulae. Let's talk about the setup. The columns here are labelled age and time, but in our example we had the number of hours studied and the grade on an exam. So this column is the hours, and this one is going to be the grade. Now, the grades here only go up to nine and the hours go to 17, which isn't that helpful, so for this particular problem let's say the grade is out of ten, to make the two correspond: a zero-to-ten grade, 90%, 80%, whatever. And here are some data points.

These points in blue are these blue columns here, so we've got x and y. The workbook also has a little index column, just for referring to the points; that index isn't part of the problem, it's just there for your own numbering. So where is the point 30 and three on the graph? It will be one of these points somewhere: it's this point here. You can see that each of these rows appears as a blue circle on this canvas. These orange numbers are the predictions that come out of our use of the function, which we'll talk about in just a second; they're visualized here as this line, so all the predictions lie on a line. And here is the computed loss for every prediction, which is just the prediction minus the observation, squared. You can see that from the formula here, and we'll go through it in just a second.

So how are these predictions being computed? We've got this linear function, ax plus b, applied to x to give us a guess for y, y hat, and then we compute the loss. The total here isn't quite defined as the total loss: it's defined as the square root of the total loss. The reason for square-rooting a total like that is that each of these entries has been squared to get rid of the sign, which makes it hard to interpret what the numbers mean. If you've got a loss of two, what does that mean in terms of a grade? It's hard to say. But if you square-root it, say the total is 25, then the square root is five, and you can see the differences here are around five: it means I'm out by about five, out by five grade points, five marks or so. So the reason for square-rooting a loss like this is to bring it back into interpretable units. The units of the squared terms are grades squared, and it's hard to understand what a grade squared is, but the square root has the same units as the problem.

Okay, so that's the setup. If we look back at the whiteboard, just to remind you, the goal here is to minimize the total loss. Let me make sure the word total is right there: that should have a sum sign on it, so let me put that on. I'm going to minimize this formula over the whole column of x and the whole column of y; let me just draw those columns in blue. Okay, so back to the Excel file: whole column of x, whole column of y, minimize this. And how do I do that? Well, look at the formula again, look at this objective: a and b are the things we can change, so let's do that.
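To make that setup concrete outside of Excel, here is a minimal Python sketch of the same calculation. The hours and grades below are made-up stand-ins for the workbook columns, not the actual data from the video, and the starting values of a and b are arbitrary.

```python
import math

hours  = [5, 10, 17, 22, 30, 35]   # the x column: hours studied (illustrative values)
grades = [2, 3, 3, 4, 3, 6]        # the y column: grade out of ten (illustrative values)

a, b = 0.1, 1.0                    # the two numbers we are free to tune

# Prediction: apply the linear function y_hat = a*x + b to every row.
predictions = [a * x + b for x in hours]

# Per-row loss: (prediction - observation) squared, which removes the sign.
losses = [(y_hat - y) ** 2 for y_hat, y in zip(predictions, grades)]

total_loss = sum(losses)
root_total_loss = math.sqrt(total_loss)   # back in grade units, so easier to interpret

print(predictions)
print(total_loss, root_total_loss)
```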
So let's make a and b some big numbers, to get the total loss really, really big, and then the approach is just to keep changing them. If I set a to four, okay, the loss goes down. Three, then zero: it goes down very quickly, so let's try minus one and keep going. Okay, it's gone back up again, so we'll go back to zero. Now let's try b: let's go for five. Let's give two a go; that didn't make a lot of change. Three, four... it's hard to see what to do here, isn't it? Five made it go up. So we haven't necessarily reached the minimum, but we can stop here and say we've found something interesting: a reasonable prediction, it seems, is to predict the same number over and over again. And I bet that if we look at the mean of this column, so we just use AVERAGE here, it comes out close to four. Yep, there's the average. So if we change b to three point four one, we can see why: the loss goes down even further. You can see why this has turned out to be not a terrible prediction. If you predict just the mean, then roughly half the time you're wrong on the high side and half the time on the low side; if you could only make one number your prediction, the mean would be a good number to choose. So the mean is a pretty good number to try to beat. Maybe we can beat it. See, that change makes the loss go up. What about minus nought point nought one for a? See, you can do better than the mean. So probably by tuning a a little more, you can make the line fit a little better still. That's too far; I'll make one final adjustment, and then you can sort of see what I'm talking about in terms of a good line.

So what has this process shown us? It's shown us what the goal is: tune those numbers a and b until the total loss is minimized, and stop when you hit the minimum. You could keep a record of all the losses you've had and all the a's and b's you've chosen, but you can also just do it in your head, tweaking until the loss gets lower and lower. Now, there are two kinds of parameter in this workbook, aren't there? There's the data set here, and the data set goes into our predictions. When we're making a prediction for this point here, the data set is the variable: you choose different x's and you get different predictions. But when we're tuning the loss, we treat a and b as the variables. So when we're making a prediction, we're changing which x goes in; when we're minimizing the loss, we're changing which a and b go in. An intuitive way of thinking about that is that when you make the predictions, you make one for each point, but when you're minimizing the loss, you're changing the predictions themselves. So the steps are: there are your points, there are your predictions, that's the first step; and now change your predictions, so you can see the line wander around.
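The hand-tuning above can be sketched as a few calls to a small loss function in Python. Again the data are made-up stand-ins and the particular a and b values are only illustrative, but they show the two points made in the video: predicting the mean is the best single constant, and a small slope can beat the mean-only prediction.

```python
# A rough sketch of the hand-tuning done in the spreadsheet.
# total_loss(a, b) plays the role of the loss cell; xs and ys are made-up columns.
def total_loss(a, b, xs, ys):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

xs = [5, 10, 17, 22, 30, 35]
ys = [2, 3, 3, 4, 3, 6]

mean_y = sum(ys) / len(ys)

# With the slope a fixed at zero, predicting the mean of y is the best constant guess.
print(total_loss(0.0, mean_y, xs, ys))        # baseline: predict the mean everywhere
print(total_loss(0.0, mean_y + 1, xs, ys))    # any other constant gives a larger loss

# A small non-zero slope can beat the mean-only prediction on this data.
print(total_loss(0.05, mean_y - 0.5, xs, ys))
```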
So if I move these, you can see what I'm doing: I'm changing my predictions, and the prediction line is moving, because I'm moving a and b. But when I compute my predictions, I leave a and b fixed; you can't change them while you're computing predictions, because they're what gives you the prediction. So you hold those fixed, and then you go down these rows and compute a different prediction for every entry. I want us to think about those two notions, those two kinds of activity, very carefully, because they give us a way of understanding this formula. Our prediction function is one where we try out different x's but hold a and b fixed. With the loss it's the other way around: we try out different a's and b's and hold x and y fixed. So for the loss we hold the data fixed and try different lines, and when we're just computing a prediction, the line is fixed and the point we're predicting for changes. Partly that's just a notational convention, just how we're going to read things, but the notation really exposes something about the method: predict, then compute the loss, then adjust the line, then predict again, and so on around that cycle of adjusting the line, making the predictions, and computing the loss. Right, so we're going to leave it there, and come back to go through the detail of how we actually change a and b to reduce the loss in an optimal way; in other words, what the algorithm is for actually doing the minimization. But for this section, what I wanted you to get down was just the setup of the problem, so that we understand all the notation going into it. All right.
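One way to see that distinction is as two functions that hold different things fixed. The rough Python sketch below (same made-up data as before) writes the prediction as a function of x with a and b fixed, and the loss as a function of a and b with the data fixed.

```python
# Two views of the same linear model, sketched with made-up data.
xs = [5, 10, 17, 22, 30, 35]
ys = [2, 3, 3, 4, 3, 6]

# Prediction: a and b are held fixed inside the function; x is the thing that varies.
def make_predictor(a, b):
    def predict(x):
        return a * x + b
    return predict

# Loss: the data xs, ys are held fixed; a and b are the things that vary.
def make_loss(xs, ys):
    def loss(a, b):
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    return loss

predict = make_predictor(0.05, 3.0)     # one fixed line
loss = make_loss(xs, ys)                # one fixed data set

print([predict(x) for x in xs])         # walk down the rows, one prediction per entry
print(loss(0.05, 3.0), loss(0.0, 3.5))  # try different lines against the same data
```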
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.