To design effective machine learning, you’ll need a firm grasp of the mathematics that supports it. This course is part one of the module on maths for machine learning. It will introduce you to the mathematics of machine learning, before jumping into common functions and useful algebra, the quadratic model, and logarithms and exponents. After this, we’ll move on to linear regression, calculus, and notation, including how to provide a general analysis using notation.
Part two of this module can be found here and covers linear regression in multiple dimensions, interpreting data structures from the geometrical perspective of linear regression, vector subtraction, visualized vectors, matrices, and multidimensional linear regression.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Okay, so let's now talk about the quadratic model. Quad, Q-U-A-D, ratic model. That's where we have an f of x which is going to be x squared. Well, let's have a x squared plus b for now. We can keep introducing parameters on this one and still remain a quadratic model, and I'll show you what I mean by that, but let's start with two parameters. So, for this diagram, I think what we'll do is draw a very long axis here and put the zero in the middle of it. I'll show you why in a second.
To draw this one, let's draw it almost blind. Let's try to figure out what the shape is just using the formula, rather than knowing it already. If we didn't know what the shape of a function was, how could we start to work it out by hand? Well, it's always a good idea to put in a negative number, zero (zero is a very, very important number), and a positive number. And if that still isn't giving you a nice simple shape, if the shape is more complicated than that, there are other techniques you can use, such as trying very large negative numbers and very large positive numbers and making certain inferences that way. But that's a little more complicated than what we're doing here. Let's just try three points.
So what do we get if we put in minus two? Well, minus two squared is four, and a times four is four a, so we get four a plus b. If we put zero in, the x squared goes to zero, so we get zero times a, which is zero, and we're left with b. And f of two gives the same thing as minus two, which is four a plus b, because x squared is four again. Okay, so what do we know then? We know that at the input zero, our height is b.
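The three-point check described above can be sketched in a few lines of Python. The function name `quadratic` and the values chosen for a and b are illustrative assumptions; the lecture keeps both parameters symbolic.

```python
def quadratic(x, a, b):
    """Two-parameter quadratic model: f(x) = a * x**2 + b."""
    return a * x**2 + b


# Illustrative values only (assumed); the lecture leaves a and b symbolic.
a, b = 1.0, 3.0

# Try a negative input, zero, and a positive input, as suggested above.
for x in (-2, 0, 2):
    print(f"f({x}) = {quadratic(x, a, b)}")
```

With these assumed values the heights at minus two and plus two are both four a plus b, and the height at zero is just b.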
And we know that at minus two and plus two, so minus two here and plus two over here, the height of the function is the same. It's going to be some way above b. If a were one, that would be four units above b. So maybe it's one, two, three, four; let's think of that as being four a plus b. Of course, if we choose a different a and a different b, it will look much the same; you just scale the graph a bit, zooming in or zooming out. So the height of the curve is the same here and here. It turns out that this function has a U shape. So let me just try and draw it, going through all the points if I can. There we go. That's about right.
Right, so this is a quadratic model. Let's have a look at what the parameters are doing so far. Well, b is the obvious one, hopefully, just from looking at it. It's where the intercept is, and the intercept, that's the technical term, is where the graph hits the y axis. So, by definition, it's what happens to the function when x is given the input zero. That's here, b, when we put zero in for the input.
Okay, what's the role of a? That's a bit trickier to see, isn't it? Well, let's imagine a were twice as much. If we put in one for a, the height at two is four plus b; if we put in two for a, we get eight plus b. So if we think of this point as eight plus b, and this point here as four plus b, then what happens? Well, the two points at two and minus two go up to here, so the curve becomes narrower. And if we keep going, it gets narrower and narrower. What if I reduce a? If I make a a half, that gives us two plus b.
Okay, so that puts the points down here, and the curve would be really shallow, like that. So what's the role of a? The role of a is to do with the scale of the thing: how wide it is, the global shape. So b is shifting the curve up and down, and a is scaling it, making it narrower or wider. So that's, visually or geometrically, what these parameters are doing.
And you think, well, why does that matter? Well, our predictions come from this function. What we're doing is taking an algorithm, which we write as script A, and that's going to give us, in this case, some numbers for a and b. So let's say it gives us three x squared plus one. By changing a and b, we're going to get different predictions. So, if I want to predict someone's rating for an input of two, remember, this is how much they will like a film based on the length of the film, a two-hour film or something like that, then we look up from two, so just draw a line up through two here. You're going to get different predictions depending on where we move a or where we move b.
Now, of course, having chosen this example, I'll say straight away that the input couldn't be film length, because it goes negative. That tells you something about the use of a quadratic model. A quadratic model is going to be for something where the input has this interesting behavior, where it can go between negative and positive values, possibly. In any case, there's a point, like a zero point of some kind, which we can shift, where values above it have the same response, the same output, the same prediction, as values below it. Okay, so what sort of things behave that way? Well, let me give you a little preview here.
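To make the roles of a and b concrete, here is a minimal sketch that looks up the prediction at an input of two for several parameter settings, including the three x squared plus one mentioned above. The helper name `predict` and every other parameter value are assumptions chosen for illustration.

```python
def predict(x, a, b):
    """Prediction from the two-parameter quadratic model a * x**2 + b."""
    return a * x**2 + b


x = 2  # the input we "look up from" on the diagram

# Changing a scales the curve: larger a is narrower, smaller a is shallower.
for a in (0.5, 1.0, 2.0, 3.0):
    print(f"a={a}, b=1.0 -> prediction {predict(x, a, 1.0)}")

# Changing b shifts the whole curve up or down by the same amount.
for b in (0.0, 1.0, 5.0):
    print(f"a=3.0, b={b} -> prediction {predict(x, 3.0, b)}")

# Inputs the same distance either side of zero give the same prediction.
print(predict(-2, 3.0, 1.0) == predict(2, 3.0, 1.0))
```

For the lecture's own example, three x squared plus one, the prediction at two is 13, and the same value comes out at minus two.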
Well, one thing that behaves that way is, sometimes, how wrong we are about things. So let's say the right value is here; my goal, or my aim, or my target is here. That's what I'm aiming for. Well, if I throw a dart or something, I hit there, and if I throw another dart, I hit there. This one is an error below, and this one is an error above. And it kind of doesn't matter if it's above or below; it's out by the same amount.
Let's just think about that. So, I have this target that I want to hit; that's my bullseye, zero, right? And if I'm out by plus two or I'm out by minus two, it could be anything, then maybe my error will be the same in both cases. Let's say my error here is a score. Let's say this is a dartboard of some kind, or whatever the game is, and if I hit the middle, I get a perfect score of zero. And let's say for every two I am away, my score gets worse by 10 points. So what I should hope is that if I put a 10 on here, then I get 10 points and 10 points; that's 10 points of bad in this case. And if I go to minus four here and plus four here, then I get another 10 points, so that's 20 points, and there'll be a point here and a point there, and so the shape would be that shape. Right. So, sometimes, intuitively, how wrong we are about something could have this shape, where going too far above is just as bad as going too far below. Okay. So that's an example of a quadratic thing.
Now, let me just make a few final remarks about the quadratic form here, because we haven't completely given the full picture. In the full picture, we can shift where the zero is. I mean, that makes some sense to me. Maybe we didn't want the bullseye to be at zero. Maybe we want it to be at 100.
If I wanted the bullseye to be at 100, well, how do I do that? Well, the form here is f of x, and it's going to be x minus 100, squared. Let's just go for that form for right now; you could put some a's and b's around it. Okay, so what does this do computationally? Well, let's put in 100 and then we can see. If I put 100 into the input, 100 minus 100 is zero, and so we get zero as the output. So if I put 100 in, the height there is zero, no height at all. So that's exactly what we want. Okay, so that's one modification we can make. It's still a U shape, still quadratic, but we can shift where it gives you zero.
What else can we do? Actually, we can add lots of different parameters; we can add at least one more parameter, anyway. So maybe we can say a x squared plus b x plus c. So we have three parameters: a, b, and c. And a and b play a kind of scaling role, and we can have a look at this in some code just to see what kind of roles these play. I think that's probably the better place to look at it. But I just want to say here that this is the general form, and the other ones are simplified variations where we just set b to zero or something like that.
This general form gives you three parameters to play with, and that's a more powerful model, or a more complex model, in the sense that it lets me tune my predictions more. With a straight line, I only have two numbers to play around with, so I can only move a up and down or b up and down: we can swing the line around and move it up and down. Well, that doesn't give you a whole lot to contort your line into an interesting shape. The more parameters you have, the more structure your line can have.
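The shifted form and the three-parameter general form can be compared in a short sketch. The function names are assumptions for illustration. Note that the constant term, called b in the earlier two-parameter model, is called c in the general form.

```python
def shifted_quadratic(x, shift=100):
    """Shifted form (x - shift)**2: the zero sits at x == shift."""
    return (x - shift) ** 2


def general_quadratic(x, a, b, c):
    """General form a * x**2 + b * x + c with three parameters."""
    return a * x**2 + b * x + c


# Putting 100 in gives zero out, and inputs equally far either side agree.
print(shifted_quadratic(100))
print(shifted_quadratic(98), shifted_quadratic(102))

# Expanding (x - 100)**2 gives x**2 - 200*x + 10000, so the shifted form
# is the general form with a=1, b=-200, c=10000.
for x in (0, 50, 98, 100, 102):
    assert shifted_quadratic(x) == general_quadratic(x, 1, -200, 10000)

# Setting b to zero recovers the simpler two-parameter model a * x**2 + c.
print(general_quadratic(2, 3, 0, 1))
```

The expansion shows one thing the extra b x term can do: it moves where the bottom of the U sits, so the shifted curve is a special case of the general form rather than a different kind of model.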
And so if you have a really very complicated problem, let's say this is your data set, maybe what you want is a line that looks like that. Well, this line here has many parameters. And so the more parameters you add, the more complicated the lines you can draw.
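One way to see this is to count how many times a curve changes direction as parameters are added. The sketch below assumes the extra parameters are coefficients of higher powers of x, which the lecture does not state explicitly. The helper names and the example coefficients are hypothetical.

```python
def polynomial(x, coeffs):
    """Evaluate a polynomial by Horner's rule; coeffs are highest power first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def count_turns(coeffs, xs):
    """Count how often the sampled curve switches between rising and falling."""
    ys = [polynomial(x, coeffs) for x in xs]
    # Keep only non-zero steps so flat segments are not counted as turns.
    steps = [y1 - y0 for y0, y1 in zip(ys, ys[1:]) if y1 != y0]
    return sum(1 for s0, s1 in zip(steps, steps[1:]) if (s0 > 0) != (s1 > 0))


xs = [i * 0.5 for i in range(-6, 7)]  # sample points from -3 to 3

models = {
    "line, 2 parameters": [2.0, 1.0],                 # 2x + 1
    "quadratic, 3 parameters": [1.0, 0.0, 0.0],       # x^2
    "cubic, 4 parameters": [1.0, 0.0, -3.0, 0.0],     # x^3 - 3x
}

for name, coeffs in models.items():
    print(f"{name}: {count_turns(coeffs, xs)} turn(s)")
```

On this sample grid the line never turns, the quadratic turns once, and the cubic turns twice, which is the sense in which more parameters allow more structure.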
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.