Providing a general analysis using general mathematical notation - Part 1
1h 45m

To design effective machine learning, you’ll need a firm grasp of the mathematics that supports it. This course is part one of the module on maths for machine learning. It will introduce you to the mathematics of machine learning, before jumping into common functions and useful algebra, the quadratic model, and logarithms and exponents. After this, we’ll move on to linear regression, calculus, and notation, including how to provide a general analysis using notation.

Part two of this module can be found here and covers linear regression in multiple dimensions, interpreting data structures from the geometrical perspective of linear regression, vector subtraction, visualized vectors, matrices, and multidimensional linear regression.

If you have any feedback relating to this course, please contact us at


So let's try and provide a general analysis of that rate of change using general mathematical notation. Once more, we just look at the quadratic function there, X squared, and now we're going to use just general symbols. We're going to say that the lower point, let's call that lower point X naught. And the other point we're going to consider, to ask the question of how fast the function is changing, is X naught, the one we've already considered, with a little piece added to it, which we'll call delta. So if X naught here were two and this were three, then the piece we would be adding here, delta, would be one. Okay. And the response of the function to these inputs, let's just show you the responses. It's this vertical thing here, so maybe we just draw that. That's going to be F of X naught plus delta, and F of X naught. So think of X as just being this variable that can range over any value on this axis, and these are the responses at two particular values: X naught, e.g. two, and X naught plus some delta, some small amount. Now, what's the general formula for this rate of change, then? Well, it's going to be the height of the function, or the response of the function to its input, at the higher place, minus the response at the lower place, divided by the size of the step, which is actually just delta. But you can, if you want, write the step out in full as X naught plus delta, minus X naught. That's the higher place minus the lower place, and it just simplifies to delta, so we can get rid of that. And you've got F at the higher point, a little way along from X naught, minus F at just X naught itself, divided by the step size. Now we can leave that formula there, and say, that's just the rate of change.
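As a rough illustration of the formula just described, here is a minimal Python sketch. The names `finite_difference` and `square` are made up for this example and are not from the course; the values X naught = 2 and delta = 1 are the ones used in the lecture.

```python
def finite_difference(f, x0, delta):
    """Rate of change of f between x0 and x0 + delta.

    (response at the higher place - response at the lower place) / step size
    """
    return (f(x0 + delta) - f(x0)) / delta


def square(x):
    # The quadratic function from the lecture: f(x) = x squared.
    return x ** 2


# Lower point x0 = 2, step delta = 1, so the higher point is 3.
rate = finite_difference(square, 2, 1)
print(rate)  # (9 - 4) / 1 = 5.0
```

Passing the function in as an argument keeps the sketch general, in the same spirit as using F rather than X squared in the notation.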
Now, the name of this formula here is the formula for a finite difference. It's quite an important formula. Let's just describe what these terms mean. A finite difference means that we have a step here. We're moving from a lower number to a higher number, and we're moving by some amount, say one. And the amount we're moving is a whole, ordinary sort of number, a finite number. What we're going to do in a second is consider what happens as we try and move that step closer and closer to zero. We're not going to set it to zero exactly. We're going to kind of try and set it to zero, in a sense, and see what happens. As I move the second point closer and closer and closer, is there anything interesting that happens to the rate of change? I'm going to ask that question, and I'm going to come up with a slightly different analysis. In the case where the difference is zero, or approaching zero, which is the technical phrase for it, the difference won't be a finite one, as it is with this whole step. So where's the difference here? Well, the difference is just in the step size, right? So it's a formula for a finite difference, the finite difference formula. And it's a perfectly good formula for calculating a rate of change. However, mathematicians are not always that happy with it, because it's a little bit unwieldy, and we can do a bit better in general. As I was saying there, we can ask, in a sense, a simpler question. The direction you come at it from is a little more complicated, but the answer is a little more straightforward than that formula. It's a simpler formula. And so the thing here is to try and think about how we can get to that simpler point.
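To get a feel for what happens as the step moves closer and closer to zero, the following sketch evaluates the finite difference of X squared at X naught = 2 for a few shrinking step sizes. The particular step sizes are arbitrary choices for illustration, and the helper names are hypothetical.

```python
def finite_difference(f, x0, delta):
    # (f at the higher place - f at the lower place) / step size
    return (f(x0 + delta) - f(x0)) / delta


def square(x):
    return x ** 2


x0 = 2.0
# Arbitrary illustrative step sizes, each smaller than the last.
steps = [1.0, 0.1, 0.01, 0.001, 0.0001]
slopes = [finite_difference(square, x0, d) for d in steps]

for d, s in zip(steps, slopes):
    print(f"delta = {d:<7} slope = {s:.6f}")
```

The printed slopes start at 5 and settle towards 4 as delta shrinks. The step is never actually set to zero, since that would mean dividing by zero.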
The way to get there is just to think about what's happening to the rate of change of this function closer and closer to this point. What sort of rate of change do we see around that point, in other words? So imagine we zoom in to this point here, this region, getting closer and closer. There's the point. As we decrease the step size more and more towards zero, let's draw the triangle here. There's that triangle. As we move the points closer and closer together, the hypotenuse of the triangle, this edge of the triangle, gets closer and closer to being a line which is not really a slope between two points, this point and this point. It's actually a line which, as we say, is tangent at this point. It gets closer and closer to being the straight line which just meets the curve at the point, if you think of the curve as being a plate or a bowl or something. As you move this second point closer and closer, to where it's essentially overlapping the original, the slope between the points gets closer to the slope of that touching line. So here is the point, with the curve coming through it, and as the second point sort of overlaps the first, the line gets closer and closer to being a tangent, like that, just touching the curve. Now, okay. So what's the significance of this? Well, the significance of this is that, because this phenomenon exists, there is, in a sense, some meaning to the slope at a point, which we'll pause on for just a second.
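For the quadratic function in particular, the effect of shrinking the step can be worked out by hand. The derivation below is a sketch that assumes f(x) = x squared and writes X naught as x_0 and the step as \delta; it is not taken from the lecture slides.

```latex
\[
\frac{f(x_0+\delta)-f(x_0)}{\delta}
  = \frac{(x_0+\delta)^2 - x_0^2}{\delta}
  = \frac{2x_0\delta + \delta^2}{\delta}
  = 2x_0 + \delta
\]
```

So the slope between the two points is 2x_0 plus the step itself, and as \delta moves towards zero that slope moves towards 2x_0. With x_0 = 2 and \delta = 1 this gives 5, and the value being approached is 4.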
Because there is some meaning to a slope at a single point, what we're actually going to do in the mathematics is just ignore the second point. We're going to figure out a way to see if we can actually dispense with this second part here. We have the first point and this higher point, and maybe we can try and get rid of one or the other. We want to see if there's any meaning to asking: is there a slope, is there a rate, at a single point? This is a bizarre idea. The way I'm putting it is the way it's often said, that there's a rate of change at a point, but that's a very peculiar notion. Because how can I be increasing in my height, how can I be getting steeper and steeper, at a point? How is there any sense to a rate, a steepness, a pace of change at a point? A point is, almost by definition, not the kind of thing you can have a steepness on. You can't have a steepness on one single point. You need two points at different levels to have a difference of level. How can there be a difference of level at a point? Well, technically speaking, there isn't. And this is the kind of thing that people sometimes get wrong about calculus, especially the part we'll talk about here, differentiation. And that is that, actually, there isn't a rate of change at a single point. That's not the case. What there is, is a way of taking this formula and speaking about what happens as you move the second point down, closer and closer and closer to the first.
And what you find is that there's a kind of leftover, a kind of slope, a kind of minimum slope, such that whatever second point I take, whether it be 2.1, or 2.2, or 40, or 2.0001, whatever I choose for my second point, it'll turn out that if I go between two and 2.2001 or two and 2.... Whichever two points I go between, there will actually be a minimum rate at which I ascend, regardless of which second point I choose. So that's an interesting idea, isn't it? I think of it as a kind of staircase. Here's the point two, and here are all the points I can choose. And whether I choose this point, that point, that point or that point, at the very least I need to go up by this amount, whatever that minimum amount is. Let's say I have to go up by naught point five. If I go further, I go up more and more and more, but I always have to go up by at least naught point five. That's interesting. So there is this notion which technically isn't the change at a point. It's the change you're going to get at least, regardless of whichever second point you choose. So that's the kind of conflation, the smoothing over, the papering over, that we're going to do. We're going to pretend that there's a rate of change at a point, even though technically there isn't one in mathematics. Some people will say, in an offhand way, the rate of change at this point. And what they mean is the minimum rate of change regardless of where you go from here. And it turns out that that minimum is the slope of the tangent.
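The "at least this much" idea can be checked numerically for the lecture's example. The sketch below assumes the curve is X squared, that the first point is two, and that every second point lies to the right of it; under those assumptions the tangent slope at two is 4 (two times X naught). The function names and the list of second points are illustrative choices, loosely following the values mentioned above.

```python
def square(x):
    return x ** 2


def secant_slope(f, x0, x1):
    # Slope of the straight line between (x0, f(x0)) and (x1, f(x1)).
    return (f(x1) - f(x0)) / (x1 - x0)


x0 = 2.0
# Assumption: for f(x) = x**2 the tangent slope at x0 is 2 * x0.
tangent_slope = 2 * x0

# Second points to the right of x0, near and far.
second_points = [2.0001, 2.1, 2.2, 3.0, 40.0]
slopes = [secant_slope(square, x0, x1) for x1 in second_points]

for x1, s in zip(second_points, slopes):
    print(f"second point {x1:>8}: slope {s:.4f}")

# Every slope is steeper than the tangent, however close the second point is.
print(all(s > tangent_slope for s in slopes))
```

This only demonstrates the claim for an upward-curving function with the second point chosen further along the axis. It is a numerical check on a handful of points, not a proof.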
In other words, it turns out that this line here, which is tangent at the point, has a slope. That green line there has a slope, which is the minimum slope regardless of which second point we choose. And you can kind of see that by looking, right? Let me try and come up with a different color here to make this clear. If I look at this tangent line here, T, then if I choose this second point over here, that's steeper than T, and if I choose a point over here, that's steeper and steeper, and so on. The minimum steepness I can get is in fact the tangent. So this is actually the minimum rate of change, if you like, and any point I choose further along than this will be steeper. So this is kind of like a minimum. Right? So it turns out there is an interesting geometrical correspondence between the tangent and the rate of change at a point.

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.