To design effective machine learning, you’ll need a firm grasp of the mathematics that supports it. This course is part one of the module on maths for machine learning. It will introduce you to the mathematics of machine learning, before moving on to common functions and useful algebra, the quadratic model, and logarithms and exponents. After this, we’ll move on to linear regression, calculus, and notation, including how to provide a general analysis using notation.
Part two of this module can be found here and covers linear regression in multiple dimensions, interpreting data structures from the geometrical perspective of linear regression, vector subtraction, visualized vectors, matrices, and multidimensional linear regression.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
In this session, I would like to talk about some common functions and useful bits of algebra for our discussion of machine learning. These may be things that you already know, but it could also serve as a helpful review. So the title here is Common functions and some algebra.
When discussing machine learning, we're often interested in understanding the model that the algorithm produces. So we can imagine that there is some algorithm. Call it A, say, a little script A. And that algorithm produces a model, and that model has a mathematical form. For example, in the case of a linear model, the form is ax plus b. So let's start by inspecting a simple linear model and say how it's used in a machine learning context, and then give you some other examples, for the sake of getting some familiarity with this.
The background of why we're talking about this is that the use of a model in machine learning is as a prediction. In the supervised case anyway, this is a prediction for some variable that we want to estimate. So we would say we have a variable we want to estimate, y, which is maybe someone's age. And we come up with this model, which we call f hat, which gives us an estimate for y, and in this case it's going to be ax plus b. That would be a linear model.
Let's just say how this functional form works. In the case of the linear model, there are a few pieces. We've got the two parameters, a and b. Sometimes we may also call them weights. I'll explain where the term weight comes from in a second. But these are the parameters, or the weights, of the model, and these are the pieces that the algorithm will fix. So the algorithm produces a model, but how does it produce a model? Well, it finds the parameters. So when we have solved a machine learning problem, we will have a specific model with some specific parameters.
There may be a very large number of them, or, as in this case, just two.
So let's have a look at this linear model visually. If I have an input x and an output y hat, which is just f hat of x, then I can draw this with a red line. On this diagram, the parameter b is where this line intersects the vertical axis, and a is the slope of the line. So if I were to take two points in x, a lower point here, x naught, and a higher point here, x one, and look at the height that is moved as I go from x naught to x one, and at the width, if you like, that I move going from x naught to x one, well, the height divided by that width is going to be a. That's going to be the slope of that line. So sometimes the formula for slope is given as rise over run, which means how much you're going up as you run along. So rise over run, or height divided by the distance traveled.
Right, that's a bit on a linear model. A simple linear model of one variable has two parameters, a and b: a is the slope, b is the intercept. And those parameters will be determined by the algorithm of choice. So, for example, suppose I solve that problem. I'm going to put some historical data points on and then draw a line through them; that's a possible algorithm you might use. Say, for example, we get a slope of three and an intercept of one. Well, that would give us a model, f hat of x, where we have three x plus one.
And we can use that model to perform a prediction. So suppose I want to predict something for 10. I've got this point here in x, which is 10. I go up, and I go across. And to find out what this number here is as a prediction, I would do f hat of 10, and let's write the 10 in green. And that's going to be equal to, well, three times 10 is 30, plus one.
So three times 10, plus one, with the 10 in green again. We get 31, and so this number on the edge here is going to be 31. And that serves as our prediction.
So you can see, hopefully, that by varying three and one, we change the prediction. You can think of this green point here, this little arrow on the side, as moving up or down this axis. It could've been higher, it could've been lower. And how would we move it up or down? Well, by changing the orientation of this line. If we keep b the same but make the line steeper, that gives us a different a, and this number would change.
Let me try to make that precise for you. Here, the thing of interest is 10, so it's this bar here that we're doing a prediction for. Let me just draw a straight line here. This line in x is the x input, 10. And you can see here that for a lower a, three, we have 31, and for a higher a, suppose this is four, we would have 41. Yes, that'll be 41. So I can change the prediction by changing a.
What else can I do? Suppose I leave three the same, so it's the same slope, and I move the line up, or I could move it down, let's say. Suppose I make the intercept zero. So that's parallel to this line. Then the prediction here hits at this point and we get a different number, which would be, not to scale exactly, 30 in this case, because we've set the intercept to zero rather than one. So this is actually much closer, so we get a number that's much closer. But you can see visually what the role of the parameters is.
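The walkthrough above can be reproduced in a few lines of Python. This is only a minimal sketch: the helper names `make_linear_model` and `slope` are made up for illustration and are not part of the session, and the parameter values are the ones used on the whiteboard (slope three, intercept one, input 10).

```python
def make_linear_model(a, b):
    """Return f_hat(x) = a * x + b for fixed slope a and intercept b.

    Hypothetical helper for illustration; in practice a learning
    algorithm would choose a and b from historical data.
    """
    def f_hat(x):
        return a * x + b
    return f_hat


def slope(f, x0, x1):
    """Rise over run: change in height divided by change in x."""
    return (f(x1) - f(x0)) / (x1 - x0)


# The model from the session: slope 3, intercept 1.
f_hat = make_linear_model(a=3, b=1)
print(f_hat(10))           # prediction at x = 10: 3 * 10 + 1 = 31
print(f_hat(0))            # intercept, where the line meets the vertical axis: 1
print(slope(f_hat, 2, 5))  # rise over run between any two points: 3.0

# Keep b the same but make the line steeper (a = 4).
steeper = make_linear_model(a=4, b=1)
print(steeper(10))         # 4 * 10 + 1 = 41

# Keep the slope at 3 but set the intercept to zero.
lower = make_linear_model(a=3, b=0)
print(lower(10))           # 3 * 10 + 0 = 30
```

Changing either parameter moves the prediction at x = 10, which is the point of the diagram: the algorithm's job is to pick the a and b that place the line well.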
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Having joined QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence, and co-owns the data science curriculum at QA.