To design effective machine learning, you’ll need a firm grasp of the mathematics that supports it. This course is part two of the module on maths for machine learning. It focuses on how to use linear regression in multiple dimensions, how to interpret data structures from the geometrical perspective of linear algebra, and how to use vector subtraction. We’ll finish the course by discussing how you can use visualized vectors to solve problems in machine learning, and how you can use matrices and multidimensional linear regression.

Part one of this module can be found here and provides an intro to the mathematics of machine learning, and then explores common functions and useful algebra for machine learning, the quadratic model, logarithms and exponents, linear regression, calculus, and notation.

If you have any feedback relating to this course, please contact us at support@cloudacademy.com.

So let's now consider an example of when this visual, geometrical way of thinking about vectors can help us solve problems in machine learning. One problem is: how close are two feature vectors? In machine learning terms, that means how similar are two observations (observations, examples and cases all mean the same thing here). Suppose I have a film, x1, described by a feature vector: it is 180 minutes long and has been rated seven out of 10. Another film, x2, is 200 minutes long and rated nine out of 10. The question is, how similar are these films? There are several potential answers to that question. If I draw them, with minutes on the horizontal axis going up to 200 and rating on the vertical axis going up to 10, then x2 sits at 200 and nine, and x1 sits at 180 and seven. One thing we can ask is, what's the angle between these two vectors? Another thing we could ask is, what's the distance between them? That would use vector subtraction, and there is a formula for that kind of similarity too. Let's talk about the angle. Using the angle to judge similarity is known as cosine similarity, and I'll explain why once you see the formula for it. The basic idea is that you imagine one user, film, product, or whatever item appears in your problem as a vector, and another one as a second vector, and then you ask what the angle between them is. That angle can be used to rank items by similarity.
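
The distance option mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming each film is stored as a `(minutes, rating)` tuple; the helper name `subtract` is made up for this sketch and does not come from the lecture.

```python
import math

# Feature vectors from the transcript: (length in minutes, rating out of 10).
x1 = (180, 7)
x2 = (200, 9)

def subtract(a, b):
    """Component-wise difference a - b (hypothetical helper for this sketch)."""
    return tuple(ai - bi for ai, bi in zip(a, b))

# The difference vector points from x1 to x2.
difference = subtract(x2, x1)

# Its length is the straight-line distance between the two films.
distance = math.hypot(*difference)

print(difference, round(distance, 2))
```

Note that the 20-minute gap dominates this distance in the same way that film length dominates the angle later in the lecture.
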
Okay, so the formula for the angle between them, or really for the cosine of the angle. The cosine of the angle is based on the weighted sum, the dot product x1 · x2, but it's not equal to the weighted sum on its own. You have to scale it. Imagine the axes didn't go up to 200 and up to 10, but that we had scaled the vectors down so that each has a length of one. Once everything is scaled to unit length, the dot product of the scaled vectors is the cosine of the angle. So what we do is divide the dot product by the length of the first vector multiplied by the length of the second vector. Another way of putting that is: it's the first vector divided by its length, dotted with the second vector divided by its length. Dividing a vector by its own length gives it a unit length of one, which puts both vectors on the same relative scale. We need to go into a little more detail, because what do we mean by a length, and what do we mean by an angle? What we're introducing here are the notions of length and angle, which allow us to say something about how similar things are. So what is the length of a vector? Some people write two bars on either side of the vector to mean its length, but that can look a bit cluttered. Let's take a vector for a film that is 100 minutes long with a rating of five. What's the length of that vector? Well, it's 100 in the horizontal and five in the vertical. Of course, if both axes had the same scale, the vector would look almost flat, five up and 100 across, so in the drawing we're zooming in on the vertical.
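
Written out, the cosine formula described above looks like the following. The double-bar notation for length is the one the lecture mentions; using the symbol θ for the angle between x1 and x2 is an assumption of this sketch.

```latex
\cos\theta
  = \frac{x_1 \cdot x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert}
  = \frac{x_1}{\lVert x_1 \rVert} \cdot \frac{x_2}{\lVert x_2 \rVert}
```

The second form shows the scaling step: each vector is first divided by its own length, so both have unit length before the dot product is taken.
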
Now what's the length of this vector? Maybe you recall from school that this edge is the hypotenuse of a right-angled triangle, with a base and a height. The length of the diagonal edge is the square root of the height squared plus the base squared. In school this is often stated as a squared plus b squared equals c squared, where you name the height a, the base b, and the hypotenuse, our vector of interest, c. Only the squares are equal, so if we want the actual distance c, we take the square root. What does that mean for a vector? If we treat the horizontal component as the base and the vertical component as the height of a triangle, then we define the length of a vector to be the square root of the sum of the squares of its components, which gives you the length along the diagonal. It's just the ordinary distance formula in two dimensions. That gives us a way of defining the lengths in the cosine formula. So what is the length of x1? In the case above, its components were 180 and seven, so it's the square root of 180 squared plus seven squared. That has to be very close to 180, and you can see why: if we didn't have the seven, the square root would cancel the square and give exactly 180. It'll be a bit higher than 180 because we've also got seven squared. So let's actually compute it, starting with 180 squared.
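
As a rough sketch of the length definition above, the function below takes the square root of the sum of squared components. The function name `length` is chosen for this illustration only.

```python
import math

def length(v):
    """Euclidean length: square root of the sum of the squared components."""
    return math.sqrt(sum(c * c for c in v))

film = (100, 5)   # the 100-minute film rated five
x1 = (180, 7)     # the 180-minute film rated seven

# Both lengths come out only slightly larger than the minutes component,
# because the rating contributes very little once it is squared and added.
print(round(length(film), 2))
print(round(length(x1), 2))
```

The same function works for vectors with more than two components, since it simply sums over whatever components it is given.
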
That's 32,400, and seven squared is 49, so adding those gives 32,449. Taking the square root gives about 180.14, so you can see the length of x1 is very close to 180. The length of the second vector is going to be very close to 200 as well: it's the square root of 200 squared plus the rating squared, which this time is nine squared rather than seven squared, and that's approximately 200.2. Right, so if I want to know the angle, I take the dot product of x1 with x2 and then do this division. What are we dividing by? The length of x1 multiplied by the length of x2, so about 180.14 times 200.2. As a rough estimate, 180 times 200 is 36,000, so the denominator is a little over 36,000. On the top we've got x1 · x2, which, if you recall, is a weighted sum of the components. The components of x1 are 180 and seven, and the components of x2 are 200 and nine. So we've got 180 times 200, which is 36,000 again, plus seven times nine, which is 63, giving 36,063. Notice that the contributions from the ratings are very small compared with 36,000.
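
The arithmetic above can be checked with a short sketch. The helper names `dot` and `length` are assumptions made for this illustration; the numbers are the ones used in the lecture.

```python
import math

def dot(a, b):
    """Weighted sum (dot product) of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def length(v):
    """Euclidean length, written as the square root of a vector dotted with itself."""
    return math.sqrt(dot(v, v))

x1 = (180, 7)
x2 = (200, 9)

numerator = dot(x1, x2)                  # 180*200 + 7*9
denominator = length(x1) * length(x2)    # product of the two lengths

print(numerator, round(denominator, 2))
```

The numerator and denominator differ by less than one, which is why rounding the denominator down to 36,000 is too coarse for the final division.
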
So the top is 36,063, and we're dividing by the product of the lengths, which comes to 36,063 point something (about 36,063.67). You can see we're dividing by a number that is almost exactly the weighted sum, so the result is very close to one: roughly 0.99998. What does that tell us? It tells us that the cosine of theta is about 0.99998. If we want the angle itself, we need the inverse of the cosine, so we need to figure out which angle gives a cosine that close to one. If you know the shape of the cosine function, it starts at one when the angle theta is zero and falls away from there. So if the output is essentially one, there's almost no separation between the vectors. Here the angle works out to well under one degree. What that's telling you is that, considering length and rating, these movies are very, very similar. There's only a tiny variation, and that's a side effect of the fact that they're basically the same length. As we've gone through this analysis, you can see that the rating term has barely had any impact on the similarity of these two vectors. So if the components have very different magnitudes, if the scale for the length of the film is very different from the scale for the rating, then the length is going to mostly determine whether films are similar. That might not be a very good system for determining similarity, because the length is going to swamp any differences in ratings.
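
To make the last step concrete, here is a minimal sketch that computes the cosine and then recovers the angle with the inverse cosine. The function names are assumptions for illustration, and the clamp to the range -1 to 1 is a defensive choice against floating-point rounding rather than something the lecture discusses.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def angle_degrees(a, b):
    """Angle between two vectors in degrees, via the inverse cosine."""
    # Clamp so rounding error can never push the value outside acos's domain.
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))

x1 = (180, 7)   # minutes, rating
x2 = (200, 9)

cos_theta = cosine_similarity(x1, x2)
theta = angle_degrees(x1, x2)

print(round(cos_theta, 5), round(theta, 2))
```

With length measured in minutes, the angle is a small fraction of a degree, which matches the conclusion that the two films look almost identical on this scale.
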
So one of the things we could do before we start the analysis is say, let's not measure length in minutes, where the values are in the hundreds; let's measure it in hours. Then length in hours would be on a similar scale to the rating, and we would get a very different picture. Rather than 180 minutes and a rating of seven, we have three hours and a rating of seven, and the second film becomes about 3.3 hours with a rating of nine. In this case there's only a difference of a third of an hour in length, while there's a difference of two points in the rating. So the angle is going to be much bigger, and the films will look less similar than they did before. Okay, let's wrap up where we are with this one. What has this analysis given us? It has given us a geometrical picture for vectors, and using that picture we can start coming up with interesting formulas for computing things like angles. You might ask, how can there be an angle between films? Well, in this way: we imagine one film as a vector, imagine another film as a vector, and then we have this cosine formula for the angle between two vectors. That's helpful in machine learning because it gives us a way of computing a similarity. Why would we want to know a similarity? Well, maybe for recommendation. When we're recommending films to people, we can rank them by how similar they are according to this formula, and that ranking is what we suggest to the user. We compare one film to every other film, rank by similarity, and say: you liked x1, so I think you'll like x2, because it's the most similar. So that's how a visual or geometrical understanding of a problem can sometimes give us real insight into how we might solve it.
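
The rescaling and ranking ideas above can be sketched together. This is an illustration under stated assumptions: films are `(minutes, rating)` tuples, `to_hours` is a made-up helper, and the candidates `short_low` and `long_low` are invented purely to give the ranking something to sort; only x1 and x2 come from the lecture.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def angle_degrees(a, b):
    """Angle between two vectors in degrees."""
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))

def to_hours(film):
    """Convert (minutes, rating) to (hours, rating); hypothetical helper."""
    minutes, rating = film
    return (minutes / 60, rating)

x1 = (180, 7)
candidates = {
    "x2": (200, 9),         # from the lecture
    "short_low": (90, 3),   # invented for this sketch
    "long_low": (210, 2),   # invented for this sketch
}

# Same pair of films, two different units for length.
angle_minutes = angle_degrees(x1, candidates["x2"])
angle_hours = angle_degrees(to_hours(x1), to_hours(candidates["x2"]))

# Rank every candidate by similarity to x1, most similar first.
ranking = sorted(
    candidates,
    key=lambda name: cosine_similarity(to_hours(x1), to_hours(candidates[name])),
    reverse=True,
)

print(round(angle_minutes, 2), round(angle_hours, 2))
print(ranking)
```

Converting to hours is only one possible rescaling; any choice of units changes the angles, so the ranking should be read as depending on that choice.
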

### Lectures

- Linear Regression in Multiple Dimensions
- Interpreting Data Structures from the Geometrical Perspective of Linear Algebra
- Vector Subtraction
- Matrices
- Multidimensional Linear Regression Part 1
- Multidimensional Linear Regression Part 2
- Multidimensional Linear Regression Part 3

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.