Module 2 - Maths for Machine Learning - Part Two
Linear Regression in Multiple Dimensions
1h 32m

To design effective machine learning systems, you’ll need a firm grasp of the mathematics that supports them. This course is part two of the module on maths for machine learning. It focuses on how to use linear regression in multiple dimensions, how to interpret data structures from the geometrical perspective of linear algebra, and how you can use vector subtraction. We’ll finish the course by discussing how you can use visualized vectors to solve problems in machine learning, and how you can use matrices and multidimensional linear regression.

Part one of this module can be found here. It provides an introduction to the mathematics of machine learning, then explores common functions and useful algebra for machine learning, the quadratic model, logarithms and exponents, linear regression, calculus, and notation.

If you have any feedback relating to this course, please contact us at



Continuing with our theme of linear regression, let's start talking about linear regression in multiple dimensions. To do that, we're going to need the techniques of linear algebra. The goal for this section is to understand linear regression, but now with multiple dimensions. So the model f(x) can have lots of different xs as inputs: the model can be a x1 plus b x2 plus c x3, and so on. We want to look at models like that, and to generalize linear regression to multiple dimensions we're going to need linear algebra. Okay, so what's linear algebra all about? Well, here's one view, a kind of simplified view, probably a computer science view: it's about a data structure and operations on it. In the simplest case, the data structure is what you would call a list in Python; in mathematics we would say a vector. The operation, put in a computer science way, or possibly a statistical way, is a weighted sum. This data structure, a list or vector, together with the weighted sum operation, gives us the basic building blocks that allow us to generalize to higher dimensions. How does that work? Well, I'll show you: let's take what we're going to call the computer science view initially, and then take a more mathematical view second. The computer science view may better fit our intuitions, but we're talking about the same subject, just from slightly different perspectives. So what's a vector? In the computer science view, a vector is just a list of numbers, however many you want.
So here our x will now be a vector, and to represent that it's a vector we'll put a little arrow on top. The reason it's an arrow will become clear when we look at the mathematics; for now it just tells us x is several numbers rather than one. What is it going to be? The first number in the list is x1, the next number x2, the next x3. We could have more than that, but three will do for now. I've written it horizontally here, which you might think of as like Python notation, but the actual mathematical convention is to write the numbers vertically, so we'll write x1, x2 and x3 in a column. Let's choose a particular point: say x1 is three, x2 is four, and x3 is, why not, minus two. In machine learning, x still represents our features, but now we have a feature vector: x with a little arrow is a feature vector, and in general a particular observation, a particular example of something, will be a feature vector.
Let me give you a little example of that. We could pick finance, health, banking or retail; let's do a retail example and try to predict the number of shoppers in a store per day. What features are we going to use? It will be a feature vector this time, so it will have several features. The first will be the location of the store: what city the store is in, or the latitude and longitude, however we encode it; let's just say a location. The second will be the number of products for sale. The third will be the time it opens, or better, how long it's open for: the opening length. For example, take a local store, let's call it the Tesco store. Then x for the Tesco store will be: the location, encoded as a number, say location three; the number of products for sale, say 231 products, perhaps quite a big store; and the opening length, say it's open 12 hours a day, 7 a.m. to 7 p.m. The goal is to arrive at an estimate for y which takes all those different features into account. You can see how linear algebra is helping already: it gives us a data structure, the vector, which packages everything together when the problem is more complicated than having one variable.
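As a quick sketch of the idea above, here is the store's feature vector written as a plain Python list. The feature values (location code 3, 231 products, 12 opening hours) are the illustrative numbers from the lecture, not real data.

```python
# A feature vector as a plain Python list.
# Components: [location code, number of products for sale, opening length in hours]
x = [3, 231, 12]

# Mathematics writes the first component as x1; Python indexes from zero:
print(x[0])    # the location component
print(len(x))  # the number of components in the vector
```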
Now suppose we had a prediction function f which took in this vector. We could say f was going to be 10 times x1 plus 0.1 times x2 plus x3, with the little vector symbol on top of the x. The key thing we've just done is use indices, and it's these indices that give us access to the elements of this data structure. Think of it just like a list: if this were Python and the list were called x, then the mathematical x1 would correspond to x[0], because in Python of course the first element's index isn't one, it's zero. It's a little bit tricky here: mathematicians don't tend to use zero as an index at all, so that's quite a computer science convention, and in machine learning some people might use zero as the first index and some might use one; in standard mathematics the convention is to start with one. And all this little arrow is telling us, in Python terms anyway, is something about the type of the variable: the type of x would come out to be list. Okay, let's plug the Tesco example into this formula and compute some values. For that particular point we get 10 times 3 plus 0.1 times 231 plus 12, which is 30 plus 23.1 plus 12, which is 65.1. So that's the prediction: we predict that the Tesco store will have about 65 customers per day. That's probably a little underestimated; maybe we should have gone for 0.5 times the number of items or something like that, but you can see how this model works.
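The prediction function just described can be sketched in Python as follows. The coefficients 10, 0.1 and 1 are the lecture's example weights, chosen for illustration rather than fitted to data.

```python
def f(x):
    """The lecture's example model: f(x) = 10*x1 + 0.1*x2 + x3.

    x is a feature vector [location, products for sale, opening hours],
    indexed from zero in Python rather than from one as in the maths.
    """
    return 10 * x[0] + 0.1 * x[1] + x[2]

tesco = [3, 231, 12]   # the hypothetical Tesco store from the example
print(f(tesco))        # 30 + 23.1 + 12, i.e. approximately 65.1
```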
By looking at this model we then get an insight into the second piece of linear algebra from the computer science point of view, and that second piece is the weighted sum, which is exactly the model we're dealing with. How have we used this vector, this list? Well, we've got a sum, but it's not just a sum of the pieces. A plain sum of the pieces would be x1 plus x2 plus x3; instead of just summing the pieces up, we weight each one of them, multiplying each one by some number. That gives us a weighted sum: a sum sensitive to the importance of its pieces. Let me show what I mean with some simple numbers. Our features are location, products and opening length. Say we've got a location of 3, 100 products and an opening time of 12 hours. If I just add all of those together, that gives 115, and that would be my prediction. But each of these pieces may have a different relevance, a different importance, to predicting the number of people coming through the door. Perhaps we shouldn't just count the location once: let's weight it, say by 20. And maybe the opening hours should be more relevant as well: let's multiply them by 3. So what would this sum be now? It would be 20 times 3, which is 60, plus 100 (leaving the products weight at 1), plus 3 times 12, which is 36, giving 196. So it's still a sum, but we have weighted each piece.
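The contrast between the plain sum and the weighted sum above can be shown in a few lines of Python, using the same illustrative numbers as the lecture:

```python
x = [3, 100, 12]   # features: location, products for sale, opening hours

# Plain, unweighted sum of the pieces: 3 + 100 + 12
plain = sum(x)

# Weighted sum: each feature multiplied by a weight reflecting its importance
w = [20, 1, 3]
weighted = sum(wi * xi for wi, xi in zip(w, x))  # 60 + 100 + 36

print(plain)     # 115
print(weighted)  # 196
```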
That's going to be the heart of how we predict a lot of things in machine learning. We use these linear models, straight-line models that we compute with weights that are just fixed numbers like this, and therefore we use linear algebra, which is in a sense the system of linear models, to do a lot of our work. Now let's wrap up this computer science perspective by tidying up the notation a little. It turns out we can take these weights, 20, 1 and 3, and put them into a vector of their own. In the context of a purely linear model, where you're giving purely linear predictions, that vector is sometimes written with a little Greek symbol, beta. In machine learning generally, however, the symbol used is more like w, for weight, and that's probably what most people would use, so we'll use w. So w, containing 20, 1 and 3, will be our three weights, and our feature vector x, let's do it in red to make a distinction between the weights and the features, will be 3, 100 and 12, as in this example. Now we want a little bit of notation that gives us the sum without having to write out each piece. What we want is sum notation: the sum over i of w multiplied by x, where we multiply each component of w by the component of x with the same index, with i running from 1 up to the number of items, which you could even write as len(w), using a bit of Python. I should probably just say n, but even in mathematics there probably wouldn't be much harm in saying length of w.
In Python, that sum would be a for loop: for i in range(len(w)), take w at that index multiplied by x at that index, and accumulate the total as you go round the loop; that total is your weighted sum. We'll look at the Python behind some of this in a separate video, but that's a kind of preview. Now, on the left-hand side of this formula we want some notation that gives us that sum straight away, and that's w dot x. The dot there means the sum of the components of x, these things are called components, each weighted by the corresponding component of w. The technical name for this is a dot product. Product here means multiplication: the product of three and four is 12. But it's a dot product because it's not just two numbers multiplied together; it's a vector of numbers multiplied by another vector of numbers, and in a very specific way, not just any old way: each pair of corresponding components multiplied together and summed. So, hopefully you can see where this goes if we rephrase our model from above. We had f of this x vector spelled out in full; now that we've improved the notation we can write it, let's put it in green, as w dot x. We've still got the x vector, we now have the weight parameters w, and w dot x is our entire model. Our nice linear model is just the sum of the components of the weights multiplied by the components of x. There are a few more little bits we'll need to continue the conversation from a mathematical perspective.
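The for loop described above, written out as a small Python function, is one way to sketch the dot product; the numbers are the same illustrative weights and features used earlier in the lecture.

```python
def dot(w, x):
    """Dot product of two equal-length vectors: the weighted sum
    computed with the for loop described in the lecture."""
    total = 0
    for i in range(len(w)):
        total += w[i] * x[i]
    return total

w = [20, 1, 3]     # weights
x = [3, 100, 12]   # features: location, products for sale, opening hours
print(dot(w, x))   # 20*3 + 1*100 + 3*12 = 196
```

So the entire linear model f(x) = w dot x comes down to this one function call.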


Interpreting Data Structures from the Geometrical Perspective of Linear Algebra - Vector Subtraction - Using Visualized Vectors to Solve Problems in Machine Learning - Matrices - Multidimensional Linear Regression Part 1 - Multidimensional Linear Regression Part 2 - Multidimensional Linear Regression Part 3

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project, then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.