1h 32m

To design effective machine learning systems, you'll need a firm grasp of the mathematics that supports them. This course is part two of the module on maths for machine learning. It focuses on how to use linear regression in multiple dimensions, how to interpret data structures from the geometrical perspective of linear algebra, and how to use vector subtraction. We'll finish the course by discussing how you can use visualized vectors to solve problems in machine learning, and how you can use matrices and multidimensional linear regression.

Part one of this module can be found here. It provides an introduction to the mathematics of machine learning, then explores common functions and useful algebra for machine learning, the quadratic model, logarithms and exponents, linear regression, calculus, and notation.

If you have any feedback relating to this course, please contact us at



Now that we've considered the computer science view of linear algebra, the calculative view in which we deal with lists and multiplications, and the mathematical or geometrical picture, in which we have the dot product, a length, and an angle, let's go a little further into the algebra and think about what happens when we start dealing with lots of vectors at once. Dealing with lots of vectors at once is a question of matrices (the plural of "matrix").

One way of understanding a matrix is simply as a series of vectors. Suppose capital X is now our dataset. Rather than a single feature vector for one user or one example, the dataset stacks these x's: x1, x2, x3, one on top of another. Or we could write that as a list of lists: [x1, x2, x3]. Either way, that gives us a matrix.

With a matrix, we have two indices. So if I write

1   2   3
-1  -5  -6
0   0   0
...

then the index going down is the row, and the index going along is the column. By convention, we write the row index first and then the column index: (r, c). In machine learning, the row index is sometimes written as a superscript, so the row index goes at the top and the column index at the bottom. In Python, if this were NumPy, we would have a matrix X and write X[r, c], and that stays the same regardless of which convention we use in the mathematics.
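As a quick illustration of the two-index convention in NumPy (a minimal sketch; the array values are just the ones from the example above):

```python
import numpy as np

# The matrix from the example: each row is one example, each column a feature.
X = np.array([[ 1,  2,  3],
              [-1, -5, -6],
              [ 0,  0,  0]])

# Two indices, row first, then column (NumPy indices start at zero).
print(X[1, 2])    # row 1, column 2 -> -6
print(X.shape)    # (3, 3): three rows, three columns
```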
And the reason we sometimes put the row index on top in machine learning, by convention, is that often we don't really care about the row, in the sense that we mean "for every row". We're not filtering by row; we want to deal with every row. So if I write one column plus another column plus another column, that's a formula over the feature columns, and I mean it for every row. In Python, the way we say "every row" is with a colon: if I just want a column, I leave the row index as a colon. It's a similar idea in mathematics, where you simply don't write the index you're leaving off, and it's understood to mean "for every value of that index". With more formal notation you could write "for all rows" explicitly, and you might see that occasionally, but often, if the index is left off, it's implied.

Right, okay. So what do we need to know when we're dealing with matrices? What's starting to matter now is the shape of the data structure we're using. Sometimes these are called axes or dimensions; NumPy calls it "shape", so we'll stick with shape. The idea behind shape is that it tells you how the data structure is laid out. So a vector x might have shape one row by some number of columns; we call that a row vector. Or it might have shape some number of rows by one column; we call that a column vector. It could have either shape.
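In NumPy, the colon-for-every-row idea and the two vector shapes look like this (a small sketch; the array values are illustrative):

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])

# "For every row" is written with a colon: select column 0 across all rows.
first_feature = X[:, 0]                # array([1, 4])

# A row vector: one row, some number of columns.
row_vec = np.array([[1, 3, 5]])        # shape (1, 3)

# A column vector: some number of rows, one column.
col_vec = np.array([[2], [4], [6]])    # shape (3, 1)

print(first_feature)                   # [1 4]
print(row_vec.shape, col_vec.shape)    # (1, 3) (3, 1)
```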
You might think to yourself, "What's the difference? Why are those two different things?" Let's put a pin in that for a second, because they are different, and first take a look at the shape of a matrix. By comparison, a matrix of course has a shape with some number of rows and some number of columns at the same time: R rows, C columns. So a matrix with three rows and two columns is a three-by-two matrix: rows one, two, three and columns one, two.

Okay, so let's talk about the orientation of the shape and why that's important. In other words, is the shape (1, 2) or (2, 1), and why does it matter? It all comes down to how you compute the weighted sum, or dot product; these words mean similar things, and I'll give you another name for it in just a second. Suppose I have a row vector with entries one, three, five, so its shape is one row and three columns. And I put it alongside a column vector with shape three rows and one column, with entries two, four, six. The meaning of this juxtaposition is the dot product between them. When you have (1, 3) next to (3, 1), it means compute 1×2 + 3×4 + 5×6. The shape matters because it determines which numbers get multiplied together. Because the first has one row, you go across it: this entry, then this, then this. Because the second has three rows, you go down it, one entry for each entry of the first. So one goes with two, three goes with four, five goes with six. And that's always the way we evaluate products in linear algebra: the columns of the first multiplied by the rows of the second.
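Checking that juxtaposition in code (a minimal sketch using NumPy's `@` matrix-multiply operator):

```python
import numpy as np

row_vec = np.array([[1, 3, 5]])        # shape (1, 3)
col_vec = np.array([[2], [4], [6]])    # shape (3, 1)

# Across the columns of the first, down the rows of the second:
# 1*2 + 3*4 + 5*6 = 44.
product = row_vec @ col_vec
print(product)          # [[44]]
print(product.shape)    # (1, 1)
```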
So: across the columns, down the rows. It's important that things have the right shape, because you need to know which number to multiply by which number when you compute the dot product.

Okay, now, what happens with matrices? With matrices, you have what's called a matrix-vector product. Suppose I have a matrix M with rows (1, 2) and (2, 1), and a vector x with entries 6 and 3 (let's make the numbers different for whatever reason). Let's look at what Mx means. It means the same thing in terms of the weighted sum: columns across, rows down. For the first number: 1×6 + 2×3. And for the second number (we get two numbers as a result of this product): 2×6 + 1×3. So the first entry is 6 + 6, which is 12, and the second entry is 12 + 3, which is 15. The result is (12, 15).

Right, so when we have a matrix-vector product, we get a vector as a result. And when we have a vector-vector product, we get a single number as a result, which we might call a scalar. So we kind of drop a dimension when we're dealing with products: matrices with vectors go to vectors, and vectors with vectors go to single numbers.

Right, okay, so what's the takeaway of this for machine learning? Well, when we have a dataset, the dataset is going to be a matrix and not a single vector. That's capital X: the entire dataset.
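The worked matrix-vector product above can be verified in a couple of lines of NumPy:

```python
import numpy as np

M = np.array([[1, 2],
              [2, 1]])
x = np.array([6, 3])

# Matrix-vector product: each output entry is the dot product
# of one row of M with x.
# First entry:  1*6 + 2*3 = 12
# Second entry: 2*6 + 1*3 = 15
y = M @ x
print(y)    # [12 15]
```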
And it's often going to be the case that we want to apply operations to this dataset, and some of those operations can be expressed as matrices. So we can apply, in one go, a big, interesting operation to our entire dataset to produce, perhaps, a different kind of dataset.

An example of such an operation, which I'll preview now but we'll talk about later in the course: suppose I have a dataset with lots of repeated information in it. There's a question there of, "Is there something I can do to this dataset, call the operation M, to give me a new dataset where most of the entries are now zero, and the first few columns hold all the information I actually need?" What I've done then is compress that information into those first columns. And if I can get this new matrix by applying such an operation, maybe I can simply delete the rest. If I do that, I've reduced the dimensions of the problem, so I've got less data: I've compressed the data. So applying an operation like this to a matrix can help you perform compression.

And in a simpler case as well, when we have a linear regression problem, maybe you want to run the model on an entire matrix, and if you do that, it can also be understood as a matrix-vector product. But we'll leave that there.
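To make the compression idea concrete, here is a minimal sketch. The projection matrix P below is hand-picked for illustration only; techniques covered later (such as those based on eigendecomposition) find such matrices from the data automatically:

```python
import numpy as np

# A dataset where the third column merely repeats the second:
# redundant information we would like to compress away.
X = np.array([[1.0, 2.0, 2.0],
              [3.0, 4.0, 4.0],
              [5.0, 6.0, 6.0]])

# A hand-picked projection matrix (illustrative only): it keeps the
# first two columns and drops the redundant third.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Apply one operation to the entire dataset in one go.
X_compressed = X @ P
print(X_compressed.shape)   # (3, 2): one redundant dimension removed
```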


Linear Regression in Multiple Dimensions - Interpreting Data Structures from the Geometrical Perspective of Linear Algebra - Vector Subtraction - Using Visualized Vectors to Solve Problems in Machine Learning - Multidimensional Linear Regression Part 1 - Multidimensional Linear Regression Part 2 - Multidimensional Linear Regression Part 3

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.