Distance Functions and Similarity Measures
1h 52m

Supervised learning is a core part of machine learning, and something you’ll probably use quite a lot. This course is part two of the module on supervised learning. It takes a look at hyperparameters, distance functions, and similarity measures. We’ll wrap up the module with logistic regression, the method and workflow of machine learning and evaluation, and the train-test split.

Part one of supervised learning introduces you to supervised learning and the nearest neighbors algorithm.



So the distance formula for Manhattan distance, the L1 distance, is what you get when you take the difference between the vector B and the vector A and measure the size of that difference using L1. And what's that? It's the absolute value of the difference in the first entries plus the absolute value of the difference in the second entries. So that would be |b1 - a1| + |b2 - a2|: you take the absolute value of each difference and then add them up. Let me give you some intuition for that.
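As a minimal sketch of the L1 formula just described, here it is in Python. The function name `manhattan_distance` and the sample points are illustrative choices, not something taken from the lecture.

```python
def manhattan_distance(a, b):
    """L1 (Manhattan) distance: sum of absolute component differences."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of components")
    # Take the absolute value of each difference first, then add them up.
    return sum(abs(b_i - a_i) for a_i, b_i in zip(a, b))


a = (1, 2)  # illustrative points
b = (4, -2)
print(manhattan_distance(a, b))  # |4 - 1| + |-2 - 2| = 3 + 4 = 7
```

Note that the absolute value is applied to each component before summing. Summing first and taking the absolute value afterwards would let positive and negative differences cancel.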

So what is this point in blue? It's one in X1 and zero in X2. We can compare it with the origin, zero zero. So let A be (0, 0) and B be the blue point (1, 0), and let me spell the formula out. It would be |b2 - a2|, which is |0 - 0|, plus |b1 - a1|, which is |1 - 0|, so the distance is one. The absolute values just mean we ignore negative signs. Now the subtlety in reading all of these formulas is that this one, this vector minus that vector, involves taking the differences of their components and then adding those absolute differences together.

Okay, so you can see the blue point here is at distance one, and the point up here, zero in X1 and one in X2, is at distance one for the same reason. Then there are the points along this edge, and you have to think about those. Each is a point with some amount in X1 and some amount in X2, and if you put it into the formula you'll find it comes out at distance one: a certain amount comes from one component and a certain amount from the other. Right. Now let's consider the same question in the L2 case. So we're asking for the points, or images, which are one unit away, but using the L2 distance formula to compute that unit, with unit here just meaning one.
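Before moving on to L2, the L1 claim above can be checked numerically. This is a small sketch in Python, assuming the reference point is the origin. The helper name `l1_distance` and the particular edge points are made up for illustration.

```python
def l1_distance(a, b):
    """L1 distance: sum of absolute component differences."""
    return sum(abs(b_i - a_i) for a_i, b_i in zip(a, b))


origin = (0.0, 0.0)
# Two corner points plus two points along the edge where x1 + x2 = 1.
points = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.25, 0.75)]
distances = [l1_distance(origin, p) for p in points]

for p, d in zip(points, distances):
    print(p, d)  # every distance is 1.0
```

For the edge points, part of the distance comes from the X1 component and the rest from the X2 component, and the two parts always add up to one.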

Okay. So I'll give you the answer first. The things which are one unit away in L2 lie on a circle around the point. So let me draw a circle and put the axes on at its center, approximately. The center will be our point A, and I'll draw a point B in red on the edge. All the points along this edge are at a radius of one from A. So long as the radius is one here, it's the same all the way around. What's the formula for the radius in terms of these two points? It's just the L2 distance formula. So take B minus A, the difference of the vectors, and its length in L2 is what we've seen before: (b2 - a2) squared plus (b1 - a1) squared, with a square root over the whole thing. For a distance of exactly one the square root makes no difference, since the square root of one is one.
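The circle picture can be sketched in Python as follows. The function name `euclidean_distance` and the chosen angles are illustrative assumptions. The code generates points on a circle of radius one around A and confirms that the L2 formula gives one for each of them.

```python
import math


def euclidean_distance(a, b):
    """L2 distance: square root of the sum of squared component differences."""
    return math.sqrt(sum((b_i - a_i) ** 2 for a_i, b_i in zip(a, b)))


a = (0.0, 0.0)
# Points at several angles around a circle of radius 1 centred on a.
angles = [0.0, math.pi / 6, math.pi / 4, math.pi / 2, math.pi]
circle_points = [(math.cos(t), math.sin(t)) for t in angles]
radii = [euclidean_distance(a, p) for p in circle_points]

print(radii)  # each value is 1.0 up to floating-point rounding
```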

Right. So we're going to get different answers depending on which distance function we choose. And there's something interesting about L1 compared to L2. In the L2 case, suppose I rotate the axes. What does that mean? It means I choose a second pair of axes to measure things from. So I put one in here, let's call it X3, and another one perpendicular to it, X4. What does it mean to take two new axes? Well, if X1 and X2 are pixels of the image, then X3 and X4 could be particular combinations of the original pixels. As an example, you could think of X3 and X4 as the measurements you'd take of the points in a rotated image, or something like that.

Now the key thing here is that if I have a point A and a point B and I rotate the axes, A stays where it is, at zero, but the values I measure for B change. Call it B rotated. If I measure using X3 and X4, then B now has zero in X4, if I'm reading this correctly. You can see on this blue axis that B rotated has zero in X4, but in the original pair of axes B had something in X2 and something in X1. We've rotated so that B lies on the new axis, and in the new pair of axes it has a zero entry. To make that clear: we've gone from a point which had, say, 0.7, 0.7 or something like that, to a rotated point which has one, zero. And the original point A hasn't moved.
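Here is a rough Python sketch of measuring the same points along a rotated pair of axes. It assumes the new axes are the old ones turned anticlockwise by 45 degrees and uses the exact value of roughly 0.707 in place of 0.7. The function name `coords_in_rotated_axes` is hypothetical.

```python
import math


def coords_in_rotated_axes(point, angle):
    """Coordinates of `point` along axes rotated anticlockwise by `angle` radians."""
    x1, x2 = point
    c, s = math.cos(angle), math.sin(angle)
    x3 = c * x1 + s * x2   # component along the new first axis
    x4 = -s * x1 + c * x2  # component along the new second axis
    return (x3, x4)


a = (0.0, 0.0)
b = (math.sqrt(0.5), math.sqrt(0.5))  # roughly (0.707, 0.707)

a_rot = coords_in_rotated_axes(a, math.pi / 4)
b_rot = coords_in_rotated_axes(b, math.pi / 4)
print(a_rot)  # the origin is unchanged
print(b_rot)  # approximately (1, 0)
```

Each new coordinate is a weighted combination of the two original coordinates, which is the sense in which the new axes are combinations of the original pixels.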

So what we see using L2 is that under this rotation, under this choice of new axes anyway, the distance between A and B is still the same. So if we're using the L2 distance, we can do things to our images, transform them through rotations and other things, and those transformations don't lose or destroy information or change the result of the prediction, because the prediction only uses the distances between points. If K nearest neighbors is just ranking images by distance, and we can change these images and have the distances stay the same, then we can perform these kinds of changes. Now in the case of L1, if I choose those new axes again, the distance between A and B actually changes. B goes here, and measured along this edge it's a smaller distance than the one we had.
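To make the comparison concrete, here is a small Python sketch that measures the distance from A to B under both formulas before and after rotating the axes by 45 degrees. The points and the helper names (`l1`, `l2`, `rotate_axes`) are illustrative assumptions.

```python
import math


def l1(a, b):
    """L1 distance: sum of absolute component differences."""
    return sum(abs(q - p) for p, q in zip(a, b))


def l2(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def rotate_axes(point, angle):
    """Coordinates of a 2D point along axes rotated anticlockwise by `angle`."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)


a = (0.0, 0.0)
b = (math.sqrt(0.5), math.sqrt(0.5))  # roughly (0.707, 0.707)
angle = math.pi / 4
a_new, b_new = rotate_axes(a, angle), rotate_axes(b, angle)

l2_before, l2_after = l2(a, b), l2(a_new, b_new)
l1_before, l1_after = l1(a, b), l1(a_new, b_new)
print("L2:", l2_before, "->", l2_after)  # both approximately 1
print("L1:", l1_before, "->", l1_after)  # roughly 1.414 -> roughly 1
```

With these particular points the L2 distance is unchanged by the rotation, while the L1 distance shrinks from about 1.414 to about 1.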

So if B goes here, that length is longer than this one. It doesn't stay on the edge. This sort of operation, moving the axes like this, does not preserve distances if they're measured with L1. Now what's the takeaway from this? A couple of things. As I've said, if we use K nearest neighbors with L2, we can do things to the images. But there are also implications even if we're not changing the images, because under the L2 distance it's more likely that an image and a rotated version of it are considered similar. Let's say a house. There's a house, whatever I drew there, and a house which is rotated, and they'd be considered to be the same image.

So these two images might have a much closer distance in L2 than they do in L1. If we're saying, find me a similar image, this might be the similar image to that in L2, but in L1 it wouldn't be. And in some problems you may want these images to be considered distinct. Letters, for example: the letter p and the letter d are not the same letter, which makes it somewhat difficult for children who have problems with symmetry to tell the difference between letters like these, because ordinary physical objects are the same object under rotation. A house is just a house regardless of what perspective you're looking at it from, but peculiarly, for whatever reason, the visual signs we use in language are very distinct under rotation. So if you're doing handwriting analysis, it might be that L1 is the better one. In the analysis of ordinary physical objects it might be that L2 is the better one. It's hard to know.

So look out for that. I'm just giving you some background as to why the choice of distance function can be important. And I think the important takeaway from this isn't particularly the K nearest neighbors algorithm as such. The important takeaway is to understand distance formulas and their role within machine learning in making these comparisons between points, because they will recur quite often whenever we're comparing points. We'll use distance formulas over and over again.
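As a closing sketch of why the choice matters when ranking points by distance, the Python below finds the nearest candidate to a query under L1 and under L2. The candidate points and labels are made up for illustration, and `nearest` is a hypothetical helper, not a library function.

```python
import math


def l1(a, b):
    """L1 distance: sum of absolute component differences."""
    return sum(abs(q - p) for p, q in zip(a, b))


def l2(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def nearest(query, candidates, distance):
    """Return the label of the candidate closest to `query` under `distance`."""
    return min(candidates, key=lambda label: distance(query, candidates[label]))


query = (0.0, 0.0)
candidates = {"P": (0.9, 0.9), "Q": (1.5, 0.0)}  # made-up points

print(nearest(query, candidates, l1))  # Q: L1 gives 1.8 for P and 1.5 for Q
print(nearest(query, candidates, l2))  # P: L2 gives about 1.27 for P and 1.5 for Q
```

The same data gives a different nearest neighbor depending only on which distance function is passed in, which is why the distance function is worth treating as a choice to be made rather than a fixed detail.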



About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.