Interpreting linear classification models - Part 2

Regression is a widely used machine learning and statistical tool, and it’s important you know how to use it. In this module, we’ll discuss interpreting models, and in particular how to interpret linear classification models.


Let's consider an image classification example. Here the domain is images, and we'll keep it binary for simplicity. So Y could be cat versus dog, or it could be fraud versus not fraud for scanned documents or receipts; there are lots of possibilities. Now, when the feature X is an image, we're going to have quite a lot of weights. If the image is, say, a cat, then we've actually got lots of X's: each pixel is one little X.

So these are X1, X2, X3 and so on, and therefore we have lots of weights. We could arrange the biases too, but let's focus on the weights for now: we can arrange them in the same grid layout as the image, so that there's a weight term for every pixel. The color or intensity of a pixel is the value of X that we use. If it's a black-and-white image, X1 could be zero if the pixel is completely off and one if it's completely on.
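A minimal sketch in plain Python of the bookkeeping described above, assuming a tiny hypothetical 2-by-3 greyscale image with intensities in [0, 1]: the grid is read row by row into X1, X2, X3, ..., and a flat list of made-up weights (one per pixel) is arranged back into the same grid as the image.

```python
def flatten(grid):
    """Read a 2-D grid row by row into one flat list: X1, X2, X3, ..."""
    return [value for row in grid for value in row]


def to_grid(flat, width):
    """Arrange a flat list (e.g. one weight per pixel) into rows of `width`."""
    if len(flat) % width != 0:
        raise ValueError("list length must be a multiple of the row width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Hypothetical 2x3 greyscale image: 0.0 = pixel off, 1.0 = pixel fully on.
image = [[0.0, 0.0, 0.8],
         [0.0, 1.0, 1.0]]
x = flatten(image)

# Hypothetical weights, one per pixel, laid out in the same grid as the image.
w = [0.1, -0.2, 0.5, 0.0, 0.3, -0.4]
w_grid = to_grid(w, width=3)

print(x)       # [0.0, 0.0, 0.8, 0.0, 1.0, 1.0]
print(w_grid)  # [[0.1, -0.2, 0.5], [0.0, 0.3, -0.4]]
```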

So in this case the pixel X1 is off, but if this little pixel here, let's say X200, were on, it could be a little bit gray, say 0.8, so it's mostly on. Likewise, we can take the weights and interpret their values as intensities too. Suppose we found that the correct weight here for classifying this data set is 0.5; again, we can visualize that value as a shade somewhere between on and off.
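Weights, unlike pixel values, can be negative or larger than one, so to draw them as grey levels one common choice (an assumption here, not something the lecture specifies) is to min-max rescale them into [0, 1]. A sketch with made-up weights:

```python
def weights_to_intensities(weight_grid):
    """Min-max rescale a grid of weights to grey levels in [0, 1].

    The smallest weight maps to 0.0 (off), the largest to 1.0 (fully on).
    """
    flat = [w for row in weight_grid for w in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # All weights equal: nothing to contrast, so draw a uniform mid grey.
        return [[0.5 for _ in row] for row in weight_grid]
    return [[(w - lo) / (hi - lo) for w in row] for row in weight_grid]


weights = [[-1.0, 0.0, 1.0],
           [0.5, 0.0, -0.5]]
print(weights_to_intensities(weights))
# [[0.0, 0.5, 1.0], [0.75, 0.5, 0.25]]
```

With this mapping a weight of 0 shows as mid grey only because these weights happen to be symmetric around zero; in general each grey level is relative to the range of the whole grid.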

Now, weights by themselves are a little hard to interpret in this case, but when you combine them with the data they become easier to interpret. What do I mean by that? Let's take all of the weights and an example image and perform the actual computation. The model takes the sign of that formula: we lay all of the weights out in a row and take the dot product with all of the X's in a row, which just means weight times X, plus weight times X, plus weight times X, and so on, all added up. So rather than computing the sign, let's just compute this term.
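The computation described here can be sketched in a few lines of plain Python, assuming the model has the usual linear form sign(w·x + b) and the convention that +1 means "cat" and -1 means "dog" (both assumptions for illustration; the weights, bias and pixels are made up).

```python
def score(w, x):
    """Dot product: weight times pixel, for every pixel, all added up."""
    if len(w) != len(x):
        raise ValueError("need exactly one weight per pixel")
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(z):
    """+1 for a positive score, -1 otherwise (0 is sent to -1 by convention here)."""
    return 1 if z > 0 else -1


def classify(w, b, x):
    """Return the predicted label and the raw score the sign is taken of."""
    s = score(w, x) + b
    return sign(s), s


w = [0.5, -1.0, 2.0]   # hypothetical weights
x = [1.0, 0.0, 0.5]    # hypothetical flattened image
b = -0.5               # hypothetical bias
label, raw = classify(w, b, x)
print(label, raw)      # 1 1.0
```

Looking at `raw` rather than `label` corresponds to computing the term instead of the sign: the score says how strongly, not just which way, the image was classified.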

That gives us the score this particular set of weights assigns to a particular image. Suppose we take a set of weights and draw them as a grid again, with values such as 0, 1 and 0.5. Suppose we supply a cat image and find that most of the cat image is zero in these spaces, and these spaces, let's just say hypothetically, are the ones that combine with these weights. Say this row of weights is 0.2, 1 and 0, and this entire row of weights combines with this region in the cat image.

So what does that tell us? It tells us that these weights are not really relevant for this kind of image, because they are being multiplied by zeros, so the score for this image isn't affected by them. Suppose I put a different image in, say a dog, and find that these same weights are multiplying nonzero parts of the image. When we do the multiplication, these weights combine with those bits and give us some positive value. Then we can go back and look at the second set of weights, these ones here, combine them with the cat image, and find that they do multiply nonzero parts of the cat image and give us a negative response.
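One way to make "these weights are being multiplied by zeros" concrete is to keep every product w_i·x_i in the image's grid layout instead of adding them up straight away. The sketch below uses hypothetical 2-by-3 weights and two made-up images: the top row of weights meets only dark pixels in the "cat", so it contributes nothing there, while the bottom row pushes the cat's score negative.

```python
def contribution_grid(w_grid, x_grid):
    """Per-pixel contribution to the score, kept in the image's grid layout."""
    return [[w * x for w, x in zip(w_row, x_row)]
            for w_row, x_row in zip(w_grid, x_grid)]


def total(grid):
    """Add up every cell: this equals the dot-product score (without bias)."""
    return sum(sum(row) for row in grid)


weights = [[0.2, 1.0, 0.0],
           [-0.5, -1.0, 0.3]]
cat = [[0.0, 0.0, 0.0],     # hypothetical cat: top region dark
       [1.0, 1.0, 0.0]]
dog = [[1.0, 1.0, 0.5],     # hypothetical dog: top region bright
       [0.0, 0.0, 0.0]]

cat_map = contribution_grid(weights, cat)
dog_map = contribution_grid(weights, dog)
print(cat_map[0], total(cat_map))  # top row contributes nothing; score is -1.5
print(dog_map[0], total(dog_map))  # top row carries the whole (positive) score
```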

So how can we interpret the weights here? First, we can interpret weights as sensitive to particular regions in X, and second, we can interpret them as looking for kinds of Y. Since our X's are mostly zero in the area that multiplies a certain set of weights, those weights aren't picking up anything in that area: the score isn't being modified by that set of weights in that region for this kind of input.

So the weights are sensitive to certain regions in X: they combine with those regions and not with others, and that comes straight from the dot product. If I take some weights and dot them with X, then laid out in a long row it's W1·X1 + W2·X2 + ... + W100·X100, so naturally only W1 is relevant to X1. But since the original image has some structure, we can reverse-engineer this formula: rather than a row of weights, put them back in the image's grid, and we can say that these four, for instance, correspond to some interesting bit of the input. They are sensitive to some pieces of the input and not to others. And we can also see that they are looking for kinds of Y.
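Because the score is W1·X1 + W2·X2 + ..., the rate at which it changes when you nudge W_i is exactly X_i. A quick check of that claim with made-up numbers: changing the weights over pixels that are off leaves the score untouched, while changing a weight over a lit pixel moves it.

```python
def score(w, x):
    """Dot product of a weight vector with a flattened image."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sensitive_indices(x):
    """Indices of the weights that can affect the score for this image.

    d(score)/d(w_i) = x_i, so only weights over non-zero pixels matter.
    """
    return [i for i, xi in enumerate(x) if xi != 0]


x = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]    # hypothetical image: first three pixels off
w = [0.2, 1.0, 0.0, -0.5, -1.0, 0.3]  # hypothetical weights

w_dark = list(w)
for i in range(3):                    # change only the weights over dark pixels
    w_dark[i] += 10.0

w_lit = list(w)
w_lit[3] += 10.0                      # change a weight over a lit pixel

print(sensitive_indices(x))                  # [3, 4]
print(score(w, x) == score(w_dark, x))       # True
print(score(w, x) == score(w_lit, x))        # False
```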

Why is that? This is a cat and that's a dog, and certain entries in the weight collection respond most to cats: the dot product is highest when combined with a cat because of these weights, and lower when combined with a dog. So we can interpret these weights as looking for a cat, or looking for a dog. Let's summarize what we've talked about there; this is about giving some additional intuitions behind a model formula or a model setup.
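To check which class a set of weights is "looking for", you can average the score over a handful of labeled examples from each class and see which class it responds to most. A sketch with made-up four-pixel images (a real check would use many held-out images):

```python
def score(w, x):
    """Dot product of a weight vector with a flattened image."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def looks_for(w, examples_by_class):
    """Name of the class whose examples get the highest average score."""
    averages = {label: mean(score(w, x) for x in xs)
                for label, xs in examples_by_class.items()}
    return max(averages, key=averages.get), averages


w = [1.0, 1.0, -1.0, -1.0]  # hypothetical weights
examples = {
    "cat": [[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.0, 0.5]],
    "dog": [[0.0, 0.0, 1.0, 1.0], [0.5, 0.0, 1.0, 0.5]],
}
winner, averages = looks_for(w, examples)
print(winner, averages)  # cat {'cat': 1.25, 'dog': -1.5}
```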

So if you have a linear model, it's composed of weights and biases, and in the binary case it classifies things as yes or no. The important question about any model, and especially a linear one, is: what are all the dials, all the things that can be tuned, and how do we interpret the things we're tuning? The first layer of interpretation is to see them as thresholds. The bias is a kind of hurdle you have to overcome, and the weights drag that hurdle down so you can actually overcome it.

So are you a cat or a dog? If I drag the bias up, it becomes harder to be classified as a cat, and if I move a weight up, I can drag the hurdle back down, making it easier to be classified as a cat. That's the first sense: thresholds. There's a second, visual sense, in which the weights and biases combine to give you the orientation of the line in your feature space. So we have this energetic intuition (can I cross the threshold? can I cause something to happen?) and this visual intuition about the orientation of the line. The third kind of intuition is the problem-specific one: let me think about what my feature actually is.
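Both intuitions can be sketched for a two-feature model, writing the bias as a hurdle the weighted sum must clear (equivalently w·x + b > 0 with b = -hurdle; that sign convention is an assumption chosen to match the "hurdle" wording). In two dimensions, the points where the weighted sum exactly equals the hurdle form the decision line, and the weights set its orientation.

```python
def is_cat(w, hurdle, x):
    """Threshold view: classify as 'cat' if the weighted sum clears the hurdle."""
    return sum(wi * xi for wi, xi in zip(w, x)) > hurdle


def boundary_line(w, hurdle):
    """Geometric view for two features: w1*x1 + w2*x2 == hurdle is a line.

    Returns (slope, intercept) of that line written as x2 = slope*x1 + intercept;
    requires w2 != 0. The weight vector (w1, w2) is perpendicular to the line.
    """
    w1, w2 = w
    if w2 == 0:
        raise ValueError("w2 == 0 gives a vertical line x1 = hurdle / w1")
    return -w1 / w2, hurdle / w2


x = [1.0, 1.0]
print(is_cat([1.0, 1.0], 1.5, x))  # True: 2.0 clears a hurdle of 1.5
print(is_cat([1.0, 1.0], 2.5, x))  # False: raising the hurdle makes 'cat' harder
print(is_cat([2.0, 1.0], 2.5, x))  # True: raising a weight (x1 > 0) gets back over
print(boundary_line([1.0, 1.0], 1.5))  # (-1.0, 1.5)
print(boundary_line([2.0, 1.0], 2.5))  # (-2.0, 2.5): steeper line, new orientation
```

Raising a weight only lowers the effective hurdle when its feature is positive, which is always the case for pixel intensities in [0, 1].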

Here it's an image, so let me think about how my weights combine with that feature to give me a score. Are there things I can infer about how these weights compute the score if I give the model example images? I actually perform the computation with some sample images, and then, rather than doing any more machine learning, I go back and look at how specifically the weights combined with each example I gave it. Then I can interpret those weights. If they're combining with zeros, or near-zero values, in some kind of image, then perhaps those weights are not about those images in some sense: they're not capturing anything in them. If they combine with the same region in a different image and multiply very large values there, where the pixels are very bright, then in some images they're combining with very low values and in others with very high values. Is there some difference between those images? If there is, and the difference is relevant to your problem (you've got a dog and you've got a cat), then maybe what the weights are doing is looking for that difference: in some sense, the weights are looking for the distinction between cat and dog.
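The procedure described above (run sample images through the model, then look back at how the weights combined with each one) can be sketched as an average contribution map per class plus their difference. Pixels whose average contribution is near zero in both classes are ones the weights are "not about"; large differences mark where the weights pick up the cat/dog distinction. The weights and images are made up, and the tolerance `eps` is an arbitrary choice.

```python
def contributions(w, x):
    """Per-pixel contribution w_i * x_i of one flattened image."""
    return [wi * xi for wi, xi in zip(w, x)]


def mean_contributions(w, images):
    """Average per-pixel contribution over a list of images of one class."""
    maps = [contributions(w, x) for x in images]
    return [sum(column) / len(maps) for column in zip(*maps)]


def compare_classes(w, images_a, images_b, eps=1e-6):
    """Return (difference map A - B, indices quiet in both classes)."""
    a = mean_contributions(w, images_a)
    b = mean_contributions(w, images_b)
    difference = [ai - bi for ai, bi in zip(a, b)]
    quiet = [i for i, (ai, bi) in enumerate(zip(a, b))
             if abs(ai) < eps and abs(bi) < eps]
    return difference, quiet


w = [0.5, 1.0, -1.0, 0.0]                            # hypothetical weights
cats = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]  # hypothetical cat images
dogs = [[0.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.5, 0.0]]  # hypothetical dog images

difference, quiet = compare_classes(w, cats, dogs)
print(difference)  # [0.5, -1.0, 0.75, 0.0]
print(quiet)       # [3]
```

Pixel 3 is bright in some images of both classes, yet it is quiet because its weight is zero: the model simply ignores that region.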

Now, if the weights seem not to be picking up on the "genuine" distinction, if there is one, between your cat and dog pictures, that gives you a clue that the model you have is in some sense wrong or not optimal: it may be distinguishing cats from dogs based on irrelevant pieces of the input. We're going to look at this second kind of problem, interpreting weight matrices (these grids of weights), in the context of networks designed for images.
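One rough sanity check for "distinguishing cats from dogs based on irrelevant pieces of the input" is to mark by hand which pixels actually show the animal and measure what fraction of the absolute per-pixel contributions falls inside that mask. The mask, weights and image below are all hypothetical, and this heuristic is an illustration rather than a method from the lecture.

```python
def relevant_share(w, x, mask):
    """Fraction of total |w_i * x_i| that lands on pixels marked relevant.

    `mask[i]` is True where pixel i is part of the object (e.g. the animal),
    False for background. Returns 0.0 if nothing contributes at all.
    """
    contribs = [abs(wi * xi) for wi, xi in zip(w, x)]
    total = sum(contribs)
    if total == 0:
        return 0.0
    inside = sum(c for c, m in zip(contribs, mask) if m)
    return inside / total


w = [0.1, 0.1, 2.0, 2.0]           # hypothetical weights
x = [1.0, 1.0, 1.0, 1.0]           # hypothetical image, all pixels lit
mask = [True, True, False, False]  # first two pixels show the animal

share = relevant_share(w, x, mask)
print(round(share, 3))  # 0.048: almost all of the score comes from the background
```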

So hopefully this gives you some insight, some method, and some sense of the role a model plays, the role its parameters play, and the questions to ask about them: what do these parameters mean? How do I interpret what they do for me? How do I interpret how the model solved the problem? Those questions are very important ones for debugging machine learning solutions. If you have solved a problem with a machine learning approach, you then need to ask whether the solution is going to generalize and how it works, and those questions come down to trying to reverse-engineer what happened and what's going on.



About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.