Interpreting linear classification models - Part 1
Practical Machine Learning
Regression is a widely used machine learning and statistical tool, and it's important you know how to use it. In this module, we'll discuss interpreting models, as well as how to interpret linear classification models.
Let's talk about interpreting linear classification models now. To remind ourselves, a linear model is one which has a weight and a bias (a slope and an intercept, say), and both of these are constant numbers. They're fixed values: the weight multiplies the variable term and the bias is added to it, giving w x + b.
Now, as it stands, this would be a regression model, because each of these is just a number and the output can be any real number. Here x is, say, a real number, or x could be several real numbers, one after another, for different variables, each with its own weight. Either way, the output will be a real number. That's not what we need for a classification model. If you remember, we need some kind of discrete output, for example plus or minus one. Going for plus one or minus one like that would be a binary classification.
So what's one way of getting a binary output like this from a model? One way of doing it is just taking the sign of this formula, sign(w x + b). That means: give me plus one if it's positive and minus one if it's negative. Just give me its sign, and then we're done. So the question is, how do we interpret a model like this? Let's consider a model of a couple of variables and think through an example.
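A minimal Python sketch of the sign rule just described, assuming plain lists for the inputs and weights. The helper names `sign` and `classify` are hypothetical, and sending a score of exactly zero to -1 is an arbitrary tie-break that the lecture doesn't specify.

```python
def sign(z: float) -> int:
    """Return +1 for a positive score and -1 otherwise.

    A score of exactly 0 is sent to -1 here; that tie-break is an arbitrary choice.
    """
    return 1 if z > 0 else -1


def classify(x: list[float], w: list[float], b: float) -> int:
    """Linear classifier: the sign of the weighted sum of the inputs plus the bias."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sign(score)


print(classify([3.0, 1.0], w=[2.0, -2.0], b=-2.0))  # score 6 - 2 - 2 = 2, prints 1
print(classify([0.0, 1.0], w=[2.0, -2.0], b=-2.0))  # score 0 - 2 - 2 = -4, prints -1
```

The same function works for one input or many, since the weighted sum simply runs over however many variables there are.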
So let's say y is fraud or not fraud, or yes or no: maybe we're making a decision. y is yes or no, and let's make the yes case the positive one and the no case the negative one. What decision are we making? Maybe whether to purchase a product, whether to go to a shop, or whether we will enjoy something. Let's think about purchasing a product. Say x1 is the price of the product, and x2 could be our budget, or how much we like the product out of 10.
So x2 is how much we've rated it: a preference, a rating, or something like that. Remember that in classification, since y is discrete, we can't draw it on a real number line, so we use color to represent the two cases. We have two axes, x1 and x2, and we mark the yes and no cases on the plot in color. For the historical data, let's say that for low ratings and high prices I've said no. For high ratings and high prices, I've said a few yeses and maybe only two nos. And for a low rating and a low price, it's mostly still no.
If we were to use a linear model, we might draw a line through these points, saying no above the line and yes below it. What we need from this model is for each point to receive a score. By score I mean the linear term inside the sign: it has to be negative for points above this line (no) and positive for points below it (yes). Let's look at how we might do that.
So what's the formula for the whole thing? Since we have two inputs, x1 and x2, let's write it as w1 x1 + w2 x2 + b. So we've got two weights and a bias. What role are they playing? We need this to be negative on one side of the line and positive on the other. What does that mean? Let's think about it. If I set the whole thing to zero, w1 x1 + w2 x2 + b = 0, the points (x1, x2) that satisfy that equation are exactly the points on this line. Why is that? Because as I move off the line to either side, the value goes positive or negative; it's only zero along the line.
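To make "negative on one side, zero on the line, positive on the other" concrete, here is a small sketch for the hypothetical price/rating plot. It borrows the values from the lecture's later worked example (w1 = 2, w2 = -2, b = -2) purely as an assumption for illustration; nothing is fitted to real data, and the function names are illustrative.

```python
# Example parameters borrowed from the lecture's later worked example (an assumption
# for illustration; nothing here is fitted to real purchase data).
W1, W2, B = 2.0, -2.0, -2.0


def score(x1: float, x2: float) -> float:
    """The linear term w1*x1 + w2*x2 + b, before taking its sign."""
    return W1 * x1 + W2 * x2 + B


def boundary_x2(x1: float) -> float:
    """The x2 value on the line where the score is exactly zero."""
    return -(W1 * x1 + B) / W2


for x1, x2 in [(1.0, 3.0), (3.0, 0.5), (2.0, boundary_x2(2.0))]:
    line_x2 = boundary_x2(x1)
    side = "on" if x2 == line_x2 else ("above" if x2 > line_x2 else "below")
    print(x1, x2, side, score(x1, x2))
# 1.0 3.0 above -6.0
# 3.0 0.5 below 3.0
# 2.0 1.0 on 0.0
```

Because w2 is negative with these values, points above the line score negative (no) and points below score positive (yes), matching the picture described above.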
Another way of saying that is that we want w1 x1 + w2 x2 to equal -b. Let's think about what values w1 and w2 might take, and whether we can work them out from this graph. Looking at the graph, we can roughly get a sense of it: x2, the vertical axis, is going to be equal to some amount of x1 plus something. If you rearrange this formula, you can see that.
So from this formula, what is x2? Start with -b, subtract w1 x1 to take away that term, and then divide everything through by w2: x2 = (-b - w1 x1) / w2. So there's a formula for x2, and it's the formula for this line. Its slope is -w1/w2 and its intercept is -b/w2. If we extend the graph, we can see what these parameters represent, that is, what information we're learning: they're related to where this line hits the two axes, the x1 axis and the x2 axis. Let's see why that is.
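The same rearrangement written out step by step, assuming w2 is non-zero (otherwise the line is vertical and has no slope in this form):

```latex
\begin{aligned}
w_1 x_1 + w_2 x_2 + b &= 0 \\
w_2 x_2 &= -b - w_1 x_1 \\
x_2 &= -\frac{w_1}{w_2}\, x_1 - \frac{b}{w_2} \qquad (w_2 \neq 0)
\end{aligned}
```

Reading off the last line, the slope is -w1/w2 and the intercept on the x2 axis is -b/w2.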
If I set x2 to zero, that gives me where the line hits the x1 axis. If I set x1 to zero instead, I get the intercept term, which is where it hits the vertical x2 axis, so that's the intercept, -b/w2. Now for the other one: I rearrange this formula and set x2 to zero, so that I'm running along the horizontal axis, and that gives 0 = -b/w2 - (w1/w2) x1.
So what is x1 at this point? If we multiply through by w2 and then add b to both sides, we get -w1 x1 = b, or x1 = -b/w1. That has just the same form as what we got for the vertical axis, where x2 = -b/w2; the formula is symmetric in x1 and x2, so you'd expect the same form for both. So the point on the horizontal axis is the negative of the bias divided by w1, -b/w1, and the point on the vertical axis is -b/w2.
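A short sketch of those two crossing points, assuming both weights are non-zero. The name `axis_intercepts` is hypothetical; the example values (w1 = 2, w2 = -2, b = -2) are the ones the lecture uses in its worked example.

```python
def axis_intercepts(w1: float, w2: float, b: float) -> tuple[float, float]:
    """Where the boundary w1*x1 + w2*x2 + b = 0 crosses each axis.

    Setting x2 = 0 gives x1 = -b / w1; setting x1 = 0 gives x2 = -b / w2.
    Assumes both weights are non-zero (otherwise the line is parallel to an axis).
    """
    return -b / w1, -b / w2


x1_cross, x2_cross = axis_intercepts(w1=2.0, w2=-2.0, b=-2.0)
print(x1_cross, x2_cross)  # 1.0 -1.0
```

Plugging either crossing back into w1 x1 + w2 x2 + b gives zero, which is a quick way to check the algebra.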
So how do we interpret the weights and the bias? How do we interpret the parameters of this model in light of what they mean visually? Well, what happens if I change the bias? Both crossing points, -b/w1 and -b/w2, are proportional to b, so if I decrease a negative bias, making it more negative, both of them get bigger in size. Let me draw that on a full set of axes: I've got a line here, and if I decrease the bias term of this classification model, that pushes both crossings further out from the origin.
Let's draw that in purple. Now, how do I interpret the weights? The weights drag the line down towards the origin: increasing a weight scales its crossing point down. Let's do a little calculation. The x1 crossing is -b/w1; suppose the bias is -2 and w1 is 2. Then this point is -(-2)/2 = 1. The x2 crossing is -b/w2.
Let's say w2 is -2. Then that comes out to -(-2)/(-2) = -1. Now, if I make the bias more negative, so that it goes to -4, what happens? Each crossing becomes 4 divided by the same weight as before, so the x1 crossing doubles to 2 and the x2 crossing goes to -2.
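A quick numerical check of that calculation, reusing the lecture's numbers (w1 = 2, w2 = -2, bias going from -2 to -4). The helper names are hypothetical. It also prints the slope -w1/w2, which doesn't involve the bias, to show that changing the bias moves the line without tilting it.

```python
def axis_intercepts(w1: float, w2: float, b: float) -> tuple[float, float]:
    """Crossings of w1*x1 + w2*x2 + b = 0 with the x1 axis (-b/w1) and the x2 axis (-b/w2)."""
    return -b / w1, -b / w2


def slope(w1: float, w2: float) -> float:
    """Slope of the boundary in the (x1, x2) plane; note that the bias does not appear."""
    return -w1 / w2


w1, w2 = 2.0, -2.0
for b in (-2.0, -4.0):
    print(b, axis_intercepts(w1, w2, b), slope(w1, w2))
# -2.0 (1.0, -1.0) 1.0
# -4.0 (2.0, -2.0) 1.0
```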
Now let's suppose I increase the weight terms instead, keeping the bias at -2. Let's increase w1 to 8 and w2 to -8, and do this one in green. What happens? The x1 crossing becomes -(-2)/8: minus minus two is plus two, and two divided by eight is a quarter, 0.25. So that goes all the way down to here. Likewise, on the other axis, -(-2)/(-8) = -0.25, so that goes all the way down to here too. So increasing the weights scales the line down towards the origin, with the crossings going all the way to zero in the limit. The green line would be down here.
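The same idea in code, with the bias fixed at -2 and both weights scaled together. The scale factors are my own choice for illustration: 4 reproduces the lecture's 8 and -8, and 100 hints at the limit where the crossings approach zero.

```python
def axis_intercepts(w1: float, w2: float, b: float) -> tuple[float, float]:
    """Crossings of w1*x1 + w2*x2 + b = 0 with the x1 axis and the x2 axis."""
    return -b / w1, -b / w2


b = -2.0
for scale in (1.0, 4.0, 100.0):
    w1, w2 = 2.0 * scale, -2.0 * scale  # scale both weights by the same factor
    print((w1, w2), axis_intercepts(w1, w2, b))
# (2.0, -2.0) (1.0, -1.0)
# (8.0, -8.0) (0.25, -0.25)
# (200.0, -200.0) (0.01, -0.01)
```

Because both weights are scaled by the same factor, the slope -w1/w2 stays the same; only the distance of the line from the origin shrinks.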
So there's a competition between the weights and the bias. How do we interpret this competition in light of the classification? If we think about what's happening here, everything on one side of this line has been classified into one group (in this case, let's say it's the no group), so one side of this line gets one color.
Let's make that side red; the other side of the line gets classified differently, so let's make it green. The bias and the weights together position and orient this line. If I decrease w1, the x1 crossing -b/w1 moves while the x2 crossing -b/w2 stays where it is, and if I also change the bias, both crossings shift. You can see the line rotating, like that.
Since I've changed w1 relative to w2, the slope -w1/w2 has changed, so the line has rotated; changing the bias on its own would only shift the line without changing its slope. So it's this competition between the weights and the bias that defines where the line sits and how it's oriented. What the crossing points really give you is a sense of where the classification changes: above the line one class, below it the other. The bias and the weights are competing to say, in effect, "above is one answer, below is the other", or something to that effect. So it's possible to interpret these as thresholds of classification.
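A small sketch of the rotation, reusing the lecture's example numbers (w1 = 2, w2 = -2, b = -2); the helper `boundary` is a hypothetical name. Halving w1 on its own leaves the x2 crossing where it was and changes the slope, so the line pivots about that crossing.

```python
def boundary(w1: float, w2: float, b: float) -> tuple[float, float, float]:
    """x1-axis crossing, x2-axis crossing and slope of the line w1*x1 + w2*x2 + b = 0."""
    return -b / w1, -b / w2, -w1 / w2


print(boundary(2.0, -2.0, -2.0))  # (1.0, -1.0, 1.0)
print(boundary(1.0, -2.0, -2.0))  # (2.0, -1.0, 0.5): x2 crossing unchanged, slope halved
```

Contrast this with the bias example earlier, where the slope stayed at 1.0 and both crossings moved together.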
So when you meet a certain threshold, above it you are in one category and below it you are in another. The model says yes when w1 x1 + w2 x2 > -b, so you can interpret the bias as a kind of hurdle you have to cross: it's the bias, or threshold, term. A more negative bias pushes the hurdle up, and the weights drag it down: as the weights increase (with positive inputs), it becomes easier to say yes, or something like that. So as you move these terms around, it becomes easier or harder to classify points into each category.
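The hurdle reading can be sketched by rearranging sign(w.x + b) = +1 into w.x > -b. The input, weights and biases below are made-up values, and the "weights drag the hurdle down" effect assumes positive features and positive weights, as in this example; `clears_hurdle` is a hypothetical helper name.

```python
def clears_hurdle(x: list[float], w: list[float], b: float) -> bool:
    """True when the weighted evidence w.x exceeds the hurdle -b.

    This is the same test as sign(w.x + b) == +1, just rearranged.
    """
    evidence = sum(wi * xi for wi, xi in zip(w, x))
    return evidence > -b


x = [1.0, 1.0]  # hypothetical input with two positive features
b = -3.0        # bias of -3, i.e. a hurdle of 3
print(clears_hurdle(x, [1.0, 1.0], b))     # evidence 2 does not clear 3: False
print(clears_hurdle(x, [2.0, 2.0], b))     # evidence 4 clears 3: True
print(clears_hurdle(x, [2.0, 2.0], -5.0))  # a more negative bias raises the hurdle to 5: False
```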
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.