Module 5 - Regression
Interpreting Models

Regression is a widely used machine learning and statistical tool, and it’s important you know how to use it. In this module, we’ll discuss interpreting models, as well as how to interpret linear classification models.


Here we're going to look at the mathematical formula behind solutions to machine learning problems and think about what that formula tells us about the solution to the problem. So let's start with a simple linear model. We could have a linear model for regression or for classification; let's look at regression first. This should be familiar to us by now. We have a model function that we use to predict things, and it depends on some weights and a bias term: each feature is multiplied by its weight, and the results are added together along with the bias, so f(x) = w1 x1 + w2 x2 + ... + b.
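As a rough sketch in plain Python (the names `predict`, `weights`, `features` and `bias` are illustrative, not from any particular library), the model multiplies each feature by its own weight, sums the results and adds the bias once:

```python
def predict(weights, features, bias):
    """Linear model: w1*x1 + w2*x2 + ... + b."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    # Each weight multiplies its own feature; the bias is added once at the end.
    return sum(w * x for w, x in zip(weights, features)) + bias


# One feature: w1*x1 + b
print(predict([0.1], [5], 10))                # 10.5
# Two features: w1*x1 + w2*x2 + b
print(predict([2.0, 3.0], [1.0, 4.0], 1.0))   # 15.0
```

The same function covers one feature or many; only the length of the two lists changes.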

When we visualize this, we of course get the equation for a straight line. If there is only one x, the formula becomes w1 x1 + b. If there are two xs, we have w1 x1 + w2 x2 + b, and so on. Each weight is a slope and the bias b is the intercept. So if we draw the model as a line, b is where it crosses the vertical axis, and the slope w is its steepness, which you can picture as a small triangle under the line: how far the line rises for each unit it runs along.
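A quick numerical check of that reading, assuming a made-up single-feature model with w = 2 and b = 1: the prediction at x = 0 is the intercept, and the rise between two points on the line divided by the run between them is the slope.

```python
def line(x, w=2.0, b=1.0):
    # Single-feature linear model: w * x + b (w and b are illustrative values).
    return w * x + b

intercept = line(0.0)  # where the line crosses the vertical axis
x_a, x_b = 3.0, 7.0
slope = (line(x_b) - line(x_a)) / (x_b - x_a)  # rise over run: the "triangle"

print(intercept, slope)  # 1.0 2.0
```

Any pair of distinct x values gives the same slope, because the line has constant steepness.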

Okay, so let's look at this model and what each piece of it is saying. Take the single-feature formula we've just visualized. The weight w1 weights the feature x1, which means it interacts with it through multiplication. So the term with the weight in it changes with both the chosen weight and the feature. Let's think that through for a second. The feature x could be age, location, profit and so on; this time, let's take profit. From profit, we're trying to predict the number of sales we'll make.

Let's put some values in. Say w is 0.1, or 10%, and the bias is 10. The model is then 0.1 x1 + 10, and the weight scales the x1 term. Take inputs of 5, 10, 15, 20 and so on: if I put 5 in, I get 0.5 plus 10, which is 10.5. If I put 10 in, I get 11, and so on. So the weight term scales, or interacts with, the feature.
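The arithmetic above, tabulated for the same illustrative values (w = 0.1, b = 10) over the inputs 5, 10, 15 and 20. The formatting to one decimal place is only there to hide floating-point noise in the printout.

```python
w, b = 0.1, 10.0  # illustrative weight and bias from the example above

profits = [5, 10, 15, 20]
predictions = [w * profit + b for profit in profits]

for profit, sales in zip(profits, predictions):
    print(f"profit={profit:>2}  ->  predicted sales={sales:.1f}")
```

Each step of 5 in profit adds 0.5 to the prediction, which is the weight doing its scaling.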

Let's contrast that with the bias term. The bias term is always the same, regardless of where we are: whether the input is 5, 10 or 15, the bias term doesn't change. So let's think through what w and the bias each mean and where they come from. We can interpret w as the importance of the feature in calculating the target. In our example, an input of 5 gives a prediction of 10.5, almost all of which comes from the bias of 10. So we can read 0.1 as how much, roughly what percentage, of the profit is relevant to the sales.
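To see the difference in behaviour, here is a sketch that splits each prediction into its two parts, using the same illustrative w = 0.1 and b = 10: the weighted term changes with the input, while the bias term is identical for every input.

```python
w, b = 0.1, 10.0  # illustrative values

rows = []
for x in [5, 10, 15, 20]:
    weighted_term = w * x  # depends on the feature value
    bias_term = b          # the same whatever x is
    rows.append((x, weighted_term, bias_term))
    print(f"x={x:>2}  weighted term={weighted_term:.1f}  bias term={bias_term:.1f}")
```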

It's not a percentage as such, but it is a kind of importance, and unlike a percentage, a weight isn't restricted to lying between zero and one: you can have a weight of two or three. It's something like the salience or importance of this feature. If the weight is non-zero at all, in other words if this feature matters, you can think of the weight as an approximate measure of how informative the feature is about the target. So, roughly: how much should I be considering profit when calculating sales?
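One way to make "importance" concrete, assuming a hypothetical two-feature model with made-up weights: nudging a feature by one unit changes the prediction by exactly that feature's weight, so a feature with weight 0 has no effect at all. Because this is a per-unit effect, it also depends on the units the feature is measured in.

```python
def predict(weights, features, bias):
    # Linear model: sum of weight * feature, plus the bias.
    return sum(w * x for w, x in zip(weights, features)) + bias

weights = [3.0, 0.0]  # hypothetical: feature 0 matters, feature 1 is ignored
bias = 1.0
base = [2.0, 5.0]

changes = []
for i in range(len(base)):
    nudged = list(base)
    nudged[i] += 1.0  # increase one feature by one unit
    change = predict(weights, nudged, bias) - predict(weights, base, bias)
    changes.append(change)
    print(f"feature {i}: prediction changes by {change}")  # equals weights[i]
```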

Now, what's the bias? The bias doesn't interact with the feature, so it doesn't multiply it. You don't put 5 in and do something to it, so you can't consider it to be something about the data point you're considering. The weight, in some sense, is about the data point you're considering. The bias term, by contrast, is about the historical data set; that's the key thing here.

So the weight, in some sense, is about the observation you're considering. Suppose I ask, "How many sales will I make with a profit of 20?" I put 20 into the model and it gives me an answer. The weight is interacting with the point I'm considering and telling me which part of that point, namely the profit, is relevant and how relevant it is. The bias term is the same for every point, so it isn't about the point; it isn't concerned with that particular piece of data.

So where does the bias term come from? It comes from the historical data set. It's not necessarily about any particular record, but it is derived from the data set as a whole: it's about the history, or the background. So what's the bias telling you? It's telling you the baseline, or threshold, number of sales: even at zero profit, the model predicts this number of sales. If the bias is 10, another way of thinking about it is that 10 sales correspond to zero profit, so I need to make at least 10 sales before I start profiting. That's a useful thing to know, right?
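The same reading in code, again assuming the illustrative model sales = 0.1 × profit + 10 (the function names here are hypothetical): at zero profit the prediction is just the bias, and rearranging the line for profit shows that profit only becomes positive once sales exceed 10.

```python
w, b = 0.1, 10.0  # illustrative values from the example

def predicted_sales(profit):
    # Forward model: sales = w * profit + b
    return w * profit + b

def implied_profit(sales):
    # Rearranged model: profit = (sales - b) / w  (requires w != 0)
    return (sales - b) / w

print(predicted_sales(0))  # baseline: at zero profit, only the bias remains
print(implied_profit(10))  # break-even: 10 sales imply zero profit
print(implied_profit(12))  # above the threshold, implied profit is positive
```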

So I need to make at least 10 sales. Where does that 10 come from? It comes from the historical data set. It reflects the background expectation of how much you need to sell in order to profit, and it isn't specific to any individual item; it applies to all items. All right, that's something about interpreting the regression model. We've seen this linear regression model before, so let's now look at a linear classification model.
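Both numbers are learned from past observations. As a sketch, here is ordinary least squares for a single feature written out by hand, run on a small made-up history that lies exactly on sales = 0.1 × profit + 10. The fitted bias is b = mean(sales) − w × mean(profit), a quantity computed from the whole data set rather than from any one observation.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: built from averages over the whole historical data set.
    b = mean_y - w * mean_x
    return w, b

# Made-up historical data (profit, sales), generated from sales = 0.1 * profit + 10.
history_profit = [0, 20, 40, 60, 80]
history_sales = [10.0, 12.0, 14.0, 16.0, 18.0]

w, b = fit_simple_linear_regression(history_profit, history_sales)
print(f"w = {w:.3f}, b = {b:.3f}")  # w = 0.100, b = 10.000
```

With real, noisy data the recovered values would only approximate the underlying relationship, but the bias would still be an aggregate of the history.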


Interpreting Linear Classification Models - Part 1

Interpreting Linear Classification Models - Part 2


About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.