Practical Machine Learning
Selecting the right machine learning model will help you find success in your projects. In this module, we'll discuss how to do so, as well as the difference between explanatory and associative approaches, before we end on how to use out-of-sample performance.
- Let's talk about explanatory versus associative approaches. Let's start by looking at what an association is, statistically speaking. An association is just where we have some pattern or some connection between two variables. So here is, for example, a data set. Now let's compare that data set to a random one. Let's call the variables x and y, and we've got x one and x two. If there's no pattern or connection between x two and y, then what that means is that for every value of x two, we would see the whole range of y. Say y goes between 10 and 100; maybe it's the cost of some product, or a percentage of some revenue. What we would expect to see, if there were no connection, is that whole range for every value of x two. So if this was profit, and there were no connection between profit and where you were, your location in terms of a GPS coordinate or something, then we would see the whole range of profit at every coordinate.
Now, maybe profit in this middle region is more common, so you would see much more density in the middle. The point is that, regardless of where you take a slice, the world looks the same. This region is as common here as it is over there, and it's as rare here as it is over there. So this is just random. One way of defining random here is that one thing is random with respect to another if it provides no information about the other. It's a very good definition of random. So if you try to measure the world in terms of x two, things appear random, because x two doesn't tell you anything about y.
So y is just distributed randomly across each of those slices. Fine. Okay, what about this one? Well, this is highly non-random: if I know x one, I can tell you a lot about what y you're going to have. I know that if x one is low, y will be low, and I can tell you what my prediction would be, exactly what I think y will be. Likewise, if x one is high, then my prediction for y will be high. And that applies if the trend is in reverse, of course: if you've got a negative trend and x one is low, then I predict y will be high. So the existence of that trend, that association, tells you that there's a non-random connection between x and y.
Now, here's the problem: just because one thing informs you about another does not mean that it causes it. So let's write that down. x one informs you about y; that does not mean that x one causes y. We can write this a bit more mathematically. We can talk about the probability that y has some value given x one; that's the information. And what we might say is that this is different from the probability of y in general. If we look at the plot above, the random one, the probability of getting y at this value is pretty high, it's quite dense, and the probability of getting this value is pretty low. And that probability of being in a given range is unaffected by my choice of x; it doesn't really matter where I choose x.
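The slicing argument above can be tried out on simulated data. The following is a minimal sketch, not the lecturer's own data: the variable names, the range of 10 to 100, the noise level, and the helper `slice_means` are all assumptions made for illustration. It generates a `y` that depends on `x1` but not on `x2`, then compares the average `y` within slices of each variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Assumed toy data: y lies roughly between 10 and 100.
x1 = rng.uniform(0, 1, n)   # associated with y by construction
x2 = rng.uniform(0, 1, n)   # unrelated to y by construction
y = 10 + 90 * np.clip(x1 + rng.normal(0, 0.1, n), 0, 1)

def slice_means(x, y, n_slices=4):
    """Mean of y within equal-width slices of x (x assumed to lie in [0, 1])."""
    edges = np.linspace(0, 1, n_slices + 1)
    idx = np.digitize(x, edges[1:-1])   # slice index 0 .. n_slices - 1
    return np.array([y[idx == k].mean() for k in range(n_slices)])

means_x1 = slice_means(x1, y)
means_x2 = slice_means(x2, y)

# Slices of x1 give very different pictures of y; slices of x2 look alike.
print("mean y per x1 slice:", means_x1.round(1))
print("mean y per x2 slice:", means_x2.round(1))
```

If the sketch behaves as intended, the means across `x1` slices climb steadily while the means across `x2` slices stay close to one another, which is the "every slice looks the same" picture of a variable that carries no information about `y`.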
Whereas in this case, if my x is low, there's a certain probability of observing y here. Within this range of y it's high; within that range it's low, because there are no observations of y there. And if we change where we're looking in x, we get a very different result: here there's a very low probability of seeing a y, and in the middle is where it's high. So that's going to make a big difference. If the probability of observing some y given some x (I shouldn't use a comma here, but I'm trying to write it in English) is not the same as the probability of just observing some y, then x is telling you something about y. But that does not imply that x one causes y, where the arrow just means "causes". That's the really big point here, so let's see more about that.
Let me give you an example: the number of text messages per day between two people, and how much they love each other. Say the scale runs from zero text messages a day to 10 text messages a day. If we observe some people on a dating website or something, then there will probably be a connection between the number of text messages they send and how much they like each other, as measured by some questionnaire, or the number of dates that they have, or whatever the standard is. So that's definitely an association. It's an association because one thing informs you about another.
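In symbols, the two statements made in words so far might be written as follows. This is only a sketch of the notation: the arrow for "causes" follows the lecture, while the exact typesetting is an assumption.

```latex
% No association: knowing x_2 does not change what we expect of y.
P(y \mid x_2) = P(y)

% Association: knowing x_1 changes what we expect of y ...
P(y \mid x_1) \neq P(y)

% ... but association alone does not imply that x_1 causes y.
P(y \mid x_1) \neq P(y) \quad \not\Rightarrow \quad x_1 \to y
```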
It really is the case that the probability that two people love each other, given how many text messages they send, is not the same as just the probability of how much they love each other. That second probability is just what you see along this axis, where we see the whole range of people. Maybe you know that more people love each other, so you have a high rate of observation here and a lower rate there. High there, low there is what you would know if you didn't have the bottom axis. If you include the bottom axis, you know that, for example, it's high here but zero there, and it's all in this region. So by including the second axis, you can do better at saying where things are; without that information, you can't.
Okay, so we definitely have an association. We know that is true. But it's unlikely that the number of text messages a day causes love. Now, let me try and give you some intuition for how we can understand that. It means that if I take a person, a point here, and I tell them to send more text messages that day to their partner, that isn't going to change how much they love each other. Let's make this more precise with an example. Say there's a dating website, and we're talking about the probability, or the rate, at which a couple will have a second date, or a third or fourth date: another date.
If I tell them to just send more messages to each other on the website, if I take a person who's sending five messages a day and say, go and send 10 messages a day, that isn't going to make a third or fourth date more likely. It's just going to have them send more messages. If they're sending five, doubling the number that they send will have no effect on that, or a very, very limited effect. It isn't the fact that they're sending the messages that causes them to have a second date; it's other factors, such as the compatibility between the two people. Those are the factors that need to change. Of course, if one person isn't getting a second date because, let's say, they're a very aggressive person, maybe they're very nasty or mean, then that's what you need to change. If you change that, then you'd get a second date, not by just sending more messages. In fact, if you sent more messages and you were nasty, that would probably make things worse.
So how can we understand that? What causation means is that acting upon x, text messages, causes y, love, to change. And that's the thing here: if I take this point, the same person, and move them along by telling them to send more messages, what we would see is that they just get moved along. They wouldn't get moved up as well, because we've changed the number of text messages they've sent, but they're still not having a second date. So when we observe relationships where we've changed something in the world (this is not a statistical thing, this is an experimental thing), we would see this relationship completely fall apart.
So if we go through and take all of the people here and move them along the axis, actually go to them and say, send double the number of messages, the relationship you would see would be, well, they'll all be over here now, but they'll be at exactly the same height. So we'd have the initial relationship over here, and then, maybe they would even get worse, possibly we'd have some new relationship along the bottom that would look like that. So in some sense, the relationship we observed in the first case, the association between text messages and, say, the rate of getting a number of dates, is kind of a coincidence. It's fundamentally coincidental in the sense that things just co-occur. It just happens to be that when we look at the world through the lens of one variable, just text messages, we see this pattern. But it isn't the case that we can take that variable, change the world using it, and continue to see that pattern. The pattern is going to disappear, because text messages were never the thing causing the pattern we observed. It was just that people who happened to send more text messages also happened to like each other more, but not because they were sending more text messages. Sending messages was actually a symptom and not the cause. So we kind of had the direction of cause and effect reversed there.
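The thought experiment of telling everyone to double their messages can be sketched as a small simulation. Everything here is an assumption made for illustration: the hidden `compatibility` variable, the coefficients, the noise levels, and the hypothetical `simulate` function. The sketch builds in the lecture's premise that dates depend on compatibility and not on messages, so it illustrates the argument rather than proving it.

```python
import numpy as np

def simulate(message_multiplier=1.0, seed=1, n=5_000):
    """Toy dating-site data in which compatibility drives both variables.

    message_multiplier models an intervention: each person is told to send
    that many times their natural number of messages.
    """
    rng = np.random.default_rng(seed)
    compatibility = rng.uniform(0, 1, n)   # hidden cause (assumed)
    natural = np.clip(10 * compatibility + rng.normal(0, 1, n), 0, None)
    messages = message_multiplier * natural
    # Dates depend on compatibility only, never on messages (assumed).
    dates = 5 * compatibility + rng.normal(0, 0.5, n)
    return messages, dates

# Observation: just watch people and record what they do.
messages, dates = simulate()
obs_corr = np.corrcoef(messages, dates)[0, 1]
slope, intercept = np.polyfit(messages, dates, 1)

# Intervention: the same people, told to send double the messages.
messages_after, dates_after = simulate(message_multiplier=2.0)

# What the observed association would suggest versus what happens.
predicted_gain = slope * (messages_after.mean() - messages.mean())
actual_gain = dates_after.mean() - dates.mean()

print(f"observed correlation:    {obs_corr:.2f}")
print(f"predicted gain in dates: {predicted_gain:.2f}")
print(f"actual gain in dates:    {actual_gain:.2f}")
```

Using the same seed for both calls keeps the same simulated people in both scenarios, so each point moves along the messages axis without moving up, which is the picture described above.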
Sometimes we will see patterns that are purely coincidental, with absolutely no connection whatsoever. Maybe that would be the case with the number of pirates in the world and anything else that is decreasing. The number of pirates in the world is decreasing, so you can just pick any other variable that has been decreasing over the last 300 years or so. I don't know, deaths due to plague, say. So there's going to be a correlation between plague deaths and the number of pirates, or whatever other variable is decreasing, not because there's any connection between them, but because one is decreasing and the other is decreasing. So if you put them on the same graph and give them the right scales and so on, you'll perhaps see some connection. That's just pure coincidence. In the case above, there is some connection, because texting is a symptom of how much people like each other, not the cause. So that's the key thing.
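The pirates and plague example can be sketched with made-up numbers. The two series below are synthetic and are not historical data; the starting levels, slopes, and noise are assumptions chosen only so that both decline over 300 years with independent noise. Comparing year-on-year changes instead of raw levels is one common way to check whether a shared trend is doing all the work.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(300)

# Made-up series, for illustration only: both decline over time,
# and each has its own independent noise.
pirates = 1000 - 3 * years + rng.normal(0, 30, years.size)
plague_deaths = 5000 - 15 * years + rng.normal(0, 150, years.size)

# Correlation of the raw levels is dominated by the shared downward trend.
corr_levels = np.corrcoef(pirates, plague_deaths)[0, 1]

# Year-on-year changes remove the trend, leaving only the unrelated noise.
corr_changes = np.corrcoef(np.diff(pirates), np.diff(plague_deaths))[0, 1]

print(f"correlation of levels:  {corr_levels:.2f}")
print(f"correlation of changes: {corr_changes:.2f}")
```

In this sketch the levels should correlate strongly while the changes should not, which matches the point that two decreasing variables can line up on a graph with no connection between them.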
About the Author
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.