Practical Machine Learning
Selecting the right machine learning model will help you find success in your projects. In this module, we'll discuss how to do so, as well as the difference between explanatory and associative approaches, before we end on how to use out-of-sample performance.
- So here's the takeaway for machine learning: if you use merely associative connections, then several bad things might happen. One is that the environment in the future might change. People might start sending more text messages just coincidentally; maybe the platform makes it easier to send messages, and therefore everyone's sending more messages. And if the environment changes, the connection between the symptom, or the coincidence, and the target can weaken. If that weakens because the real cause is changing, then your whole association will disappear, or change in ways that are deeply pathological for your prediction. So if you go from world one to world two, you can see that pattern just breaking down completely. If you were to keep trying to make money on your site, and you thought the pattern was like this when it was actually completely different, then that's catastrophe.
So this point of breakage, we can call it the point of failure. It can be accounted for in a couple of ways. One, the real cause might change. Two, the symptom might change on its own: we've just made modifications to the text messaging. The environment isn't different in the sense that people like each other in different ways; it's just that text messaging is different. We've changed one variable, but because it's not causally connected to the other, the association breaks down. So it could be a change in text messaging or a change in the real cause, and either is going to break our model completely. I mean, if the real cause changes, we'd have trouble regardless of what we do. So, what does that mean?
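To make the move from world one to world two concrete, here is a minimal sketch, not taken from the lecture. It assumes a hidden "liking" score that drives both texts and dates, and a hypothetical `texting_boost` that stands in for the platform making messaging easier. All numbers are invented for illustration.

```python
import random

def simulate(n, texting_boost, seed):
    """Generate (texts, dates) pairs; a hidden 'liking' score drives both."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        liking = rng.uniform(0, 10)                            # hidden real cause
        texts = 5 * liking + texting_boost + rng.gauss(0, 2)   # symptom
        dates = 0.5 * liking + rng.gauss(0, 0.3)               # target
        rows.append((texts, dates))
    return rows

def fit_line(rows):
    """Ordinary least squares for dates = slope * texts + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(rows, slope, intercept):
    """Average absolute gap between predicted and actual dates."""
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)

world_one = simulate(2000, texting_boost=0, seed=1)
world_two = simulate(2000, texting_boost=40, seed=2)   # texting got easier

# Fit the associative model in world one only, then score it in both worlds.
slope, intercept = fit_line(world_one)
error_one = mean_abs_error(world_one, slope, intercept)
error_two = mean_abs_error(world_two, slope, intercept)
print(f"world one error: {error_one:.2f}")
print(f"world two error: {error_two:.2f}")
```

Nobody likes anybody differently in world two, yet the model fitted in world one over-predicts dates for everyone, because the only thing it ever tracked was the symptom.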
That means that if we're using associative modeling, we're possibly in a very, very tricky position when we come to deploy, because if those associations break down, the whole thing fails catastrophically. So we have to hope that those associations don't break down. Now, the good thing about explanatory modeling, that's modeling where the association you're tracking is genuinely causal, is that it couldn't be otherwise. If text messaging really did cause the number of dates, then changing a person's text messaging rate would have to change the number of dates. If you change one, you have to change the other.
So causal laws are not mere coincidences. They are laws, meaning they are universal, which means that if you have explained something properly and you have the right connections between things, then your observations will hold, well, universally. There are some conditions on that, but more or less they're very, very strong connections between things. If I measure how much gas is coming out of my cooker and I'm measuring the temperature of the pan, how could it be otherwise? If I turn the gas down, the temperature of the pan has to decrease as well. Now, of course, there are some obvious conditions: we have to make sure we're talking about the same problem, that the gas is lit and all of that sort of stuff. But given that the environment is the same, if I'm moving these variables around and they are genuinely causal, then it must be that if I change one, the other changes. So in an explanatory case, the things we're observing really can't be otherwise.
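The difference between changing a genuine cause and changing a mere symptom can be sketched as a pair of toy simulations. This is an illustration under invented assumptions, not the lecture's own code: the linear gas-to-temperature law, the hidden liking score and every coefficient are hypothetical.

```python
import random

def average_temperature(gas, n=2000, seed=0):
    """Average pan temperature when we set the gas flow ourselves."""
    rng = random.Random(seed)
    # Assumed causal law: temperature depends directly on the gas flow.
    return sum(20 + 30 * gas + rng.gauss(0, 1) for _ in range(n)) / n

def average_dates(forced_texts, n=2000, seed=0):
    """Average dates when everyone is forced to send `forced_texts` messages."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        liking = rng.uniform(0, 10)    # hidden real cause, untouched by us
        texts = forced_texts           # set by us; nothing downstream reads it
        total += 0.5 * liking + rng.gauss(0, 0.3)   # dates depend on liking only
    return total / n

# Intervening on a genuine cause: turning the gas down must cool the pan.
temp_high_gas = average_temperature(gas=4, seed=1)
temp_low_gas = average_temperature(gas=2, seed=2)

# Intervening on a symptom: forcing more texts leaves dates where they were.
dates_few_texts = average_dates(forced_texts=10, seed=3)
dates_many_texts = average_dates(forced_texts=60, seed=4)

print(f"temperature drop when gas is halved: {temp_high_gas - temp_low_gas:.1f}")
print(f"change in dates when texts are forced up: {dates_many_texts - dates_few_texts:.2f}")
```

Under these assumptions the temperature moves by roughly the amount the law dictates, while the number of dates barely moves at all, which is the sense in which only the causal connection "couldn't be otherwise".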
And so we're pretty much guaranteed, in some ways, to get good performance when we come to deploy this thing, because the association we're tracking can't break down very easily. Our machine might break down, and a lot of incidental things might happen to cause it to fail, but the association itself is pretty rigid. So that's one thing: associative models can fail to generalize very easily, because in the generalization case, the out-of-sample case, the association can break down in lots of ways. It's not just that the machine can fail or the environment can change; the thing we're looking at can change quite easily without affecting our target. So they can fail to generalize, and they require monitoring to make sure that we're okay.
Two, let's talk about the other takeaway of this. A merely associative model, one that definitely isn't causal, or almost certainly isn't causal, can be very, very hard to explain. So for associative models we can say: one, they can fail to generalize, and that's part of what makes them tricky; and two, they are hard to explain compared to the genuinely causal models, or explanatory models as we might call them. So, why is that? Why is it hard to explain the associative model? Well, if you ask someone why we observe a pattern between the number of text messages sent and the number of dates people go on, we can offer an intuitive explanation. We can say, actually, people who like each other communicate more often, and so on. But that's a hypothesis that comes from outside of our data set.
So we have to think about all the ways this pattern might emerge. We have to introduce that hypothesis ourselves, and it can't be justified by our data set. We know that text messaging doesn't cause the number of dates, so we can't use this data set to provide justification for that connection, because we know the connection isn't there. So we have to come with independent research to say, actually, yes, people who like each other more communicate more often.
Let's think about an example which is even worse than that, where it becomes kind of impenetrable. Say we're using images: we've got images of receipts, maybe, and bills, and we're putting those in and getting out of it, I don't know, how much profit we would make on that sale. You think, okay, interesting problem. We're taking an image in, and what we're hoping the computer is doing, in some sense, is looking at the prices of the products, maybe even looking at the names of the products, and then corroborating that somehow with what we know. So different kinds of images go in and we get different amounts of profit coming out. Let's index the images by i, so this is just which image it is. If it's image zero to image one million, then each image is going to have a certain amount of profit coming out, one point per image. So let's imagine some connection between these. Now, if some regulator comes along and says, explain how this algorithm is working, or at least explain how the model works, if not how we've arrived at it, the problem is that every image can be thousands of pixels. Which pixels are the important ones?
It's really hard to say, well, it's using the names, or it's using the prices, or it's doing something different. It's very hard to account for that. And the ways that you will account for it are very likely to be wrong, even if you do a good job of it. I mean, if I've got an image of a cat, say, and we can say for sure that the model is looking at the whiskers, and it's comparing that to another image, let's say of a dog, which has longer ears, a different nose, whatever, but doesn't have whiskers. We can say, well, actually, whether there are whiskers in the image or not is what it's using here. But maybe if I just take some straws and put them on top of the dog's face, it would say that's a cat as well. So is it whiskers, or is it just some visual structure in some place that can quite easily be fooled?
Human beings operate with explanatory models. When we navigate an environment, we need to know what causes what in order to modify the environment in any way. So when we look at images, we immediately import explanatory information into them. When we look at visual information of a cat, say, we're immediately applying our explanatory model of how cats work, how they behave and what they are like. We see whiskers immediately, but the machine has no notion of what a whisker is. It has only the pixels. So the explanation really has to be in terms of the color, intensity and location of pixels.
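The straws-on-the-dog thought experiment can be played out on a toy image. The sketch below is purely illustrative: the 9 by 7 grid, the hand-written `classify` rule standing in for a learned model, and the threshold of 6 are all assumptions, not anything a real network does literally.

```python
def make_face(width=9, height=7):
    """Blank 'face': a grid of dark pixels (zeros)."""
    return [[0] * width for _ in range(height)]

def draw_bright_pixels(image, rows, cols):
    """Return a copy of the image with bright pixels at the given rows and columns."""
    copy = [row[:] for row in image]
    for r in rows:
        for c in cols:
            copy[r][c] = 1
    return copy

def classify(image):
    """Stand-in for a learned model: it only sums brightness in the middle rows.

    It has no concept of a whisker, just pixel intensity at certain locations.
    """
    middle_rows = image[2:5]
    score = sum(sum(row) for row in middle_rows)
    return "cat" if score >= 6 else "dog"

dog = make_face()
# Whiskers: short bright strokes on both sides of the face.
cat = draw_bright_pixels(dog, rows=[2, 4], cols=[0, 1, 2, 6, 7, 8])
# Straws: one long bright stroke laid across the dog's face.
dog_with_straws = draw_bright_pixels(dog, rows=[3], cols=range(1, 8))

print(classify(cat))              # cat
print(classify(dog))              # dog
print(classify(dog_with_straws))  # cat
```

A person would say the first image has whiskers and the third has straws, but this model only ever responded to bright pixels in a region, so both look the same to it.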
If you say "whisker", you're immediately assuming that the machine has somehow detected a piece of the environment, when really it has just looked at patterns in pixels. It would therefore be easy to fool the machine just by reproducing a pattern that obviously, to us, isn't whiskers. For the machine, because it's dealing with this very thin amount of information, there is no whisker, and so that isn't really how it's working.
So, okay, great. Where does that leave us? Associative models can fail to generalize, and they can be very hard to explain. With the second one, we're thinking in terms of regulation, perhaps legal regulation. If we're saying someone committed fraud, can we explain why we believe that? And the answer can't be "because the machine said so", because it is very easy for the machine to be fed data that is unhelpful, pick up on patterns that are coincidental, and so come up with conclusions that are really very poor, perhaps completely incorrect. So we can't just say "because the machine said so"; we have to provide some account. But with a merely associative model, a model that has no explanatory justification behind it, we can't do that.
So what's the impact of all this? The impact is that some algorithms produce models that are hard to explain. For example, neural networks typically produce very hard to explain, mainly associative, models. Forms of general regression can do that too. What algorithms are easy to explain? Well, we've seen k-nearest neighbors, and that can be quite easy to explain. I mean, it's just that we looked at a database to find the person who was most similar to you.
We chose that person, and that's why we said you're not allowed a loan: because of the most similar person in our history. And if a regulator isn't persuaded by that argument, isn't persuaded that that's a reasonable way of classifying people into whether they're creditworthy or not, say, well then you can't use that algorithm. But at least you can explain it. You can make a case behind it. Another algorithm, which is widely regarded as one of the most explainable algorithms, is the decision tree. A decision tree can still be an associative model, in the sense that the variables are not causally connected, but even when the variables are not causally connected, it can be easier to explain how a decision tree has arrived at its decision than it is for some other algorithms.
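As a rough sketch of why these two algorithms are easy to explain, the example below returns a decision together with the reason for it. Everything in it is hypothetical: the four-row loan history, the feature names, the factor of 10 used to put missed payments on a scale comparable to income, and the hand-written tree thresholds.

```python
import math

# Hypothetical history: (income in thousands, missed payments, repaid the loan?)
history = [
    (52, 0, True),
    (31, 4, False),
    (45, 1, True),
    (28, 3, False),
]

def nearest_neighbour_decision(income, missed):
    """1-nearest-neighbour: the decision is whatever happened to the closest record."""
    def distance(record):
        # Assumed scaling: one missed payment counts like 10k of income difference.
        return math.hypot(record[0] - income, 10 * (record[1] - missed))
    closest = min(history, key=distance)
    decision = "approve" if closest[2] else "decline"
    outcome = "repaid" if closest[2] else "did not repay"
    reason = (f"most similar past applicant had income {closest[0]}k and "
              f"{closest[1]} missed payments, and {outcome}")
    return decision, reason

def tree_decision(income, missed):
    """A hand-written two-level tree; the path taken is the explanation."""
    if missed >= 2:
        return "decline", "missed payments >= 2"
    if income < 30:
        return "decline", "missed payments < 2 and income < 30k"
    return "approve", "missed payments < 2 and income >= 30k"

print(nearest_neighbour_decision(30, 3))
print(tree_decision(30, 3))
```

Neither explanation claims that income or missed payments cause repayment. Each one only states which record or which sequence of tests produced the answer, which is the kind of account a regulator can then accept or reject.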
About the Author
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.