Practical Machine Learning
Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. In this module we’ll start with the basics, introducing you to AI and its history. We’ll discuss the ethics of AI and talk about examples of AI that exist today. We’ll cover data, statistics and variables before moving on to notation, supervised and unsupervised learning. Finally, we’ll finish by going into some depth on the theoretical basis for machine learning, model and linear regression, the semantic gap, and how we approximate the truth.
- Now that we've talked about AI to some degree, and where ML fits into that story, let's go through some examples of machine learning, and after we've seen some examples, let's start categorizing the different kinds of problems machine learning can solve. When we think about a machine learning problem, because it is really just statistics, we're trying to think through what thing we're trying to estimate: what's the thing we're trying to predict, or infer, or estimate? Remember, this is a kind of inference. We infer something, and another way of saying that is, roughly, that we estimate it. They're not exactly the same; inference is more like a thinking process or a rule process. So we need to think about what variable we're going to estimate. That's how we make the transition from thinking about modeling, process, inference and estimation to something concrete, to some degree: we choose a variable, basically just a number, but a variable. And this variable we give the symbol Y; that means the thing we're trying to predict or infer.
So let's look at predictions for now. If we're trying to predict something, we call that thing Y. The technical word here is target, and that's a pretty important word, so that's going to be our target, the thing that we want to infer. Now what kinds of inferences do we want to make? What value is it going to have in the future? What range of values is it going to have in the future? How likely are we to see a particular value? These are inferences about this variable, or target. Okay, so what are these things, then? Let me give you some examples. One very common one is probably profit. So maybe we say the thing that we are trying to predict here, our Y, our target, is profit. Okay.
That seems fine. I want to know how profitable I will be in the future. Some more examples: maybe the number of clicks on a website, or hit rate or something. If I launch this page on my website, can I predict how many people will visit? Sure. Maybe something entirely different, since these are business things. Let's go for medical: let's predict whether a tumor is going to be benign (I can never really spell very well, never mind) or malignant. Now we've got some examples here: profit, click-through rate, and tumor, benign or malignant.
So how do I perform predictions, or estimations? Well, I need some basis, some historical basis, on which to connect something I know, say the size of the tumor, to something I don't know, which is whether the tumor is malignant or benign. Let's make that concrete. What I'm basically saying is, we need some features of the problem, which we track. I call those features X; that's the technical symbol here. And that's the thing that we know, what we always know. I may not know how profitable I'm going to be in 2100, but I know I will be a certain age, say. Or I don't know how much this customer will like my product, but I know for sure that they will be in my shop in London, say, so I know their location. There's something I don't know, which is what I'm trying to predict, but there's something I will know. Okay, so what do I do? I look at the history of these variables. Let's go for a different example, why not throw an extra one in? We'll go for rating a film. Suppose I'm trying to predict the rating you will give to a film. I'm going to say, what would you give it out of 10? Film rating out of 10.
You could say like or dislike, or out of 10. We'll do both examples: we'll do out of 10, then we'll do like or dislike. That will be quite helpful. Right, so if I want to know what you would rate the film, what do I need to know? What could I know about you, or about the film? Well, let's call the first one X one: that could be the length of the film. X two could be the genre of the film. These things are important to how people rate films. Then think of something about the customer, about you, or the user, however you want to put it. That would be, let's say, your gender or your age? Let's go for age. So this is my problem set-up. What I'm always going to know is the length of the film, the genre of the film, and your age, and I'm going to try to use that to predict the film rating that you will give it. So how do I do that? In some sense, conceptually, it's far simpler than it appears. All I do is look at the history. I already have some data set, which we could call the historical data set for now. So I look at the historical data of film ratings, film lengths, film genres, and viewer ages. Typically, actually, you put the Y at the end of the list, that's good to mention. Do you see what I mean by that? You could have a little Excel table or something, with all these little columns, and you put your Y at the end; as I said, by convention it goes last. Then you look at the person's age, 18; what rating they gave the film, let's say 8 out of 10; what kind of film it was, maybe it was 180 minutes, and it was genre two, horror. And we just look at patterns that connect X's to Y's.
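The table described above can be sketched in a few lines of Python. This is only an illustration: the column names and the helper `split_features_target` are hypothetical, and every row except the first (age 18, 180 minutes, genre two, rated 8 out of 10) is invented for the example.

```python
# Hypothetical historical data set: one row per past rating.
# Features come first and the target (Y) goes last, as the talk suggests.
COLUMNS = ["length_minutes", "genre_code", "age", "rating"]

historical_data = [
    (180, 2, 18, 8),  # the row from the talk: 180 min, genre 2 (horror), age 18, rated 8/10
    (95, 1, 42, 6),   # invented row
    (120, 2, 25, 7),  # invented row
    (150, 3, 60, 9),  # invented row
]


def split_features_target(rows):
    """Split each row into features X (all but the last column) and target y (last column)."""
    X = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return X, y


X, y = split_features_target(historical_data)
print(X[0])  # features of the first historical row
print(y[0])  # rating given in the first historical row
```

Keeping Y in the last column means the features and the target can be separated with the same slice for every row.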
So we look at the historical patterns that connect X's to Y's. And then I replay that same pattern in the future; I reuse that pattern to give me the prediction when I don't know the Y. So I look at my history: I know your age, the genre of the film, and the length of the films that you have rated, and I've got all the history of that. In the future, I estimate what you'll rate a film based on how similar it is to things that I've seen, basically. That's the essence of a big part of this: looking at history, making a slight generalization of it, and then using that in the future to predict something. That's one of the general ideas here.
What else do I want to say? At this stage we can say a couple of things. If we go back to these examples, in each of these cases Y plays a slightly different role. In the profit case and in the clicks case, Y could be almost any number. Profit can probably be negative, a loss, right? So Y could be minus 100 all the way to plus 100, or higher or lower. Clicks, likewise, just the same: Y is a number. Tumor, though, only has two options. We can give those options whatever numbers we like. We could say that if it's malignant it's plus one, and if it's benign it's minus one. And there are only those two options, plus one or minus one. So in the examples we've considered so far, there are different sorts of target variables we can have. Let's highlight that; when we're summarizing all of this, that will be an important thing to be aware of. Let's do one final example, and then we'll do the summary. So we've talked about prediction, estimation, inference.
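One simple way to "predict based on how similar it is to things I've seen" is a nearest-neighbour lookup. The sketch below is an assumption chosen for illustration, not a method named in the talk. The function `predict_nearest` and all the data are hypothetical, and for simplicity the film example uses only length and age as features, since genre is a category rather than a quantity. The same function is applied to a target that can be any number (a rating) and to a target with only two options (+1 malignant, -1 benign).

```python
import math


def predict_nearest(history_X, history_y, query):
    """Return the target of the historical row whose features are closest to `query`."""
    distances = [math.dist(x, query) for x in history_X]
    nearest_index = distances.index(min(distances))
    return history_y[nearest_index]


# Invented history: (film length in minutes, viewer age) -> rating out of 10.
ratings_X = [(180, 18), (95, 42), (120, 25), (150, 60)]
ratings_y = [8, 6, 7, 9]

# Numeric target: estimate a rating for a new (length, age) pair.
predicted_rating = predict_nearest(ratings_X, ratings_y, (175, 20))
print(predicted_rating)

# Invented history: (tumor size in mm,) -> +1 for malignant, -1 for benign.
tumor_X = [(4.0,), (6.5,), (21.0,), (30.0,)]
tumor_y = [-1, -1, +1, +1]

# Two-option target: the prediction can only ever be +1 or -1.
predicted_label = predict_nearest(tumor_X, tumor_y, (25.0,))
print(predicted_label)
```

Copying the single closest row is the crudest possible generalization of the history; it is shown here only to make the idea of reusing a past pattern concrete.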
What other kinds of inferences are there that are not necessarily predictive, but that go beyond the mere data we have seen? In the case of other kinds of inferences which are not predictive, we don't have a target; there is no target. Call it just inference for now; I'll give you the more technical language in a second. So there's no target. We only know the things that we always know: in the future we will know them, and now we know them. Again, we could do the example with your age, the genre of the film (I've changed the order of the variables a little bit, it doesn't really matter), and what else did we know? The length of the film. So what can I say about these things that's inferring stuff about them, going beyond what's there, but not making predictions? Well, one question I could ask is: how likely is it that an old person, or a young person, but an old person say, will like a long film? I could ask, for example, what is the probability of the first variable being high, given the third variable is low? Or how likely is it that someone will be old and the film will be very long? Whatever. Questions of this kind are about looking at the structure of what we know, and seeing if there are any patterns there. Is it the case that we see a lot of old people watching long films and young people watching short films? And if it is, what does that tell us? So these are other kinds of inferential questions that are about the structure and patterns, just mere structure, in the things that we know.
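A question such as "given that the viewer is old, how likely is it that the film is long?" can be estimated from counts, with no target column involved. The sketch below makes several assumptions for illustration: the rows are invented, the cut-offs for "old" and "long" are arbitrary, and the helper names are hypothetical.

```python
def conditional_probability(rows, event, given):
    """Estimate P(event | given) as a ratio of counts over the rows."""
    rows_matching_given = [row for row in rows if given(row)]
    if not rows_matching_given:
        return None  # the condition never occurs, so there is nothing to estimate
    matches = sum(1 for row in rows_matching_given if event(row))
    return matches / len(rows_matching_given)


# Invented rows in the order (age, genre code, film length in minutes); there is no target.
rows = [
    (70, 1, 170), (65, 3, 160), (68, 2, 95),
    (19, 2, 90), (22, 2, 100), (24, 1, 165),
]


def is_old(row):
    return row[0] >= 60  # hypothetical cut-off for "old"


def is_long(row):
    return row[2] >= 150  # hypothetical cut-off for "long"


p_long_given_old = conditional_probability(rows, event=is_long, given=is_old)
p_long_given_young = conditional_probability(
    rows, event=is_long, given=lambda row: not is_old(row)
)
print(p_long_given_old, p_long_given_young)
```

Comparing the two estimates is one way to ask whether older and younger viewers appear alongside long films at different rates in the data we already have.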
Are there any patterns that we have missed, not that we are necessarily going to use to predict something, but just in what we know? That can be very helpful as well. Some examples of questions like this: in our film ratings data set, do we see a pattern, a connection, between age and the length of a film? And if there is, maybe that informs a marketing strategy. Is there some connection between the length of the films that we advertise or have on our website and the genre of the film? Is it that horror films are shorter and blockbuster films are longer? If so, and we also know that young people prefer short films, then maybe that changes which genres we choose for our front page, the landing page of our website. Maybe we choose different genres based on different ages. So this is informing strategies around marketing, around design, around business. But it's not predicting anything. It's not saying that in the future this will happen. It's giving you information about how things that you know are related to one another.
Okay, so let's leave it there, and then let's do a little review of this in just a second, and start piecing together more technically how all this is related. So we've got things where we're predicting and things where we aren't predicting. In the predictive cases there are differences in how the target variable looks: is it any sort of number, or is it just some options? And if we're not predicting, what are we doing? We're asking structural questions, often questions of probability, about how things are related.
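One common way to check for "a connection between age and the length of a film" is a correlation coefficient. The talk does not name a specific measure, so the Pearson correlation used here is an assumption, and the numbers are invented purely to show the calculation.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation: near +1 or -1 suggests a strong linear link, near 0 little or none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented observations: viewer age and the length of the film they watched.
ages = [19, 22, 24, 65, 68, 70]
film_lengths = [90, 100, 110, 150, 160, 170]

r = pearson_correlation(ages, film_lengths)
print(round(r, 2))
```

A value close to +1 on real data would suggest that older viewers tend to watch longer films, which is the kind of structural finding that could inform which genres appear on a landing page. It describes how known variables relate; it is not a prediction.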
About the Author
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.