Python is one of the tools you can use to program machine learning. In this module, you'll learn the basics of Python as it's used for machine learning: how to use loops to compute total loss, how regression and classification problems are framed, and how to set up a machine learning problem in Python.
Okay, so let's do a bit of a review of the setup of machine learning in terms of Python, with another classification problem. The setup for all supervised learning is the same: estimators, predictors, all those pieces of the problem coming together — loss, features, target — and the only difference between regression and classification is whether the thing we're trying to predict is a real number or a label, a discrete option.

So let's choose a problem of some kind: classifying a film. What are we going to do here? Let's write it in English a little bit first, and then convert it into notation. Given the age of a user and the length of a film, predict whether the user, or customer, will like or dislike the film. What does that mean? Well, it means we've got our X's and our Y. We have two X's here: X1 is going to be age, and X2 is going to be the length of the film. Like or dislike is going to be our Y — call it "like?", question mark and all.

So what we're basically saying is: find a relationship — find an estimate f-hat which takes in X1 and X2 and gives us a Y-hat. Those are the key variables. Let's also say we're given a data set D, which is just going to be examples of the X's and Y's above. So given this historical data set, find this relationship, where Y-hat is a good prediction for Y. How do we define a good prediction? Where the loss is minimal. So when we look at the loss at each point — comparing our prediction to our observation — we want that to have been minimized. We might want to adjust the tilde notation, but one line can just say we've minimized the loss. So these are just some definitions — our features, age and length — and then this is the problem: given some data set, find an estimated function that gives us an estimate, where that estimate minimizes the loss, so the estimates are close to the observations.

Right, okay, so let's set this up. Data set D: let's do that as a dictionary, with age, length and like in there. It's going to be historical data — some call it historical, some call it training. So the dictionary has an age column, effectively, with some values; a length-of-the-film column with some values; and a Y column we can call "like?". For the ages, let's go 12, 13, 17, 19 — that's four — let's do five. I'll try to align these for the sake of demonstration. For the length of the film, let's have 90 minutes, 100 minutes, 200 minutes, 180 minutes, and 400 minutes — that seems too much, 320 maybe. It's a long film. And did they like it? Yes or no for each row.
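As a rough sketch of what this set-up might look like in Python: the fifth age isn't stated in the audio, so it's assumed here, and the spoken yes/no values don't quite line up with the booleans used later on screen, so I've used like values consistent with the loss computed later in the lecture.

```python
# historical (training) data as a plain dictionary of columns;
# +1 means "liked", -1 means "disliked" (switched to booleans later on)
historical = {
    'age':    [12,  13,  17,  19,  25],   # fifth age assumed; not stated in the lecture
    'length': [90, 100, 200, 180, 320],   # film lengths in minutes
    'like?':  [-1,  +1,  +1,  -1,  +1],
}

x1 = historical['age']     # feature 1: age of the user
x2 = historical['length']  # feature 2: length of the film
y  = historical['like?']   # target: did they like it?
```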
Now you might observe that in programming, and anywhere really, plus one is just the same as the number one. The reason I'm writing the plus sign is that it makes it clearer when I say a positive case, or a plus case, and the symbols have the same length on screen, so it's a little easier to see what's going on. So there's our training data, and it's coming in in a more tabular style: we've got columns in a table; we don't have X, Y and so on directly. So let me draw the correspondence: that column would be X1, that would be X2, and Y here is our like column.

Okay, so the goal is to produce a Y-hat column which is close to this one. How are we going to do that? With an estimated function, so let's just write one: def f_hat, taking in X1 and X2, which is going to give us a little prediction. Let's say X1 minus 20 — so if I take away 20 from age, it's going to be negative over here and positive over there. Or maybe if we take away 17, that'll be zero here, negative here, positive here. So this part of the formula will be negative for some rows and positive for others — I'm just sketching for now. If I take X2 and subtract, I don't know, 200, then again this part will be negative for some of them and positive for others.

Maybe what we do is add these together and see what happens? We're coming up with some kind of formula. Now, the problem with adding these two numbers together is that this one's about ten times bigger than that one, so why don't we divide this one by ten, or multiply that one by ten — let's multiply this one by ten. That puts them on roughly the same scale, gives them the same kind of weight, the same kind of importance in the formula we're coming up with.

So if I return that — and this isn't quite finished yet — for an age of 20 and a film of 100, that's a negative number; for a film of 200, it's positive. For an age of 10, it's negative; for an age of 30, it becomes positive. You can see there's a kind of reasonableness to the output: it's positive when we want it to be positive, and negative when we want it to be negative. But the formula isn't quite finished, because we need a plus or minus one out — a sign. A couple of tricks can get us that: we could ask, is that more than zero? That gives us True and False, with True meaning plus one and False meaning minus one. We'd need to keep massaging this in Python to get it working properly, but let's leave it giving us True and False. Or we could even update the like column in the data set to Trues and Falses — there's nothing wrong with that.

So let's use this then. If I'm going to use f_hat on my X1 and X2 to compute this Y-hat, what can I do? Let's just try the first entry of each, to give you a sense of what's going on: the first in X1 and the first in X2 gives us a False — so that prediction is wrong.
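A minimal sketch of the estimator the lecture arrives at, using the subtraction points (17 and 200) and the factor of ten chosen on screen:

```python
def f_hat(x1, x2):
    # (x1 - 17): negative for younger users, positive for older ones
    # (x2 - 200): negative for shorter films, positive for longer ones
    # the age term is scaled by 10 so both terms carry similar weight
    return 10 * (x1 - 17) + (x2 - 200) > 0   # True ~ "like", False ~ "dislike"

f_hat(20, 100)   # -> False (score is negative)
f_hat(20, 200)   # -> True  (score is positive)
f_hat(12, 90)    # -> False: the first row of the data set
```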
That's pretty bad — but how badly are we doing overall? We need some definition to tell us. Let's define a loss, and the loss here is going to take in the prediction, our Y-hat, and our Y, and do something with them. Now, the problem of course is that we don't have real numbers, so you can't just take a mean or a square or any of that kind of stuff; we have to think about what we can do here. Well, if Y-hat is equal to Y we could just return zero — if they're the same, there's no error. Otherwise, maybe we just return one: one point of error. For this to work we need the Y values to be True and False, so let's just update them: False, True, True, False, True. Okay, that gives us a nice Pythonic way of working: non-numerical values giving us a numerical loss.

So if I run loss on the first prediction, with the second argument being the actual value, that gives us a loss of zero, because they're the same now: that's False, and that's False. If I look at the next one in the list — just increase the index to one, the next row — we get a loss of one, because the prediction comes out False when it should be True.

Right, so let's compute the loss and the prediction for the entire data set, as we did before, with a for loop over X1 and X2. How do I loop over them both at the same time? In Python, to loop over two things at the same time, we can use zip. What zip does is pair up the entries of X1 with X2 — so you get 12 and 90, then 13 and 100 — and gives us both at once. We could call the pair A and B, or give them real names, like age and length — that's much better. So age and length are coming out of X1 and X2 at the same time.

Let's compute our predictions — call them predictions, or call them whatever you like; have I used Y-hat yet? No, so we can call the list Y-hat. We append to it using our estimator f_hat, passing in the age and the length, and that gives us the list of predictions. And now what we want is the error, the total loss. We've got a couple of things happening here: we can temporarily store the result as a variable called prediction and append that, because we also need to put it into the loss. So let's have another list — call it errors — and append the loss, which is the function we wrote, applied to our Y-hat and Y. The Y-hat is the prediction, and the Y is the genuine Y — which we don't have in the loop yet, so let's put it in.
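A sketch of the 0/1 loss described here, with the like column updated to booleans as in the lecture, reusing the names from the earlier sketches:

```python
def loss(y_hat, y):
    # 0/1 loss: no error if the prediction matches the observation,
    # one point of error otherwise
    if y_hat == y:
        return 0
    return 1

# the like column, updated from +1/-1 to booleans
y = [False, True, True, False, True]

loss(f_hat(x1[0], x2[0]), y[0])   # -> 0: predicted False, actual False
loss(f_hat(x1[1], x2[1]), y[1])   # -> 1: predicted False, should be True
```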
So if I put Y into the zip as well, that gets me the age, the length, and now the actual true value — call it "true", or maybe use the word "like" again. That gives us all three at the same time, which sounds pretty good: age, length and like. Age and length go into the prediction; the prediction goes into our record of predictions; and then into our comparison between the prediction and what the answer actually was — whether they actually liked it.

So if we look at everything now — Y-hat, errors — hopefully we can see that we're getting an error on some of these entries, but not on all. If I want to know the total error, rather than defining a function called total loss, I can just take the sum of the errors, and that'll be the total loss: a total loss here of two.

Now what we'd like, of course, is some way of tuning this rule so that this total error is minimized — that's the goal. Maybe we could just do it by hand. Let's do a visual inspection of what's going wrong. Comparing the predictions against the actual values — False, True, True, False, True — there's a mismatch on a couple of rows. This age is 13 and the length is 100, so why would we be saying False when the answer is True? Maybe it's to do with the age, or maybe that term is too important. Some of these numbers — 17 and 200, basically, the key numbers — are we oversensitive to age or something? Hard to say, so let's make a small adjustment. If I change the 17 to 15 and run through everything, it hasn't made a difference. Maybe if I change the 200 to 175, will that make a difference? We're probably introducing errors now — and you can see that if I make this 50, it says True for everything. So we're getting the same number of errors each time, but making different predictions.

And actually, that makes complete sense, and the reason is that all this rule can really do — it's a very simple rule — is choose a point and say: below this point, yes; above that point, no. So it's going to choose somewhere to say, okay, these are going to be dislikes and these are going to be likes, and if you look at the actual data set we have, within each range there's a bit of a mixture going on. It isn't really a simple problem where young people like things that old people don't; there's actually a bit of complexity, and a rule this simple can't possibly capture that complexity, so we may always have a couple of errors here. That seems fine. In practice, what we would like the machine to do is figure out what these numbers should be — and maybe we can suggest a better rule to the machine, and get a better solution.

Right, okay, good — I think that's where I want to be on the Python setup of machine learning problems. It's given you a bit of syntax to go along with that kind of whiteboard, mathematical or conceptual instruction, just to see what all this looks like in simple Python. What we have to do next is make this a bit more realistic, so we're going to look at some libraries and other approaches that take these ideas and make them practically usable, at scale, on realistic problems.
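Putting the loop together as described — zip pairs the three columns up row by row. The variable names follow the lecture, and the total of two depends on the assumed data values in the first sketch:

```python
y_hat = []    # record of predictions
errors = []   # per-row losses

for age, length, like in zip(x1, x2, y):
    prediction = f_hat(age, length)        # store temporarily so we can reuse it
    y_hat.append(prediction)               # record the prediction
    errors.append(loss(prediction, like))  # compare prediction with the truth

total_loss = sum(errors)   # -> 2 with the values assumed above
```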
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside his physics studies and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project, then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.