Using loops to compute total loss
Practical Machine Learning
Python is one of the tools you can use to programme machine learning. In this module, you'll learn the basics of Python as it's used for machine learning, how to use loops to compute total loss, regression and classification, and how to set up machine learning in Python.
So here we have the loss of each individual point. Let's add a loop to this, to get that syntax down, and use that loop to compute the total loss. So what's that? That's where we go over every point and compute a loss for it. Now, what is every point here, though? We haven't really defined a training data set yet, so let's do that.

Let's add to the setup. We've got the y and the x; maybe we could say we also have a D for data. What should we use here? Sometimes people like to use script notation. What do I mean by script notation? Well, you can see when I run this: it gives you a curly D. That curly D just means all the training points that we have, so all of the xs and all of the ys. We could even put some dots here, like that. I keep using the wrong key there. There you are: the training data set D is these points.

Now, of course, besides the training data set we might have others. Maybe we have a data set that we use to test what we've done, or a data set later on for different kinds of purposes. So D is a data set, and maybe we put a little "train" on it to mean the data set for training. So the training data set is just a set of xs and ys, and the dots here just mean there are going to be more of them.

We could think of each of these as just being a point. As I mentioned earlier, we put a little superscript to mean a point, so let's do that. Say we're at a point i in x and in y: that'd be x zero with a superscript i, and y with a superscript i. Here there's only one x and only one y, but you might have several.

Okay, since things are getting a bit complicated in terms of the layout of the notebook, what I've done is just tidy this up.
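The notation being typed into the notebook can be sketched roughly as follows. This is a reconstruction from the spoken description, not the instructor's exact markup: the parenthesised superscripts and the `\mathrm{train}` subscript are assumptions made for readability.

```latex
% Script ("curly") D for the training data set: pairs of x and y,
% with a superscript indexing the point and dots meaning "and so on".
\mathcal{D}_{\mathrm{train}} = \left\{ \left(x_0^{(1)}, y^{(1)}\right),\ \left(x_0^{(2)}, y^{(2)}\right),\ \dots \right\}
```

Here the subscript 0 marks the single input column and the superscript marks which point is meant.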
So we've gone from bullet points to little sections; maybe that's a little clearer. In the case of the training data set, which I was mentioning here, we've got the training data set D. As I was saying, this has one x and one y, but in general we might have multiple xs. So that could be x zero, then x underscore one, which just means the next column along, and then we put some dots in again. So, x zero, x one: these are column positions, and the superscript is the row.

So this would be the first row. Let's continue with that notation a bit and show you the second row. There'll be another row along, and that'll be the second row, but all the columns remain the same because they're just the same columns. So you've got first row: first column, second column. Second row: first column, second column. That's the training data set.

We'll need that in order to compute the loss, because we're going to compute the loss on this training data set. So let's define one. Let's call it D train, the data set for training. There are going to be lots of xs and ys. In this case we only have one x, which is just the age. So maybe in our sample we have a 10 year old, and they rated at seven. Or do we stick to something more realistic, given what we have above? That's just a few points. So we've got a 10 year old, a 17 year old, then 18, 21, 32, 41 and 70. And then maybe we say these are the ratings we've seen, out of 10: 3.1, 4.2, 5.6, let's go for 5.6 again, seven and 7.5. These are just what people rated.

So let's compute the predictions for all of these, and let's do a loss column. To compute the predictions, let's just loop over them and get a yhat for each. So this is x and y.
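A training set like the one described can be sketched as a plain Python list of (x, y) pairs. The name `D_train` follows the spoken "D train". The pairing of ages to ratings is an assumption: the transcript reads out seven ages but only six ratings, so the values below are illustrative and the 70 year old is left out.

```python
# Training data set: a list of (x, y) pairs.
# x is an age in years, y is a rating out of 10.
# NOTE: the age-to-rating pairing is assumed for illustration only.
D_train = [
    (10, 3.1),
    (17, 4.2),
    (18, 5.6),
    (21, 5.6),
    (32, 7.0),
    (41, 7.5),
]

print(len(D_train))   # number of training points
print(D_train[0])     # the first (x, y) pair
```

A list of tuples keeps each x next to its y, which is what makes the later loop over points straightforward.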
Let's put that as a little comment here: x, y. Then we go over each thing in that set, so we can say for point in D train. And let's just print it out for now, to show you what comes out. We can see that coming out of this loop we have a point, which is a pair of numbers: 10 and three.

It turns out there's a useful little bit of Python syntax here: as you're extracting a pair of values from this list of points, you can decompose the pair right there in the loop. Let's see what I mean by this. If I say x, y, Python puts the 10 in x and the three in y. So if I print x here, that gives me the x; if I put y here, it gives me the y. So now in this loop I've got 10 and three coming out of the training data set.

What am I going to do with that? Well, I need my yhat. So as I go around, I can get my yhat by putting x into my prediction function for ratings and seeing what it predicts for each point. Let's print that out to show you what's going on. So these are my predictions. And if we put in the actual observations, you can see they're a little bit different: my predictions tend to be a little too low compared to my observations.

Now, where are we going here? Let's compute the loss. We can say the loss is the rating loss on yhat and y, and let's put the loss in the print as we go around. There we go, so you've got 2.85-something. These are basically the squared differences: you can see this one is four, because it's a difference of two, and two squared is four.

Rather than just printing these out, let's record them. Let's have a running list called yhat, which starts as an empty list, and a running list of losses, also an empty list.
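The loop described here, with the pair unpacked directly into `x` and `y`, might look like the sketch below. The prediction function is defined earlier in the course and is not shown in this transcript, so `predict_rating` and its coefficients are hypothetical stand-ins; `loss_rating` is the squared difference the transcript describes, under an assumed name. The data values are illustrative.

```python
# Illustrative training points: (age, rating out of 10).
D_train = [(10, 3.1), (17, 4.2), (18, 5.6)]

def predict_rating(x):
    """Hypothetical stand-in for the course's prediction function.

    A straight line with made-up coefficients, used only so the loop runs.
    """
    return 0.1 * x + 2.0

def loss_rating(yhat, y):
    """Squared difference between a prediction and an observation."""
    return (yhat - y) ** 2

# Each point is a pair; "for x, y in ..." unpacks it on the spot.
for x, y in D_train:
    yhat = predict_rating(x)        # prediction for this point
    print(x, y, yhat, loss_rating(yhat, y))
```

Writing `for x, y in D_train` is equivalent to looping with `for point in D_train` and then assigning `x, y = point` inside the loop.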
Let's call this, I don't know, the yhat for a point; let's just call it a prediction, why not? And then call this one error. So this is a prediction for a point, and then we'll just append those to the right lists. So yhat gets an append of the prediction, and loss gets an append of the error. What we get out of this is yhat being a list of the useful values. Likewise, if I look at loss, it's also a list, holding the values we were printing out before.

So, by doing this, we've introduced a loop going over a data set. We've got lists, including a list of pairs here. We've got the use of functions, and the use of append to add something to a list. So we're building up the first kind of programmatic approach to machine learning, illustrating it with simple bits of Python syntax so we can do a first pass at the machine learning approach, and refresh that Python along the way.
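Putting the pieces together, the recording step can be sketched as below. The list names `yhat` and `loss` and the per-point names `prediction` and `error` follow the transcript. `predict_rating`, its coefficients and the data values are hypothetical, as the real prediction function is not shown here. The final `sum` is one natural way to get the total loss the lesson title refers to; the transcript itself stops at the two lists.

```python
# Illustrative training points: (age, rating out of 10).
D_train = [(10, 3.1), (17, 4.2), (18, 5.6), (21, 5.6)]

def predict_rating(x):
    """Hypothetical prediction function with made-up coefficients."""
    return 0.1 * x + 2.0

def loss_rating(yhat, y):
    """Squared difference between prediction and observation."""
    return (yhat - y) ** 2

yhat = []   # running list of predictions
loss = []   # running list of per-point losses

for x, y in D_train:
    prediction = predict_rating(x)
    error = loss_rating(prediction, y)
    yhat.append(prediction)   # record the prediction for this point
    loss.append(error)        # record the loss for this point

# Assumed final step: add up the per-point losses.
total_loss = sum(loss)
print(yhat)
print(loss)
print(total_loss)
```

Keeping the per-point lists, rather than only a running total, makes it easy to inspect which points contribute most to the loss.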
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.