Regression and classification
Practical Machine Learning
Python is one of the tools you can use to program machine learning. In this module, you'll learn the basics of Python when it's used for machine learning, how to use loops to compute total loss, regression and classification, and how to set up machine learning in Python.
Right, so we've had a look at the setup of machine learning, evolving that setup as we need to consider new ideas. So where are we so far? Well, we have this variable y that we want to predict, and we're trying to estimate it with a y-hat. We've got a dataset that allows us to estimate it. We imagine that there could be a real function f, and the goal is to come up with an estimate for that function, f-hat. How are you going to do that? Well, hopefully the training dataset sufficiently represents this function. In other words, if I compute a y using this function, it should be the y that I see in my training dataset. Then you look at the loss across this dataset and try to minimize it by changing our estimate of the function, so as to get a predicted dataset, if you think about it that way: a dataset we would predict that is as close as possible to the one we have.
Right, so that's the setup. And what we've done here, implicitly, is solve a regression problem. I left regression down here as a side issue, but if you think about it, what's coming out of these functions is a real number. It's a floating-point number, basically: a number that can have a fractional part. So that's regression: y is a member of the real numbers.
In programming terms, if I ask for the type of y, that's going to tell me that it's a float. So if I type this in Python, it will come out true. Let me show what I mean by that. If I say type of y, it tells me it's a floating-point number. So if I say the type of y is float, that's going to be true.
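As a rough sketch of the setup just described, the snippet below builds a tiny dataset, loops over it to total up the loss, and then checks the type of a prediction. The data values, the straight-line `f_hat`, its coefficients, and the choice of squared error as the loss are all assumptions made for illustration; none of them come from the lesson.

```python
# Made-up training data as (x, y) pairs. Values are illustrative only.
training_data = [(1.0, 2.4), (2.0, 3.1), (3.0, 3.4)]


def f_hat(x):
    """Hypothetical estimate of f: a straight line with made-up coefficients."""
    return 0.5 * x + 2.0


# Total loss across the dataset, assuming squared error as the loss.
total_loss = 0.0
for x, y_observed in training_data:
    y_predicted = f_hat(x)
    total_loss += (y_predicted - y_observed) ** 2

print(total_loss)

# In regression the prediction is a real number, which Python stores as a float.
y = f_hat(3.0)
print(type(y))           # <class 'float'>
print(type(y) is float)  # True
```

Changing the coefficients inside `f_hat` and rerunning shows the total loss going up or down, which is the sense in which the function is adjusted to minimize the loss.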
You can think of those two as close: they don't mean exactly the same thing, but floating-point numbers are how you represent real numbers in computers. What I might do is get rid of these bullet point numbers here, because they might get in the way of a clear visual. And it turns out that in Jupyter notebooks, if you want Python formatted nicely, you can just type python here and that colors the syntax to give you a nice visual of it. We shouldn't play around too much, but you can think of that as being the same thing there.
Right, back to regression. Regression is where the target, or the result of running our prediction function, is a floating-point number. The other one was classification, if you recall, so we could start on classification here, I suppose. In this case, y is going to be one of some options. If you have binary classification, the y we're considering is going to be, let's say, minus one or plus one. So you could say y is in minus one, plus one. What else could classification be? We could have multiclass classification, and here y would potentially be one of multiple things, so you might write zero to something. Actually, we don't have to represent this as numbers in the setup. In the setup of the problem, even in the mathematics, we could dispense with the numbers and say that, for a particular problem, maybe it's London, Leeds, Manchester, and so on.
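To make the two kinds of classification target concrete, here is a minimal sketch that writes down the options for each case. The variable names (`binary_options`, `city_options`, `y_binary`, `y_city`) are hypothetical and chosen only for this illustration.

```python
# Binary classification: y is one of exactly two options.
binary_options = (-1, +1)

# Multiclass classification: y is one of several options.
# In the problem setup the options do not have to be numbers.
city_options = ("London", "Leeds", "Manchester")

# Example targets, one for each kind of problem.
y_binary = +1
y_city = "Leeds"

print(y_binary in binary_options)  # True
print(y_city in city_options)      # True
```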
But of course, when we come to do some computation with this, we need to make sure that the multiple classes have a numerical representation. Classes require a numerical representation to arrive at a computational solution to the problem, right? So here we would say y is some option, whether it be zero, one, two, and so on: zero for London, one for Leeds, two for Manchester, whatever it may be.
So what does this mean in programming terms? Let me show you. If y is the result of a prediction function, let's give an example of a prediction function that is a classification function. If I say def f_classify, let's do a binary one: like versus dislike. We're going to classify films, say, so x is going to be the runtime of a film in minutes. Here it's very simple: if the runtime of the film is more than 200 minutes, I return minus one, I don't like it. Otherwise, I return plus one. And that's a classification rule. You could do this in a more mathematical way, possibly, but let's leave it there for a second to make the point and try f_classify on a film. So that's my film, it's 180 minutes. That gives me plus one, which means I like it.
So if that's my y, which I get using my f, what does the mathematical notation above it say? It says the type of y is minus one, plus one. It'd be nice if we could say the same thing in Python, but unfortunately we can't say exactly that. It's a limitation of Python more than anything to do with programming in general.
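The classification rule described above can be sketched as follows. The function name `f_classify`, the 200-minute threshold, the 180-minute film, and the city numbering come from the lesson; the names `film` and `city_codes` are assumptions introduced here for illustration.

```python
def f_classify(x):
    """Classify a film by its runtime x in minutes: -1 is dislike, +1 is like."""
    if x > 200:
        return -1  # longer than 200 minutes: dislike
    return +1      # otherwise: like


film = 180  # runtime in minutes
y = f_classify(film)
print(y)  # 1

# Multiclass labels need a numerical representation before any computation.
# This mapping follows the numbering given in the lesson.
city_codes = {"London": 0, "Leeds": 1, "Manchester": 2}
print(city_codes["Leeds"])  # 1
```

A film of exactly 200 minutes is not more than 200 minutes long, so under this rule it is classified as plus one.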
Well, what we could do is have a set of the allowed values. We could say classes is a set, and the way you write sets in Python is with braces, so put minus one, plus one. It's not a dictionary, so no colons. It's a set of numbers: minus one, plus one. And this notation here means that y is in that set. In fact, when we say y is a real number, or is in the set of real numbers, if I actually had a set of all real numbers, like minus a million point something, minus 100.5, zero, 101.2, and so on, every possible real number, every possible floating-point number, then likewise I could say y in real for the float case. But of course we don't have such a set around, so in Python the closest thing you can say is that the type of y is a floating-point number.
In mathematics, we can read this notation as a type notation, saying the type of y is a real number, or as a set notation, saying y is in the set of real numbers. In mathematics, and in some perspectives on programming, a type is just a set: all the word float means here is a set of numbers, and all we're saying is that y is one of these numbers. Okay, so we've got an illustration of classification, and we've got regression. The setup for regression: y is a real number. The setup for classification: y is going to be one of a set of options.
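The two readings of the notation, membership of a set and having a type, can be put side by side in a short sketch. The set `classes` and the function `f_classify` follow the lesson; `y_regression` and its value are assumptions added here for the comparison.

```python
def f_classify(x):
    """Binary rule from the lesson: -1 (dislike) above 200 minutes, else +1 (like)."""
    if x > 200:
        return -1
    return +1


# A set of the allowed class values: braces, and no colons (so not a dictionary).
classes = {-1, +1}

y = f_classify(180)
print(y in classes)  # True: set membership stands in for "y is in {-1, +1}"
print(type(y))       # <class 'int'>: the built-in type is broader than the two classes

# For regression there is no set of every real number to test against,
# so the closest check is on the type.
y_regression = 101.2  # illustrative value
print(isinstance(y_regression, float))  # True
```

Here the membership test is the closer match to the mathematical statement, since the built-in type `int` allows far more values than minus one and plus one.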
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.