Introduction to notation
1h 23m

Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. This course is part one of the module on machine learning. It starts with the basics, introducing you to AI and its history. We’ll discuss the ethics of AI and talk about examples of AI that exist today. We’ll cover data, statistics and variables, before moving on to notation and supervised learning.

Part two of this two-part series can be found here, and covers unsupervised learning, the theoretical basis for machine learning, model and linear regression, the semantic gap, and how we approximate the truth.

If you have any feedback relating to this course, please contact us at


So, in order to start getting a clear picture of machine learning, what kinds of problems there are and how to solve them, I think we should at this point introduce some notation and start formalizing what we're talking about. Then we can be systematic about what our different options are and the different approaches.
We've seen some notation already. We've seen that y is what we're going to call our target variable, and that x, in general x1, x2, x3, x4, x5, as many x's as you need, will be our feature or features. Now, as a matter of convention, if there is more than one, we will typically give that a capital, X. I'll put this as a little side point, if you like: X here will mean multiple columns, basically multiple features.
If you're thinking about this in terms of a table or some tabular structure, then all these variables are going to be columns in your table. So y will be age or something, whatever you want to predict. If we say it's the rating of a film, to continue that example, then that will be some column in the table, and x will be a thing we know, maybe your age. That will be some column, and X will be multiple columns. If our table was nice and well designed, and it only contained things we were using, then the table would just be the X columns and y. So it would be the whole thing.
And this is all stuff that we have in our historical dataset. Rather than historical dataset, I should call it a training dataset, so let's use that as the technical term: training, T-R-A-I-N-I-N-G, a training dataset. There are actually a lot of conventions around the letter we use here. Let's use one of those and call it D, so let's give it a little script D, like that. And that would just be all the things that we know, in terms of our features and the historical examples we have of the thing we're trying to predict.
So the entire dataset would just be D. You know, in SQL-like fashion, the table would be called D and we'd have the columns x and y. Why do we call it training? Training just means a kind of inference or analysis. Analysis plus inference: it's the process that gives you those rules, those patterns, those connections that solve your problem, basically.
So we've got our y and our x, with capital X as a side point there. What else do we need? We need a y hat. That hat is the guess that we're going to make for y. So this piece of notation, this little hat symbol, means a guess, an estimate for y.
There is one more piece of notation that we need on top of all of this, which is the connection between them, the historical connection, basically. The connection we're going to call f, and f is going to be a function. f is going to be this thing where we put in some x, something we know, and out of this function, this relationship, comes our estimate for y. So here we have this connection, this f. That's going to be the relationship between all the x's that we have and y, the historical y that we've seen.
So this is the complete setup for machine learning. That gives you all the notation you need to talk about the different kinds of problems that there are, so let's start categorizing problems in machine learning. The first distinction that we are going to draw is whether we have, in our historical dataset, the thing we're trying to predict: whether we know the answers already. And to make all this concrete, let's talk about the film example. So the question here is, do we have y?
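As a minimal sketch of the notation so far (not part of the lecture), the Python below writes D, X, y, f and y hat as plain variables. The row values are illustrative numbers for the film example, and the body of f is a hypothetical stand-in that copies the rating of the known row closest in age; in practice f would be found by training on D.

```python
# Training dataset D: each row is (x1, x2, x3, y).
# Illustrative values only: age, film length in minutes, genre code, rating out of 5.
D = [
    (18, 200, 2, 4),
    (60, 200, 2, 3),
]

# Capital X holds the feature columns; lowercase y holds the target column.
X = [row[:3] for row in D]
y = [row[3] for row in D]


def f(x):
    """Hypothetical stand-in for the learned relationship between X and y.

    It returns the rating of the training row whose age is closest to x's age.
    This is an assumption for illustration, not the lecture's method.
    """
    nearest = min(range(len(X)), key=lambda i: abs(X[i][0] - x[0]))
    return y[nearest]


# y_hat is the estimate for y that f produces from something we know, x.
y_hat = f((20, 200, 2))
print(y_hat)
```

The point of the sketch is only the shape of the pieces: D is the whole table, X and y are its columns, f maps an x to a guess, and that guess is y hat.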
Here is our table: here are our X's and here is our y. This is the table of data we already have. This is someone's age, this is the length of the film, say 200 minutes, and this is the genre of the film; these will be x1, x2, x3. Imagine a column of numbers under each of these. I won't draw them all out, but let me give you at least two rows: an 18-year-old watched a 200-minute film, and maybe a 60-year-old watched the same 200-minute film, same genre. The 18-year-old rated it 4/5 and the 60-year-old rated it 3/5.
So if I'm asking the question, do we have the thing we are trying to predict in our dataset? Yes, in this case we do. So that's a supervised learning problem.
Let's say a few things before we move on. What does supervised mean here? It means that in the training step, when we are computing and figuring out what the relationship is between the historical features, let's make sure we get that word down again, features, and the target, when we are analyzing this relationship, we know the answer. We can supervise the machine in its computation. If you're trying to guess something for y, well, on the first row what you should do is guess four. Because if in the future you see someone with these known features, these known characteristics, if they are 18, they're watching a 200-minute film and it's a horror film, genre two, say, well, you should be guessing four. So we can tell the machine what it should be doing because we know the answer already. That's supervision: this notion that we have the answer, so we can direct the machine.
Now, let's say something about unsupervised, and then let's summarize this terminology a bit. What if we don't have this thing? Going for a cross there: no, we don't have it. We're going to call that unsupervised. As above, we have all the things that we have, x1, x2, x3, but that's it. That's all we've got. What can we do here?
We can look for connections and patterns between the things we know, but we can't do anything else. So that's known as unsupervised learning. In supervised learning we have the answers in the history; in unsupervised learning we don't.
Now, this may still seem a little bit circular or unclear, in the sense that maybe you're thinking: what determines whether something is a target or a feature? Couldn't I predict the age? Couldn't I predict the genre of the film? And if I was predicting the genre of the film, and I had the genre of the film in my database, wouldn't that be a supervised problem? The answer is yes. Which thing here is the feature and which thing here is the target isn't defined by the data. That's the key point. There isn't a database that comes with one column labeled x and another column labeled y. What defines that is you, and in particular the problem. You say, I want to predict genre. Well, if you want to predict genre and you have genre, you're in luck: you can do it. If you want to predict film rating and you don't have any film ratings, well, you're tough out of luck. There's nothing you can do about that. In order to predict something, basically, you need to have seen it before. So there's no magic here.
And you're defining the problem, not the data. The data either does or doesn't meet this criterion, but you define the problem. So we have the first step here, a key distinction: do we have a supervised learning problem, in which we already have the answers we need, or don't we have those answers, in which case we're looking at an unsupervised learning problem?
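The idea that the problem, not the data, decides which column is the target can be sketched in Python as follows. This is an illustration under stated assumptions rather than anything shown in the lecture: the column names, the helper functions `split_features_target` and `is_supervised`, and the `box_office` column are all hypothetical, and the two rows are the ones from the film table.

```python
# The same two historical rows from the film example, with hypothetical column names.
rows = [
    {"age": 18, "length_min": 200, "genre": 2, "rating": 4},
    {"age": 60, "length_min": 200, "genre": 2, "rating": 3},
]


def split_features_target(rows, target):
    """Split rows into (X, y) for a target chosen by the person posing the problem.

    If the target column is missing from the data, y is None: we have no answers.
    """
    if not all(target in row for row in rows):
        return [dict(row) for row in rows], None
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


def is_supervised(rows, target):
    """True when the historical data already contains the thing we want to predict."""
    _, y = split_features_target(rows, target)
    return y is not None


print(is_supervised(rows, "rating"))      # rating is in the table
print(is_supervised(rows, "genre"))       # genre can be the target too
print(is_supervised(rows, "box_office"))  # never recorded, so no answers to learn from
```

Swapping the `target` argument changes which column plays the role of y without touching the data, which is the distinction the lecture draws.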

About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Around studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.