Module 0 - What is Machine Learning? - Part Two
Unsupervised Learning
1h 30m

Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. This course is part two of the module on machine learning. It covers unsupervised learning, the theoretical basis for machine learning, finding the model with linear regression, the semantic gap, and how we approximate the truth.

Part one of this two-part series can be found here, and covers the history and ethics of AI, data, statistics and variables, notation, and supervised learning.

If you have any feedback relating to this course, please contact us at


We've had a look at machine learning in general, within the history of AI. We've had a look at one component of machine learning, supervised learning. Now let's look at the second major component of machine learning, unsupervised learning. This is a way of characterizing machine learning problems. If you recall, in the unsupervised case we're missing the thing that we would perhaps want to predict. More precisely, an unsupervised machine learning problem is not a prediction problem. The thing we have called the target, which is often around, we don't have here; there is no target. So we're not looking to predict things. We're looking, in fact, to uncover relationships in known features.
So let's look at some examples of unsupervised machine learning problems. Let's take a couple of features from a problem we haven't considered, just for a change of pace: something to do with sleep. Let's try and distinguish different kinds of sleep patterns. What could we have here? As our x1 we could have heart rate, which we might measure with a watch like this. What else could we observe in sleep? Maybe the amount of movement that a person has. We can call it restlessness.
Heart rate is gonna go from zero to 120, say. As for restlessness, I don't know exactly how we would measure it, but maybe we could videotape someone and count the number of times they move. We could put a sensor in their bed and count the number of times that sensor is activated. We could even tie it to heart rate, but that would be a bit complicated. So let's just think about it in terms of how much movement there has been at a particular time, measured by a little sensor in the bed.
So let's say restlessness is a percentage, zero to 100. 100% restlessness is where a person appears to be completely awake and the bed's moving around a bit, and zero is when they're perfectly still: not even a little move of the hand, not even a little shrug of any kind. Okay, we don't wanna predict anything here. But we wanna think about a relationship between heart rate and restlessness, and come up with what that could be.
So let's draw that. These are just gonna be real numbers, so we can use a straight line for each of the axes. There's gonna be heart rate down here, and we can put restlessness up there. And let's look at some historical observations. High heart rate and high restlessness is the top corner there. Let's say this is average heart rate over the night, which some people write with a bar over it. Some people have a very high average heart rate across the entire night, and some people have a large amount of restlessness. That's what we see in our database. Some people have a very low average heart rate, not zero, but if this is zero and this is 120, then maybe 30: very, very low, but I guess some people might sleep at 30. Let's say they have almost no restlessness in the night and a low heart rate. Of course, for most people we're gonna see different kinds of sleeping patterns all over the place.
Now, what you can hopefully see from this diagram is that it isn't just random. It isn't that for every kind of restlessness I can see every kind of heart rate, and for every kind of heart rate I can see every kind of restlessness.
But compared to that graph there at the bottom right, the one we actually see in our dataset has clustering, which means that there is a relationship between heart rate and restlessness, a sort of grouping relationship, where the points come together in certain ways. So, from this diagram, we can see that maybe there's this group of people with a high heart rate and not so much restlessness in their sleep. Down here there's a very different kind of person, with low heart rate and low restlessness, and then maybe some more groups that are more diffuse. We can consider that perhaps this is one big yellow group, and maybe that's all one group, and we just see scattered dots there. To what degree these are in the same group is somewhat arbitrary, actually. But what we're seeing is some level of grouping.
And in fact, what I've just done there, when I applied those colors, is solve the machine learning problem. The problem here is not like before, where we were looking for an f, just one line. Here we're looking for a possibly more complicated relationship. You can think of it as a coloring problem. Or you can think of it as asking the machine to give us a possible group that every point is in. The machine could give us some guidance as to how many groups there are, and which points would be in which group. Those are the kinds of questions we can answer with techniques of this kind.
You could think of the relationship here between the heart rate, the restlessness, and the group or color as still being a function. You put in a heart rate and a restlessness, and out comes a label, a color, or a group. So there's still a relationship. Okay, what's the use of this? Why would you want to do this?
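Before answering that, here is a minimal sketch of the grouping idea in Python. It is not taken from the lecture: the sleepers are made-up points scattered around three invented centres, and k-means with a simple farthest-first start is just one common clustering technique chosen for illustration. The function names (`make_sleepers`, `kmeans`, `nearest`) are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centres):
    """Index of the closest centre: the 'colour' given to this point."""
    return min(range(len(centres)), key=lambda j: sq_dist(point, centres[j]))


def farthest_first(points, k):
    """Pick k starting centres, each as far as possible from those already chosen."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between colouring points and moving centres."""
    centres = farthest_first(points, k)
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centre to the mean of the points that share its colour.
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    labels = [nearest(p, centres) for p in points]
    return centres, labels


def make_sleepers(seed=0, per_group=30):
    """Invented data: (average heart rate, restlessness %) around three assumed centres."""
    rng = random.Random(seed)
    assumed_centres = [(40, 10), (95, 25), (50, 75)]
    return [(hr + rng.uniform(-5, 5), rest + rng.uniform(-5, 5))
            for hr, rest in assumed_centres
            for _ in range(per_group)]


points = make_sleepers()
centres, labels = kmeans(points, k=3)

# A new sleeper: heart rate and restlessness go in, a group index comes out.
new_sleeper_group = nearest((42.0, 12.0), centres)
```

Here `nearest` plays the role of the function just described: put in a heart rate and a restlessness, and out comes a group. Note that the number of groups, `k=3`, is chosen by hand in this sketch, and the group indices carry no meaning of their own until someone interprets them.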
Well, the graph didn't come already colored, so we don't know what each of these groups is. If it had, that would mean that we knew what groups people were in, and we would probably know what those groups meant. In this case, the graph did not come partitioned, colored or labeled, but we can infer a grouping. And now that we know what the groups are, maybe we can start some investigations, some additional experiments or research, or maybe we know, because we're experts in the domain, what these groups represent. So maybe we know that in our dataset we have some athletes, and those are the athletes. We know we have some people that have a sleeping disorder of some kind. And then maybe we have groups of particular interest, because we're launching a product for people in between those two groups: people who don't have a sleep disorder, but are not absolutely amazing at rest either. So this ability to disentangle different kinds of persons could be very helpful for us, because we can say: there is a group here, I wonder what explains the fact that these are all a bit like each other? What's going on there?
So let's think about that. This yellow group here is all over the place in terms of restlessness. Maybe we think this heart rate here is, let's say, 40, and this heart rate here is, let's say, 60. In terms of restlessness, we're going from 50% to 100%. 100% restlessness is a really bad sleep; it means they never rested. So maybe this is a group of people who have drunk a lot of alcohol. Maybe they have not exercised a lot. I don't know. Maybe this orange group is a more traditional sleeping group, where people are below 50% restlessness, many of them much below that, so they slept well enough.
And then they have this range of heart rates, caused possibly by these ones being female and these ones being male, 'cause I believe men have a lower heart rate. So that could be an explanation for these sorts of sub-clusterings. Questions of this kind can begin with this coloring operation, this grouping operation. Then you try and explain these groups, and that gives you some insight into what's going on with the data.
This problem here is known as clustering. It's one sort of unsupervised machine learning problem, and it's one of the big ones. The other sorts are really just very disconnected techniques. There's not much relation between them. They're little tools and techniques that might be of help in building intelligent systems. But there's no real deep connection, other than that we think of them as being part of this discipline, and they're not predictive.
I'll give you a very quick idea of one additional problem here, and that's the problem of compression. It's also given the fancier name of dimensionality reduction. We will talk about this in detail, but for now I'll give you a little preview. The idea is, look, you've got lots of different columns. Maybe you've got a thousand columns. That could be a person's profile with lots of different demographic information, thousands of pieces of information on each individual user possibly. And maybe that's just too much information to find a very reliable relationship, a very good quality predictive relationship. That can often be the case in machine learning: in some cases you have too much data, in the sense of too many columns, to find a relationship.
So it might be necessary, as part of solving the problem, to reduce this down to a smaller number of numbers. If we have the xs there, what can we call the new ones? We could call them Cs: C1, C2, and C3. The question is, can I summarize a thousand individual numbers by three numbers? Let me give you an example of where that would be possible without doing too much damage. If most of these numbers were zero, and only occasionally were they one, we could probably delete most of them. That would be a way of compressing it. Or, if these numbers were mostly zero but you had a one every now and again, maybe you could take a mean of these columns, a mean of those columns, and a mean of those columns. That would give you three numbers. And maybe that would be pretty good. So that would be like a third, that would be like a half, and this is zeros: there's those three numbers. Does that capture what was there in the original? Maybe not fully, but it gives you something. It gives you a starting point for reducing the amount of information you have without being completely in the dark.
So that's called compression, or dimensionality reduction, and that's probably the second-biggest part of unsupervised learning. Number one, clustering; number two, dimensionality reduction. As with everything here, we will go into the detail behind these. But this just gives you a sense of what you can do in machine learning if you don't have the thing you're trying to predict, or if it isn't a prediction problem but some other kind of problem. What other kinds of problems are there? Here's one, dimensionality reduction. Here's another, clustering. All right.
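The "mean of these columns, mean of those columns" idea can be sketched in a few lines of Python. This is only an illustration of that crude scheme, assuming the columns are split into equal contiguous blocks; it is not a standard dimensionality reduction algorithm, and the function name `compress` and the example rows are invented.

```python
def compress(row, n_groups=3):
    """Summarise a long row by the mean of each contiguous block of columns."""
    n = len(row)
    if n < n_groups:
        raise ValueError("need at least one column per group")
    summary = []
    for g in range(n_groups):
        # Integer arithmetic splits the columns into n_groups nearly equal blocks.
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        block = row[start:stop]
        summary.append(sum(block) / len(block))
    return summary


# 18 mostly-zero columns: two ones in the first six, three in the next six, none after.
small_row = [1, 0, 0, 1, 0, 0,
             1, 0, 1, 0, 1, 0,
             0, 0, 0, 0, 0, 0]
small_summary = compress(small_row)  # roughly a third, a half, and zero

# A thousand mostly-zero columns reduced to three numbers (C1, C2, C3).
wide_row = [0] * 1000
for position in (3, 250, 400, 401, 650):
    wide_row[position] = 1
wide_summary = compress(wide_row)
```

The three numbers say how often a one appears in each block, but not where, which is the sense in which the summary does not fully capture the original.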


The Theoretical Basis of Machine Learning - Finding the Model with Linear Regression Part 1 - Finding the Model with Linear Regression Part 2 - The Semantic Gap - Approximating the Truth Part 1 - Approximating the Truth Part 2


About the Author

Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. While studying physics and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.

His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.