This course covers the concept of unsupervised learning within the context of machine learning. You'll learn the fundamentals of unsupervised learning and how it differs from supervised learning. We'll cover the topics of clustering, k-means clustering (and its limitations), and dimensionality reduction.


- So we've just looked at unsupervised learning, which I've said is a collection of random tools. Perhaps the most famous of that random collection of tools is known as clustering. Clustering, as I said, is a technique, or set of techniques, for finding patterns and features that tell us how observations are grouped, or clustered. So let's look at that in more detail. The term here is clustering, and let's say a little bit about the goal of clustering: finding groups, or clusters, in the observations in our feature set X. All right, so no connection to the target. So let's look at an example dataset, and say why we would want to find patterns, what use they can have in the learning process, and show you what the results of such an approach would be, what it means to have clustered data.
So let's come up with an example. Remember, we're gonna have a couple of features in our visualization. When we're looking at clustering there is no Y, no target, so we're gonna have some axes, which are the features. Let's just choose a couple of features. So maybe we look at a relationship between age and film rating, or age and the rating of a ride at a theme park, which is what I mentioned earlier, but maybe we do some kind of general rating. Anyway, I'm gonna go with age for X-one and a rating for X-two. All right, X-one is gonna be an age, X-two is gonna be a rating. Now, let's put on some points. Let's suppose that the young people, who are low in X-one, rate quite highly. Let's say the older people rate quite lowly, and then we've got a few little interesting points everywhere else. Now this is the visual we were seeing before, in the classification case.
In the classification case, what we've seen is that we would look at some features that have been visualized, and we would look at the coloring of those points, and try to distinguish between the colors, that is, between the values of the target variable, which are usually represented with colors. Now, here we have no colors on the visual, because we do not have a Y column. So we do not know which groups these points belong to at the beginning of the process. In fact, the goal here is to determine whether there are any groups these points belong to. And here we can see visually, immediately, that there probably are. So let's have a look. Here's a group, here's a group, and, what should we say, there's a group, here's a group, right? So maybe there are four groups: one, two, three, four.
Now, you might notice from the tone of my voice there that there was a degree of arbitrariness in the number of groups we choose. In fact, technically speaking, there is no true number of groups that we can choose for this cluster of points, for this range of points. I mean, you could imagine: what is the maximum number of groups there could be? Well, the maximum number of groups is where every individual point here, which is a person, of course, is considered to be their own group. And the minimum number of groups is one group, which includes everyone. And at some point, as we turn the dial, and we say, "Well, perhaps there are more and more and more groups," all we are really doing is making a hopefully non-arbitrary, but in some sense arbitrary, choice about who goes in what group.
So here we've got four. What could these four groups represent? Let's say we're thinking about it as a marketing problem, and we're trying to look at who comes to our fairground and produce a marketing campaign. So maybe top left we have young children who enjoy the rides. What name could we give a sales team for the children?
We could say these are, if we put an age on here of around 10 years old, say, teenagers, so these are sort of early teens who enjoy rides fearlessly, so let's call these fearless teens. Why not, if our fairground is particularly energetic? Now, when we go down to the bottom here, let's call these the tired adults. Is that too normative, a little too judgmental? Let's call these parents anyway. And then we've got some other groups in here. These are sort of general audience, and who knows what this is? So this group here, this is maybe young adults who don't like rides. So this is like bored young adults.
So, what have we accomplished here? Well, I began with a dataset which was unlabeled. You could say it was just random, well, not random, mostly, but just this spray of black points that appeared to us not to have any organization to them. And then I said, what happens if I cluster this? That is to say, what happens if I draw these purple circles such that they somehow represent an underlying, real, genuine pattern of observations occurring in different ranges of ages and ratings? So we have a group of high age, low rating. We have a group of low age, high rating, for example.
So with those purple circles established, what I was able to do as a practitioner is look at the result of this clustering and then begin adding in domain-specific labels, which can be used by other practitioners. So for example, I looked at this part of the diagram right here and said, that's the parents. And the machine doesn't know that these are the parents. All the machine is gonna tell you is, I see an interesting purple circle, which says there is an interesting range of ages and range of ratings in which we find people coming to our fairground, or whatever. What I, the practitioner, said is, ah, I know what those are. Those are the parents. Now, from that point on, I can use that label, parents, in, for example, any predictive project.
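To make the idea of "asking the machine to draw the circles" concrete, here is a minimal sketch using k-means, which this course covers as one clustering technique among several. Everything specific in it is an assumption made for illustration: the visitor data is invented, the choice of four groups is ours, the farthest-first starting points are just a convenient way to make the run repeatable, and the naming thresholds in `name_group` stand in for the practitioner's judgment.

```python
import math

def kmeans(points, k, max_iters=50):
    """Minimal k-means sketch. Returns (centroids, labels)."""
    # Deterministic farthest-first start (an assumption, chosen for repeatability).
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Invented (age, rating) observations: features only, no Y column.
visitors = [
    (11, 8.0), (12, 9.0), (13, 8.5), (14, 9.5),
    (22, 2.0), (23, 2.5), (24, 1.5), (25, 3.0),
    (30, 6.0), (31, 5.5), (32, 6.5), (33, 6.0),
    (42, 2.0), (43, 1.5), (45, 2.5), (46, 3.0),
]

centroids, labels = kmeans(visitors, k=4)

def name_group(centroid):
    """Hypothetical domain rules: the machine only gives numbers, we add the names."""
    age, rating = centroid
    if age < 18 and rating > 7:
        return "fearless teens"
    if age >= 40 and rating < 4:
        return "parents"
    if age < 30 and rating < 4:
        return "bored young adults"
    return "general audience"

names = {j: name_group(c) for j, c in enumerate(centroids)}
for j, c in enumerate(centroids):
    print(j, c, names[j])
```

The algorithm only returns group numbers and centre points; the names come from us. Note also that age and rating are on different scales here and the sketch does not rescale them, which would matter on real data because distances would be dominated by age.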
So I could use the label, parents, on those points and do a classification problem, which would then be a question of finding a model which distinguishes, for example, parents from others. So we could use these labels as an input into a classification process. So I would begin with just my columns: X-one, X-two, no Y column. And then through this process of clustering, determine that, for example, this observation over here belongs to a group, let's call it group one. And then through my domain expertise, I could add in that group one was in fact parents, and that would give us a Y. So then we can start predicting: what group are you in?
So this gives you a feel, I think, for how this is gonna play out in a learning process. It can play out in an exploratory phase. Before we do any prediction, before doing any modeling, what we're going to do is look at the data, ask the machine to cluster it, and give us some circles that give us boundaries of observations, how observations are clustered together. That could tell us something about what our data looks like, and then possibly we could use those boundaries, those clusters, as a label input into a classification problem, where we have numbers against groups, and those numbers then become prediction targets for a future prediction problem. Can we predict whether someone is a bored young adult, or a fearless teen, or a parent, or something like that?
Now I wanna say a few things, just to finish off on this motivation for clustering. One thing I wanna say here is that, of course, we wouldn't use a machine in the case where we could just visualize in two dimensions: one axis, another axis, a few points. We could use a machine if we wanted to, but actually I can see, using my eyes, where the clusters are. In fact, I've just done it.
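The step from clusters to a prediction target can be sketched in a few lines. This is only an illustration under stated assumptions: the data is invented, the group numbers are written in by hand as if a clustering step had already produced them, and the classifier is a bare one-nearest-neighbour rule picked for brevity, not a method the lecture prescribes.

```python
import math

# Feature columns only: (age, rating). Invented data for illustration.
X = [(12, 9.0), (13, 8.5), (24, 2.0), (25, 3.0),
     (31, 6.0), (32, 6.5), (43, 1.5), (45, 2.5)]

# Group numbers as a clustering step might return them (assumed, not computed here).
groups = [0, 0, 1, 1, 2, 2, 3, 3]

# Domain expertise: the practitioner's name for each group number.
group_names = {0: "fearless teen", 1: "bored young adult",
               2: "general audience", 3: "parent"}

# This is the new Y column, built from the clusters plus the names.
y = [group_names[g] for g in groups]

def predict(x_new, X_train, y_train):
    """One-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(X_train)), key=lambda i: math.dist(x_new, X_train[i]))
    return y_train[nearest]

# A new visitor, aged 41, who rated the ride 2 out of 10.
print(predict((41, 2.0), X, y))
```

Once the Y column exists, any supervised method could take the place of `predict`; the point is only that clustering plus domain labels turned an unlabeled table into a labeled one.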
So it's important to emphasize here, I think, that we are often using the machine to arrive at these clusters, these circles, when we have a very large number of dimensions: four, five, six, 10, a hundred. And when we want specific boundaries to be established within the number of dimensions that we have. So for example, here we have two, and I wanna know at precisely what age this parent group gets cut off into a young adult group. So maybe here you would say maybe 32, maybe 26, for example. So the value of an algorithmic approach to clustering is that it's going to scale to a very large number of dimensions. And it's going to give you these specific boundary numbers, or some way of numerically categorizing the grouping, so that it can be used in future learning problems.
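As a rough illustration of both points, the sketch below assumes a centroid-based method, where each group is summarised by a centre point and a visitor belongs to whichever centre is nearest. The centroid values, the extra three features, and the function names are all hypothetical. Under that assumption, the age cut-off between two groups at a given rating is the age that is equally far from both centres, and the same distance rule works unchanged with more features.

```python
import math

def age_boundary(c1, c2, rating):
    """Age at which a visitor with this rating is equally close to both centroids.

    Solves (x - a1)^2 + (rating - r1)^2 == (x - a2)^2 + (rating - r2)^2 for x.
    Assumes the two centroids have different mean ages.
    """
    (a1, r1), (a2, r2) = c1, c2
    return (a2**2 - a1**2 + (rating - r2)**2 - (rating - r1)**2) / (2 * (a2 - a1))

# Hypothetical (age, rating) centroids for two of the groups.
young_adults = (23.5, 2.25)
parents = (44.0, 2.25)

cut = age_boundary(young_adults, parents, rating=2.0)
print(cut)

def nearest_group(point, centroids):
    """Assign a point to the nearest centroid; works for any number of features."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

# The same rule with five invented features, where nothing can be eyeballed on a plot.
centroids_5d = {
    "young adults": (23.5, 2.25, 1.0, 20.0, 0.0),
    "parents": (44.0, 2.25, 3.0, 60.0, 2.0),
}
print(nearest_group((41.0, 3.0, 2.0, 55.0, 2.0), centroids_5d))
```

Other clustering methods define boundaries differently, so the midpoint-style cut-off here is specific to the nearest-centre assumption.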
