Practical Machine Learning
This course explores the topic of probability and statistics, including various mathematical approaches and some different interpretations of probability. The course starts off with an introduction to probability, before moving on to cover the topics of Bayesian probability, Frequentist probability, statistics, probability distribution and normal distribution.
- In this video, I would like to talk about statistics. Before we get there, I think we should look at how we go from probability, about, let's say, outcomes and events, to statistics, and what the difference is, at least in my opinion. Let's just talk a little bit about probabilistic outcomes. Maybe we should get to the takeaway straight away. The takeaway is that, in probability, we are looking at basically single events, simple events, and we know how likely those individual events are going to be. In statistics, we're often looking at aggregates of events, and we're looking at their effects. Let's say something about that. So let's say we're talking about how probable it is that a particular person will have a certain gene, and that gene will contribute to their height. Let's say the probability of having gene A is 20%. And likewise, there's a probability for gene B and a probability for gene Z, which maybe is a very rare gene but perhaps has a very significant effect on height. In statistics, we're not looking at these individual genes. We're looking at their effects, looking at height. What is height? Height is really an aggregate of the effects of these genes. So what is it? It is something like how much gene A adds to your height. Let's say it adds 10 centimeters. We'll take 10 centimeters and multiply by 0.2 for the average effect. That is to say, if you have the gene, you get 10 centimeters, but everyone who doesn't have it doesn't get the 10 centimeters. So, averaged over everyone, gene A contributes 20% of 10 centimeters, which is 2 centimeters. And you can see that, likewise, you're gonna have the same kind of effect for the others. So gene B, let's say that gives you 20 centimeters, times 0.3. And let's say gene Z here gives you, I don't know, something interesting. Maybe it subtracts. Maybe, if you have gene Z, you lose 10 centimeters from your height.
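As a rough sketch of the averaging described above, each gene's average contribution is its effect multiplied by the probability of having it. The figures for gene A and gene B come from the talk. The talk gives no probability for gene Z, so the 0.01 below is a made-up placeholder for "very rare", and the names `genes` and `average_contribution` are illustrative only.

```python
# Average contribution of a gene = effect on height * probability of having it.
# Gene A and gene B use the numbers from the talk. Gene Z's probability is
# NOT given in the talk; 0.01 is an assumed placeholder for a "very rare" gene.
genes = {
    "A": {"effect_cm": 10.0, "probability": 0.2},
    "B": {"effect_cm": 20.0, "probability": 0.3},
    "Z": {"effect_cm": -10.0, "probability": 0.01},  # assumed probability
}


def average_contribution(effect_cm, probability):
    """Effect averaged over everyone, including people without the gene."""
    return effect_cm * probability


contributions = {
    name: average_contribution(**gene) for name, gene in genes.items()
}
total_average_cm = sum(contributions.values())

for name, value in contributions.items():
    print(f"gene {name}: {value:+.1f} cm on average")
print(f"total average contribution: {total_average_cm:.1f} cm")
```

With these numbers, gene A contributes 2 centimeters on average and gene B contributes 6. A different guess for gene Z's probability would only change the last term.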
And so the height is all of these effects added together. And there are gonna be lots of them. What we see when we record height, though, is not, let's say, a discrete jump from 10 centimeters to 20 centimeters. We see a distribution of heights. Let's draw that. So on the horizontal here, we'll have height in centimeters. On the vertical, we will have a count of the number of people with a particular height. So here we have, let's say, the average is 160 centimeters. That will be the most populous height, the one with the highest bar. You could do it discretely, with bars, if you wanted to. And then let's say 140 is about as common as 180. And let's say 200 is quite rare, and 120 is quite rare. So let's have those be as rare as each other, and these about as common as each other. And we're gonna draw here a distribution. We can draw slightly better perhaps. Is that good enough? Probably good enough. Let me just draw it freehand again. Maybe we say something like that. So here is the most common height, because it has the highest count. You can use either the word count or frequency; frequency is the technical term. And then either side, we have this sort of gap of 20 centimeters. And things which are 20 centimeters away from the middle are as common as each other; their bars have the same height. And then things which are 40 centimeters away, their bars have the same height too. And if the count here is, I don't know, a billion people or millions of people, maybe we could normalize the count to go between zero and one. Or we can just say, look, this is a big number. You can go for a million. And down here is a small number. Maybe you go for a thousand, something to that effect. So what we're doing here when we observe, in statistics, a variable called height is we're observing the effect of lots of random events. And the technical name for height here is a random variable.
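The counting and normalizing described above can be sketched in a few lines. The sample below is invented purely for illustration and is far smaller than the millions of people mentioned in the talk. Dividing by the largest count is just one way to squeeze counts into the range zero to one; dividing by the total number of people would give proportions instead.

```python
from collections import Counter

# Made-up sample of heights in centimeters, placed on the same tick marks as
# the sketch (120, 140, 160, 180, 200). These numbers are illustrative only.
heights_cm = [160] * 6 + [140] * 3 + [180] * 3 + [120] * 1 + [200] * 1

# Count (frequency) of each height.
counts = Counter(heights_cm)

# Normalize so the most common height has value 1.0.
largest_count = max(counts.values())
normalized = {height: count / largest_count for height, count in counts.items()}

for height in sorted(counts):
    bar = "#" * counts[height]
    print(f"{height} cm  count={counts[height]}  normalized={normalized[height]:.2f}  {bar}")
```

Printed top to bottom, the bars give a sideways version of the drawing: the longest bar sits at 160, and heights the same distance either side of it have bars of equal length.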
It's a variable whose value is essentially random. Random doesn't mean having no pattern. It means that there are variations within the value that are not deterministic. There are things that will kick the value one way and kick it another way. So height is a random variable, which is a product of random events. So the value this variable takes is, in English, we could just say the product, in the everyday sense of the result, of lots of random events. That is to say, events that have a probability associated with them. Now the distribution I've drawn here concerns itself with height, concerns itself with a measure, a real value, which results from all these events. And this one has a special name. This particular distribution is called the normal distribution. And it arises, at least approximately, in cases where the random variable you are measuring, height, results from adding up a very large number of independent random events. The normal distribution tends to arise when you have a large enough number of independent random events, each with a fairly small effect. And why is that? Well, let's see if I can draw it on here. The idea is that there will be an average height, let's say 160, that most people arrive at. And why will they arrive at that height? Well, because there'll be some events which kick you up in terms of your height, and there'll be some events which kick you down. And for the most common height, what we are seeing is that a sufficient number of people have been kicked up on average to this degree. So let's make this more intuitive. What we could see is, okay, gene A gives you 10 centimeters. Gene B gives you 10 centimeters. Gene C gives you 10 and so on. Imagine that. Now 160 happens to be the number that people arrive at when they have any combination of the most common genes. So if they have gene A and gene C, they would get 20 centimeters. If they had gene B and gene D, they get 20 centimeters. Different combinations give the same total, and the totals that can be reached in the most ways are the ones you see most often. So the most common height results from the most common combination of genes.
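To make the "kicked up, kicked down" picture concrete, here is a small simulation sketch. Every number in it except the 10 centimeters per gene is an assumption chosen so the result lines up with the drawing: a base height of 120 centimeters, eight independent genes, and a 50% chance of carrying each one. It is a toy model, not a claim about real genetics.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

BASE_CM = 120       # assumed height with none of the genes
N_GENES = 8         # assumed number of independent genes
EFFECT_CM = 10      # each gene adds 10 cm, as in the talk
P_GENE = 0.5        # assumed chance of carrying each gene
N_PEOPLE = 100_000  # assumed population size


def simulate_height():
    """Height of one person: base height plus 10 cm per gene they carry."""
    genes_present = sum(rng.random() < P_GENE for _ in range(N_GENES))
    return BASE_CM + EFFECT_CM * genes_present


counts = Counter(simulate_height() for _ in range(N_PEOPLE))

for height in sorted(counts):
    bar = "#" * (counts[height] // 500)
    print(f"{height} cm {counts[height]:>6} {bar}")
```

There is only one way to carry none of the eight genes and only one way to carry all of them, but 70 different ways to carry exactly four, which is why 160 comes out as the most common height while 120 and 200 are rare, and why 140 and 180 come out about equally common.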
And then you can see how the reasoning continues, I think. The next most common height results from the next most common combination of genes, and so on outwards, so that it is equally probable to get 180 centimeters in height, which is about 5'11"-ish, as it is to get 140. Why? Because as you get further and further away from the most common combination of genes, you're getting less common combinations of genes. You're getting rarer genes appearing, all the way out to two meters, about six foot seven, when presumably you have a very rare combination of genes to get really that tall. And likewise at 120, you have a very rare combination of genes. And when you have enough random events stacked together, it just turns out that there's a sufficient number of combinations, a sufficient number of ways of aggregating these events, so that the things that push you up balance, or are as rare as, the things that push you down. And that's the normal distribution. What I've described there, how that normal distribution comes to be, is what the Central Limit Theorem is about. It says, roughly speaking, if you have lots of independent random events that are being added together, then the result is approximately normally distributed. So let's now analyze the shape of a probability distribution, maybe starting with normal. And we can look at what characteristics we can see about the dataset we're looking at.
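One way to watch the Central Limit Theorem at work is to aggregate more and more events and see the lopsidedness fade. The sketch below reuses gene A's figures from the talk (10 centimeters, probability 0.2) for every event, which is an assumption made only to keep the example small. Skewness is used as a rough symptom of shape: a normal distribution has skewness zero, so values shrinking towards zero are consistent with the theorem, though they do not prove normality on their own.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

EFFECT_CM = 10     # effect of each event, as for gene A in the talk
P_EVENT = 0.2      # gene A's probability, reused for every event (assumption)
N_PEOPLE = 10_000  # assumed sample size


def aggregate(n_events):
    """Sum of n_events independent yes/no events, each worth EFFECT_CM."""
    return sum(EFFECT_CM for _ in range(n_events) if rng.random() < P_EVENT)


def skewness(values):
    """Average cubed z-score: positive means a longer tail to the right."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


skew_by_n = {}
for n_events in (1, 10, 200):
    sample = [aggregate(n_events) for _ in range(N_PEOPLE)]
    skew_by_n[n_events] = skewness(sample)
    print(f"{n_events:>3} events: skewness = {skew_by_n[n_events]:.2f}")
```

A single event is strongly lopsided, because most people get nothing and a few get 10 centimeters. As more events are stacked together, the things that push the total up and the things that hold it back even out, and the distribution becomes more symmetric.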