# Module 7 - Probability and statistics

This course explores the topic of probability and statistics, including various mathematical approaches and some different interpretations of probability. The course starts off with an introduction to probability, before moving on to cover the topics of Bayesian probability, Frequentist probability, statistics, probability distribution and normal distribution.

- In this session, I would like to discuss probability: in particular, developing some intuitions around what it is, some different mathematical approaches, and some different interpretations of probability that can be useful to us. And maybe to finish off, I'll look at how all that fits into the machine learning space. But I think we need to take probability first as its own subject, and then relate it a little bit to machine learning. So let's start with intuition. What do we mean by probability? Here's the word probability, and what really are we talking about? In English, we typically say things are likely or unlikely, probable or improbable, possible or impossible. Those are the sort of words we use. And what sort of things do we describe that way? Well, I might say that my winning a game is likely, I might say that getting an odd number on a die is likely, I might say getting heads is probable, or equally probable to getting tails. These are all events. They're all events that we are talking about, the outcomes of some process. Other things we describe in terms of probability are ideas, hypotheses, other kinds of possibilities. Maybe we say, I think it's likely that the theory of evolution is true, or I think it's likely that my grandmother's thoughts about her cooking are correct or incorrect. So there's another sense in which we use probabilistic terms, not to describe specific outcomes like a heart or a three, but to describe ideas, beliefs. So we tend to apply probability to two kinds of things: beliefs and events. And that makes understanding probability a little bit difficult. What we're going to do is try to formalize some of the things we say, perhaps starting with events. 
And then see if we can generalize that idea, and see how far it takes us in our ability to map a mathematical account onto the way we use the terms in English, the way we apply the concepts every day. So let's start by looking at some events. Now, what's a good working, intuitive example? Head, tail, head, tail: a coin. For this example, a coin has two outcomes: a head outcome and a tail outcome. I'll just write some notation for it. What we'll do is say P of the head outcome is the probability, the likelihood, we're going to ascribe to that outcome. Now it's perfectly reasonable here just to say that P of a head, the probability of a head, is equal to P of a tail. That isn't fully mathematical yet; we haven't got a fully numerical account here. We've just introduced some notation. So P of H is the likelihood we have ascribed to an outcome, and we are saying that the probability of getting a head is the same as that of getting a tail. Let's evolve this a bit further. Rather than use these English terms, let's develop a scale that gives us a way of relatively assessing different outcomes. That scale is going to have to be fixed, with a beginning and an end. And since an event like a coin flip only has a fixed number of outcomes, we're going to have to partition our confidence, if you like, how likely things are, among these different outcomes. So let's see what I mean by that. What we're going to do is have a scale, also known here as a measure, and the measure is going to go between zero and one. There's a kind of fundamental utility to using zero to one as our scale, because we can think of one as the whole pie. One is our full confidence or the full likelihood. And we can think of some fraction of that then as being how much likelihood we're attributing to the different outcomes. So for example, we would say the probability of a head is 0.5, or one over two. 
Likewise, the probability of a tail is 0.5, one over two. What we have done is take our scale between zero and one and say, well, if one is a hundred percent likely, guaranteed, then I expect that half the time I would get heads, and the other half of the time I would get tails. We're evolving some notation here, and we're trying to make this notation correspond to our intuitions. We're saying, okay, rather than just using English language terms like likely and so on, we're going to use a scale. Remember, we are using a scale, we're considering events, and events have a fixed number of outcomes. What we're going to have to do is award or allocate to each of these different outcomes a certain percentage of that scale, a certain amount of it, so that all of them add up to one. And by the way, we are basically just asserting this: I'm going to call the full amount one, I'm going to call these allocations a half and a half, and we're going to rig it, require it essentially, to add up to one. The idea then is that if we make those assumptions or assertions, if we have those as axioms of our system, the notation should track our intuitions about how we're using the concepts. So in some sense, the concepts of probability are not themselves mathematical. What the mathematics is trying to do is give us a numerical account of what those concepts are. Right. So that's the beginning point. Now, before we move on to develop this a bit more, now that we've got some of these assumptions, these axioms, at work in our system, let's evolve the full setup, so we've got notation for everything. What more notation do we need? Well, we need something to describe the outcomes, and something to describe each possible event we can get. So let's write that down.
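
Written out in symbols, the coin allocation described so far looks roughly like this, where H and T are just shorthand for the head and tail outcomes:

$$
P(H) = \tfrac{1}{2}, \qquad P(T) = \tfrac{1}{2}, \qquad P(H) + P(T) = 1, \qquad 0 \le P(H),\ P(T) \le 1
$$

The last two statements are the requirements being asserted: every allocation sits on the zero-to-one scale, and the allocations together use up the whole of it.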
So what we're going to say here is that the outcomes are capital Omega, or just Omega, which is this symbol here, a Greek symbol. The set of outcomes then, for coin flips anyway, is going to be heads along with tails. And that's what we call the outcome space. Outcome space or outcome set; for most purposes, space is equivalent to set. Now, what we need next, of course, is some notation for an event. So far I've just been putting the head and the tail directly in the parentheses and saying P of that. But maybe we go for an A, say. And A is going to be some subset of our outcome space. So here it's going to be, for example, just head. And then we would say probability of A, and we would award or allocate to that the fraction of our total probability, one half. Let's go for another example to show you how this notation plays through. Let's go for a die. What's the outcome space of a die? The outcome space of a die is one, two, three, four, five, six. Those are all the faces. Now let's do something a little different for our event. Let's say that we are interested in an odd face. What does that mean? Well, now we're not interested in one outcome, we're interested in several possible outcomes: one, three, and five. Those are the odd faces. Now, how should we allocate probability to our event A? We could do it somewhat intuitively. That is to say, in the coin case we could just make a kind of stab in the dark. There's a sense in which, look, there are two outcomes, they seem equally probable, so we'll say one half and one half. How do we do it in the die case? Well, what we have to do is the same kind of reasoning process, but with a bit of addition. So the relevant, equally probable outcomes in this case are one, two, three, four, five, six. 
And what we're going to do, therefore, is divide our total probability into six, because all of these outcomes are equally probable. So each of these outcomes gets one sixth of our total probability: one, which is our total, divided by the number of outcomes that are available, six, gives one sixth. Now, the probability of our event A is just an addition. It's one sixth for outcome one, one sixth for outcome three, and one sixth for outcome five. So it's three sixths, or one half. What we've done there is use a formula for arriving at a probability that comes from what's known as Laplacian or Classical probability theory. We'll come back to what we mean by classical in just a second. The formula is this: the probability of event A is the number of elements in A, which is a set of potential outcomes, divided by the number of elements in the set of all possible outcomes. Let me just show you that. What we've done is basically say, well, there are three outcomes in A, and we've divided by the total number of outcomes in Omega, the outcome space, which is six, and that gives us our probability, which is three over six, or one half. Side point here: we're just using this hash symbol to mean number of elements. So it's number in A divided by number in Omega, P(A) = #A / #Ω. Now this formula works because the elements of Omega, and therefore the elements of A, are all equally probable. So it is a simple ratio. And this approach to determining probability is, as I've said, known as the Classical or Laplacian approach. You can understand that approach as being defined by this formula. If you're using that formula to arrive at your probability (it's not really an estimate in the case of a die or a coin), you're taking a classical approach. Now, you might think, well, what other approaches are there? And why wouldn't I want to take this approach rather than some others? 
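
As a rough illustration of the counting formula just described, here is a minimal Python sketch. The function name `classical_probability` and the use of Python sets for A and Omega are choices made for this example only, not something from the session, and the function is only meaningful under the assumption that every outcome is equally probable.

```python
from fractions import Fraction


def classical_probability(event, outcome_space):
    """Return #A / #Omega, assuming every outcome is equally probable."""
    event = set(event)
    outcome_space = set(outcome_space)
    if not outcome_space:
        raise ValueError("outcome space must not be empty")
    if not event <= outcome_space:
        raise ValueError("event must be a subset of the outcome space")
    # Count the outcomes in the event and divide by the count of all outcomes.
    return Fraction(len(event), len(outcome_space))


# Outcome spaces from the session: a six-sided die and a coin.
die = {1, 2, 3, 4, 5, 6}
coin = {"H", "T"}

# Event A: an odd face, i.e. the outcomes 1, 3 and 5.
odd_face = {face for face in die if face % 2 == 1}

p_odd = classical_probability(odd_face, die)
p_head = classical_probability({"H"}, coin)

print(p_odd)   # 1/2  (three outcomes out of six)
print(p_head)  # 1/2  (one outcome out of two)
```

Using `Fraction` keeps the answer as three sixths reduced to one half rather than a floating-point 0.5, which matches how the ratio is written in the session.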
There are a couple of others, Frequentist and Bayesian; we'll move on to those. Before we do, I want to point out the limitations of this approach. Now, as we're looking at these events here, coins and dice, we can see that all the outcomes in the outcome space are equally probable, but for many events they won't be. For example, if I had a weighted coin, I couldn't just say my outcomes are head and tail, my event is head, and therefore my probability is one outcome, head, divided by two possible outcomes, one half. It wouldn't be true. So if I have a bias, or if my outcomes are not equally probable, then I can't use this formula. It might be possible to jury-rig it somehow, but basically it's not going to be very easy to use. What's another limitation? So, first, if my outcomes aren't equally probable, I don't know what to do. Another problem is that maybe it is not even possible to define what the outcomes are. Let me give you an example of that. The common example here is an election. Suppose I'm talking about an election. You might think, well, aren't there just two outcomes, candidate A winning or candidate B winning? Well, no, because there are lots of ways candidate A can win in terms of number of votes, and there are lots of ways that candidate B can win in terms of number of votes. And can you define all the different worlds, say, all the different possible configurations that lead to B winning, and all the different possible configurations that lead to A winning? And are all the configurations you're talking about equally probable? No, basically, you can't do that. So for many kinds of events that occur, there may in fact be an infinite number of outcomes. And even if there aren't, it may be basically impossible to define the outcomes in such a way that they're all equally probable. 
That is, so that they all have the same probability of occurring and the same level of detail. They're very difficult to define. So the classical approach is limited in two ways: even if you can define the outcomes, you can't use the approach when they're not equally probable, and sometimes you can't define the outcomes at all. So what should we write there? We'll say can't define outcomes, and we'll just put complex as the limitation. So we're going to move on to talk about frequentism and Bayesianism as other approaches to probability, and one can begin to understand these as responses to these limitations.
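
To make the first limitation concrete, here is a small Python sketch of the weighted coin. The 7/10 and 3/10 weights are invented purely for illustration, and the helper names are hypothetical. The sketch also assumes we somehow already know the weights, which is exactly what the classical approach gives us no way of working out.

```python
from fractions import Fraction

# Hypothetical weighted coin: these weights are made up for illustration.
weighted_coin = {"H": Fraction(7, 10), "T": Fraction(3, 10)}


def classical_probability(event, allocation):
    """#A / #Omega: only valid when every outcome is equally probable."""
    return Fraction(len(event), len(allocation))


def event_probability(event, allocation):
    """Add up the probability allocated to each outcome in the event."""
    return sum((allocation[outcome] for outcome in event), Fraction(0))


# The allocations still have to use up the whole zero-to-one scale.
assert sum(weighted_coin.values()) == 1

heads = {"H"}
p_counting = classical_probability(heads, weighted_coin)
p_weights = event_probability(heads, weighted_coin)

print(p_counting)  # 1/2   (counting outcomes ignores the bias)
print(p_weights)   # 7/10  (what was actually allocated to heads)
```

Counting outcomes still returns one half, because the formula only looks at how many outcomes there are and not at how the total probability is shared among them.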
