Practical Machine Learning
This course explores the topic of probability and statistics, including various mathematical approaches and some different interpretations of probability. The course starts off with an introduction to probability, before moving on to cover the topics of Bayesian probability, Frequentist probability, statistics, probability distribution and normal distribution.
- Finally, in this mini section on probability, I'd like to talk about Bayesian interpretations of probability. Bayes-ian: it gets its name from a famous mathematician called Bayes, who discovered a rule called Bayes' rule. We'll look at that later on. So, Bayesian probability. What is this? What I'm going to do is leave the technical detail to an advanced course and just give you some of the ideas and key takeaways from the approach.
This is something quite remarkable, in a way. In a Bayesian system of probability, when you assign a probability to an event, all you do is personally judge how likely it is, on a scale from zero to one. "I think there's a 60% chance of getting a head." "I think there's a 60% chance of an election," or something. And in a Bayesian system of interpretation, we think of probabilistic statements as being not about events, not about experiments, not about any of that, but about beliefs. So when I talk about the probability of an event, my event is now understood not as an outcome but as a belief. In the case of a coin, E would be the belief that I will get a head. In a more complicated case, like an election or the weather, E could be the belief that it will rain tomorrow.
What probability is doing is measuring our confidence in that belief; the technical term is credence. So, I have a belief that it will rain tomorrow, and I have a certain credence, or confidence, associated with that belief. Measure it on a scale from zero to one. That's the probability of that belief. That's the probability I ascribe to that particular situation. Now, there are some issues with that, you might think. It seems like that can't be everything, because what if I am irrational? What if I mis-estimate or misuse information?
How do I include information in that? That's a complicated question. How do I understand what that belief, that confidence, really means? How does it fit into a probabilistic framework? Well, technically, what I've given you is the Bayesian interpretation of what they call a Prior. A Prior is, essentially, something like an uninformed probability. In Bayesian statistics, we always interpret probabilities as being relative to the information that you have. So, when I write P of E, the probability of E, I see that as conditioned on, or relative to, some background information I have. Let's just call that Background: my background beliefs, ideas, prejudices, thoughts. And when I read a bare probability, like P of E, I interpret it as having a hidden condition: it is in fact conditioned on my background information.
So, when a Bayesian interprets the probability of getting a head or a tail, it comes down to whether you have learnt that the coin is a fair coin, or whether you have learnt something about the coin, such as that it only has two outcomes and these are equally probable. Once you have learnt that, then of course the probability should be one-half. The Bayesian would say the reason the probability is one-half is not the geometry of the coin, which is what the Frequentist would say, and not some ideal number of outcomes, heads and tails, which is what the Classicalist would say. It's because you have learnt certain things about the coin which give you the most reasonable belief you can have about it, which is that half of your confidence should be ascribed to the head outcome and half to the tail outcome. So the question for the Bayesian is: what is your probability relative to, or conditioned on? What information do you know when you come up with your probability?
Now, if you don't know anything, or you know just background things, we call it a Prior, and then the most informed probability we call a Posterior. So that's the probability of, let's say, some event, given some evidence or piece of data, D. And that's versus the Prior, which is this raw probability, the probability of just the event, and that's meant to be, in some hidden way, relative to our background information: how much are we assigning it in general? And the goal of Bayesian estimation of probabilities, of getting a probability model the Bayesian way, is to start with your Prior probability, your little prejudice, where you begin, and then do what you might call cranking the handle: churning it around and including new facts one after another.
So, okay, what's the probability it will rain tomorrow? Let's say I think, on average, it's 50-50. But then I see rain through the window today, and I include that. Then I include the wind speed. Then I include the clouds. And it's a little bit like Frequentism, in a way, right? Because in Frequentism you run the same experiment over and over again. Well, Bayesians say you can't do that. What you can do is just start from somewhere, start wherever you like, 50-50, and then include new pieces of evidence. And since each of these pieces of evidence is relevant to the belief that you have, you just keep including them.
And since, they would claim, every probability is relative to some evidence anyway, this is really what everyone is doing. What the Frequentist is really doing is running the experiment many times and including the evidence from the next experiment each time. What the Classical view of probability is doing is ruling out lots of things at the very outset. If there are only two outcomes, you can't have a coin land on its side, and so on.
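As a rough sketch of the step from Prior to Posterior described here (the full treatment is left to a later course), Bayes' rule can be written with E for the belief, D for a piece of evidence, and B for the background information. The subscripted symbols D_1 and D_2 are labels chosen for this note, standing for two successive pieces of evidence such as "rain today" and "wind speed".

```latex
% Prior: P(E | B).  Posterior after one piece of evidence D:
P(E \mid D, B) = \frac{P(D \mid E, B)\, P(E \mid B)}{P(D \mid B)}

% Including a second piece of evidence: the previous Posterior
% P(E | D_1, B) now plays the role of the Prior.
P(E \mid D_1, D_2, B) = \frac{P(D_2 \mid E, D_1, B)\, P(E \mid D_1, B)}{P(D_2 \mid D_1, B)}
```

Read this way, "cranking the handle" is just applying the same rule again each time a new piece of evidence arrives.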
And they're setting it up, they're jerry-rigging it. They're asserting that the coin perfectly corresponds to a set of two things, right? Head, tail. The Bayesian would say, "You don't know about the coin. That's an assertion." The coin could be biased; one day it could land on its edge. All sorts of things could happen. So what you need to do is start with just some general sense of where you think things are, and then evolve it with your evidence. The need for a Prior in Bayesian statistics, or Bayesian inference, is that you need a place to start that process. You use the Prior as your starting place, and you add more and more evidence. Now, if you keep adding evidence and keep doing it in a rational way, then with enough evidence you arrive at a probability. Bayesians wouldn't say it's a true probability, but it is the probability you need. It's the best one you're going to get. So, in Bayesian inference there's this process of starting with a Prior and going to a Posterior. You get there by including in that Prior more and more evidence. And since Bayesian statistics is a whole field in itself, we need to leave that to its own course.
What I want to do here is just give you a framework, or a sense, of how the area of probability is divided in terms of its practitioners' interpretations and approaches. There's the Classical approach, where you can set up the world clearly at the beginning. If you do so precisely, arranging all of your outcomes so that they are equally likely, then you can very quickly go to a nice ratio: count the number of things you're interested in and divide by the total number of things that are possible. If that fails, you might have a Frequentist toolbox to hand. You can do this experimental stuff. Record things, get distributions. Average, do a ratio.
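The Classical ratio and the Frequentist ratio just described can be put side by side in a minimal Python sketch. The function names and the simulated coin are made up for this illustration, and a pseudo-random generator stands in for a real repeated experiment.

```python
import random
from fractions import Fraction


def classical_probability(outcomes, is_interesting):
    """Classical view: count the outcomes of interest and divide by the
    total number of outcomes, all assumed to be equally likely."""
    favourable = sum(1 for o in outcomes if is_interesting(o))
    return Fraction(favourable, len(outcomes))


def frequentist_estimate(run_experiment, is_interesting, trials, seed=0):
    """Frequentist view: run the experiment many times and take the ratio
    of interesting results to the total number of runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(1 for _ in range(trials) if is_interesting(run_experiment(rng)))
    return hits / trials


coin = ["head", "tail"]  # assumed: exactly two equally likely outcomes


def is_head(outcome):
    return outcome == "head"


p_classical = classical_probability(coin, is_head)
p_frequentist = frequentist_estimate(lambda rng: rng.choice(coin), is_head, trials=10_000)

print("classical:", p_classical)
print("frequentist estimate:", p_frequentist)
```

The Classical number comes straight from the setup, while the Frequentist number is only an estimate that settles down as the number of runs grows.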
What is the ratio between the outcomes you were interested in and the total number of experiments you've run? Do that over and over again. That gives you a solution: how probable things are. If you can't do that, or if you don't want to, if you can't run these experiments, if there isn't a stable outcome anyway, you can take a Bayesian approach. Start from somewhere, start with a probability, 50-50, what does it matter? What you will do is include in that lots of evidence as you go. And by consistently and reasonably including all your evidence as you go, with enough evidence, you get to a reasonable point and you can say, "Well, I think the probability of it raining tomorrow is 78%."
And I would guess that in meteorology and weather channels and so on, that's probably exactly the approach they take. In other words, the probabilistic models of weather are probably so sensitive to different subtle pieces of evidence that you probably need to take a Bayesian approach. What you would do is say, "My Prior for how likely it is to rain on any day is..." and let's start with a frequency. Let's start with how often it rains on any given day. Maybe in a year in the UK it rains on 150 days out of 365. Let's start there, that's my Prior. Now what I do in my weather model is I grow that, or shrink it, by how observing rain today affects rain tomorrow. I grow it by how the wind speed today anticipates wind speed tomorrow. And I include all of these subtle pieces of evidence in the model, and eventually my Prior probability, which I started with as just a bare guess, based on whatever information I want, gets closer and closer to a reasonable answer. So, in the game of trying to come up with the probabilities for some event, the Bayesian says, "My approach will always work, and in fact it's often the best." And there are some arguments for that.
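The weather example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Prior takes the 150 rainy days against a 365-day year, the likelihood numbers are invented, and each piece of evidence is treated as independent of the others once we know whether it rains tomorrow (a simplification a real weather model would not make).

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One application of Bayes' rule for a yes/no belief.

    prior                -- current probability that the belief is true
    p_evidence_if_true   -- chance of seeing the evidence if it is true
    p_evidence_if_false  -- chance of seeing the evidence if it is false
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Prior: how often it rains on any given day (150 days in a 365-day year).
prior = 150 / 365

# Hypothetical likelihoods, invented for illustration only:
# (evidence, P(evidence | rain tomorrow), P(evidence | no rain tomorrow))
evidence = [
    ("rain today", 0.6, 0.3),
    ("high wind speed", 0.5, 0.4),
    ("heavy cloud", 0.8, 0.5),
]

belief = prior
for name, p_if_rain, p_if_dry in evidence:
    # The Posterior from the previous step becomes the Prior for this one.
    belief = update(belief, p_if_rain, p_if_dry)
    print(f"after including {name}: {belief:.3f}")
```

With these made-up numbers each piece of evidence is more likely on a day before rain than on a day before dry weather, so the belief grows at every step; evidence pointing the other way would shrink it.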
There are also arguments to say that if you can do the Classical approach, if you can define all the outcomes, then doing that is obviously the simpler thing to do: you just take a ratio. So, maybe in some situations you've got one toolbox, and in another situation you've got another toolbox. Who knows which interpretation is right? Is probability about measuring confidence and belief? Is it about assessing the real geometry and causal patterns in the world through experiment? Or is it about some kind of mathematical ratio of interesting things to possible things? In today's world Bayesianism is increasingly fashionable, and many people believe that probability is mostly about the subjective thing, about how confident you are in your beliefs. But, personally, I think there's a way of taking each to be fundamental. But with that, let's leave it there, and let's look at some applied uses of these ideas in