Machine learning is a big topic. Before you can start to use it, you need to understand what it is, and what it is and isn’t capable of. This course is part one of a two-part module on machine learning. It starts with the basics, introducing you to AI and its history. We’ll discuss the ethics of AI and look at examples of AI systems that exist today. We’ll cover data, statistics and variables, before moving on to notation and supervised learning.
Part two of this two-part series can be found here, and covers unsupervised learning, the theoretical basis for machine learning, models and linear regression, the semantic gap, and how we approximate the truth.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- So let's now bring these two definitions, or two illustrations, of this topic together: the idea that machine learning is a form of weak artificial intelligence, and that it's also computational statistical inference. How do we join those ideas up? Well, maybe by looking a little at the history of AI. So, what do we have? In the '40s and '50s, at the forefront of artificial intelligence were ordinary, logical programming rules. So you would say: if the age of the user, let's say, is greater than or equal to 18, then we perform some action; and if that condition isn't met, we perform some other action. Maybe allow and deny, say. Now, does that look like artificial intelligence to you? Not today. We think of it as a simple system of rules which test propositions, or conditions. 'Proposition' is a technical word here, essentially meaning a condition or test. There are other ways of defining proposition, but the key element of the proposition here is that the age is greater than or equal to 18. Now, a system of rules and propositions doesn't feel like AI today, because it's a little too simple. And that's what people discovered in the '40s and '50s: this is too simple. So what did they do? Well, for a long time they essentially gave up; AI faded as a major research project until the 1980s. Then people in the '80s thought: okay, we've got these systems of rules and propositions. What if we got experts to give us the rules? Could we then imitate humans? Could we capture that intelligence, if we took from the experts the rules they were following? Okay, fine. So what would that look like? Maybe: if the size of the tumor is greater than or equal to 18 centimeters, or whatever a medical expert would say.
Well, if that's the case, then maybe we diagnose it as cancer; otherwise, maybe we say it's benign. And there were systems like this. I believe that around the early '90s there were medical systems for diagnosing conditions that, in some cases, performed better than doctors, just using this approach where the rules come from the experts. Okay. So the first approach here you might call logic programming, or simply the use of logic. The second approach is what's called expert rules. Now, where do we go from here? Well, several problems were discovered with this expert-rule approach. The main one is that it seems we don't really follow rules at all. Let me illustrate what I mean by that. Suppose we're trying to build a machine that rides a bike. What do we do? Well, we go to a bike rider and ask, "What rules do you follow?" The bike rider, I should imagine, is a little confused by that. Why? Because the bike rider doesn't think to himself: well, the wind is coming at my face from this direction, so I move my hand here, and then I do this, and then I do this. The bike rider is not at all conscious of what he's doing. If he has acquired the skill of bike riding, he pays absolutely no attention to what he's doing; he pays attention to the road. He has really acquired the skill when he stops following rules, at least rules that he's aware of. So we have this problem: at the very least, experts seem to have no idea what rules they're following. Which is true. Take an expert diagnosing a tumor. A doctor isn't just checking for 18 centimeters here. What they're doing is talking to the patient. They're considering their personal history.
They're considering the history of the family, the area, even the community the patient is a part of. All this background knowledge, medical expertise, human expertise and emotional intelligence come into a diagnosis. The doctor doesn't even know what he did: record him, and he has no notion of why he diagnosed one thing or another. And if he tries to tell you, and comes up with "well, I did this and then this and then this", what you'll discover is that the rules he comes up with are wrong! He didn't actually follow them. He's making it up; he may believe he followed them, but he didn't. So we're in a tricky situation in trying to get rules out of experts, because as soon as they've acquired the skill, they're not really conscious of what rules they're following, if they are following rules of any sensible kind at all. Go back to the bike rider. The weather conditions change; he moves his hand to a different position. He looks up at the sky. He hears the sound of a car going past and adjusts his speed with his feet on the pedals. How on earth can we use a system of simple, algorithmic, propositional rules to account for such infinite variation in the environment? Conditions change, a car passes, a person crosses the road: it's near impossible. Well, let's make the best attempt that we can. And to do that, we'll modify these rules to use data. And that's the innovation of machine learning. So instead of tumor size, we could go for something else. We could say: if the wind speed is more than 45 miles per hour, say, then adjust the hand position. By how much? Maybe half a centimeter. Or tilt the head, or move a foot: let's raise the foot by three degrees, say.
Otherwise, if the wind speed is something else, do something else, and so on. And so we have these rules which are now not, in fact, provided by experts. These numbers, and perhaps to some degree even the quantities we're considering, but especially those numbers, are obtained from data, not from an expert. So perhaps we literally record lots of bike riders, somehow provide a system for interpreting the video in terms of wind speed and so on, and collect environmental data. We feed that into a computer and have it compute, using statistical algorithms, what these numbers should be to give good performance on a bike. And then we use those sorts of rules. Okay, so what's the history here, and how do all these things fit together? Machine learning is a form of weak artificial intelligence in the sense that it is the same kind of thing people have been doing since computers were invented in the modern sense, through the '30s and '40s: just following some algorithm. If this, then that, then that, right? Machine learning is a form of that. But the innovation is that today the rules are tuned and specialized by the consideration of typically large amounts of information. And the more information we consider that is specific to the problem we're trying to solve, the better we can tune these rules to give us good performance. Now, the reason this is still weak AI and not strong AI is that we don't yet seem to have connected this technique to general problem solving. In other words, I need to feed the machine highly specific videos, highly specific images of the environment; highly specific information goes in, and it learns highly specific values for its variables of interest. Change the environment in ways it hasn't seen before, even slightly.
And it can struggle: if I send this artificially intelligent bike up a mountain when it has only ever seen roads, it just won't work. Whereas if I teach a child how to ride a bike on a road, there's a very high probability that the child will still perform pretty well on a mountain; maybe less well, but it won't just catastrophically fail. And that's the problem with this modern approach. It is in fact still quite specific, requiring large amounts of specific data to solve specific problems. You can try to glue a lot of these systems together to give you an increasingly general device, or solution to a problem, but in the end it is still very limited, more limited than a human being, in the sense that it is not acquiring a skill, it's not creatively figuring things out. It is really still replaying rules; rules that are now just tuned by data, rather than by expertise as in the '80s, or by the programmers themselves as in the '40s. Right, let's leave it there. Hopefully you can see how all this fits together. Okay, so why is it computational statistical inference? Well, it's weak AI because it's algorithms: computer science, machines. And it's statistics in the sense that the rules, the algorithms themselves, are specialized by statistical analysis of data.
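The progression described above, from hand-written rules, to expert rules, to rules whose numbers are tuned from data, can be sketched in a few lines of Python. This is my own illustration, not code from the course: the function names, the toy sizes and labels, and the fit-by-accuracy procedure are all assumptions made for the sketch, and the 18 cm figure is simply the arbitrary stand-in used in the lecture.

```python
# 1940s-style logic programming: the programmer supplies both the rule
# and the number.
def allow_access(age):
    return "allow" if age >= 18 else "deny"

# 1980s-style expert rule: the same shape, but the threshold is supplied
# by a domain expert (18 cm is the arbitrary figure from the lecture).
def diagnose(tumor_size_cm, threshold=18.0):
    return "cancer" if tumor_size_cm >= threshold else "benign"

# Machine learning: the rule keeps its shape, but the number is chosen by
# a statistical procedure applied to labelled examples.
def fit_threshold(sizes, labels):
    """Pick the threshold that classifies the training data most accurately."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(sizes)):
        correct = sum(
            (size >= t) == (label == "cancer")
            for size, label in zip(sizes, labels)
        )
        acc = correct / len(sizes)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Toy training data, invented purely for illustration.
sizes = [1.0, 2.5, 3.0, 6.0, 7.5, 9.0]
labels = ["benign", "benign", "benign", "cancer", "cancer", "cancer"]

learned = fit_threshold(sizes, labels)
print(learned)                              # the data chose this number
print(diagnose(6.5, threshold=learned))     # same rule shape, tuned number
```

The point of the sketch is that `diagnose` is still an "if this, then that" rule, exactly as in the '40s and '80s; the only thing that has changed is where its number comes from.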
Michael began programming as a young child, and after freelancing as a teenager, he joined and ran a web start-up during university. Alongside studying physics, and after graduating, he worked as an IT contractor: first in telecoms in 2011 on a cloud digital transformation project; then variously as an interim CTO, Technical Project Manager, Technical Architect and Developer for agile start-ups and multinationals.
His academic work on Machine Learning and Quantum Computation furthered an interest he now pursues as QA's Principal Technologist for Machine Learning. Joining QA in 2015, he authors and teaches programmes on computer science, mathematics and artificial intelligence; and co-owns the data science curriculum at QA.