Confusion Matrix
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have taken advantage of the resource for their products. This revolution of machine learning has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hello and welcome to this video on the confusion matrix. In this video, we're going to go beyond the accuracy score and introduce a better way to judge how well our model is doing. We will talk about the confusion matrix and also about precision and recall. So far, we have used accuracy as a way to judge the quality of our classification model. However, depending on the problem, this may not be the best way to assess the model's performance. Accuracy tells us how well we are doing overall, but it doesn't give us any insight into the kind of errors the model is making. Let's see how we can do better. Let's say we are building a model to predict whether or not a person has cancer based on the results of a screening exam.

This is a binary classification problem, and there are four possible cases. We could be predicting a person is healthy when the person actually is healthy. This is called a true negative. We could also be predicting that a person has cancer when the person does have cancer. This is a true positive. Or we could be wrong in one of two ways. We could be saying that there is cancer when there actually isn't. This is a false positive or Type I error. And we could be saying that there is no cancer when there actually is, which would be a false negative or Type II error. The table accounting for each of these four cases is called the confusion matrix, and it gives a better view of what's being predicted correctly and what's not. Accuracy is the overall ratio of correct predictions to the total number of data points. So in the terms above, accuracy is equal to true positives plus true negatives divided by the total. Let's stop for a second and say that, based on the result of our test, you are sent to a secondary screening with a specialist.
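The four cases and the accuracy formula above can be sketched in a few lines of plain Python. This is only an illustration: the function names `confusion_counts` and `accuracy` and the label lists are made up for this example (1 stands for cancer, 0 for healthy) and are not taken from the course repository.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary classification problem."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted cancer, cancer present
            else:
                fp += 1  # predicted cancer, actually healthy (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted healthy, cancer present (Type II error)
            else:
                tn += 1  # predicted healthy, actually healthy
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Correct predictions divided by the total number of data points."""
    total = tn + fp + fn + tp
    # Returning 0.0 for an empty input is a convention chosen for this sketch.
    return (tp + tn) / total if total else 0.0


# Hypothetical data: 6 healthy people followed by 4 people with cancer.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print("TN, FP, FN, TP:", tn, fp, fn, tp)  # 4 2 1 3
print("accuracy:", accuracy(tn, fp, fn, tp))  # 0.7
```

Note that the single accuracy number of 0.7 hides the fact that the model made two false positives but only one false negative, which is exactly the detail the confusion matrix exposes.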

Given that referral, on which side would you rather the model be wrong? Would you rather minimize false negatives or false positives? Most people would prefer a false positive, do an additional screening, and make sure there is no disease, rather than go home feeling safe while they are actually sick. Would that be your choice, too? Now, what if you were a health insurance company commissioning the model from a data scientist? Would you still choose the same way? A false positive in this case is a cost to you because the patient will go on to see a specialist. So would you not rather minimize false positives in this case and care less about false negatives? As you can see, there is no correct answer. Different stakeholders will make different choices based on which error they would rather avoid. This is to say that the data scientist is not a neutral observer of the machine learning process. The choices she makes fundamentally determine the outcome of the training. Next, let's learn a couple of other terms. We define precision as the ratio of true positives to the total number of positive predictions, that is, true positives plus false positives. Precision will tend towards one when the number of false positives goes to zero. That is, when we do not create many false alerts. On the other hand, recall is defined as the ratio of true positives to the total number of actually positive cases, that is, true positives plus false negatives. Recall will tend towards one when the number of false negatives goes to zero.
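As a minimal sketch of the two definitions just given, precision and recall can be computed directly from the confusion matrix counts. The function names and the counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed for this example rather than part of the definitions.

```python
def precision(tp, fp):
    """True positives over all positive predictions (tp + fp)."""
    predicted_positive = tp + fp
    # Assumed convention: 0.0 when the model predicts no positives at all.
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """True positives over all actually positive cases (tp + fn)."""
    actual_positive = tp + fn
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts from a screening model.
tp, fp, fn = 30, 10, 20

print("precision:", precision(tp, fp))  # 30 / 40 = 0.75
print("recall:", recall(tp, fn))        # 30 / 50 = 0.6

# With no false alerts, precision reaches one;
# with no missed cases, recall reaches one.
print(precision(tp, 0), recall(tp, 0))  # 1.0 1.0
```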

In other words, recall approaches one when we do not miss many of the actually positive cases. Finally, we can combine the two in what's called the F1 score. F1 is equal to twice the product of precision and recall divided by the sum of precision and recall. F1 will be close to one only if both precision and recall are close to one, and it will be pulled down if either of them is low. In this sense, the F1 score is a good way to make sure that both precision and recall are good and therefore both false positives and false negatives are kept under control. While these definitions, as stated, apply to the binary case, we can still extend the confusion matrix to the case where there are more than two classes. In this case, the element (i, j) of the matrix tells us how many data points in class i have been predicted to be in class j. For example, here you can see that three points that were actually in class B were predicted to be in class C. This is a powerful way to see whether any of the classes are being confused. If so, you can isolate the data being misclassified and try to improve the model just on those data. In this video, we have introduced the confusion matrix as a useful way to explore the kind of errors our model is committing. We have talked about false positives and false negatives and about the choices a data scientist needs to make when deciding how to optimize a model. Finally, we have introduced precision, recall, and F1 score. So, thank you for watching and see you in the next video.
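The F1 formula and the multi-class confusion matrix described in this video can be sketched as follows. This is an illustrative example, not code from the course: the function names are made up, and the labels are invented so that three points from class B end up predicted as class C, echoing the example mentioned in the video.

```python
def f1_score(precision, recall):
    """Twice the product of precision and recall over their sum."""
    denominator = precision + recall
    # Assumed convention: 0.0 when both precision and recall are zero.
    return 2 * precision * recall / denominator if denominator else 0.0


def confusion_matrix(y_true, y_pred, labels):
    """Element [i][j] counts points of class labels[i] predicted as labels[j]."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


print("F1:", round(f1_score(0.75, 0.6), 4))  # 0.6667

# Hypothetical three-class data.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "B", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C", "C", "C", "A"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# A [1, 1, 0]
# B [0, 2, 3]
# C [1, 0, 1]

# Row B, column C: points actually in class B that were predicted as class C.
print(matrix[1][2])  # 3
```

Rows correspond to the actual class and columns to the predicted class, so the off-diagonal entries show which pairs of classes are being confused.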

About the Author

I am a Data Science consultant and trainer. With Catalit, I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends, I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program in 2011.