Multiclass Classification
1h 13m

Continue the journey to data and machine learning, with this course from Cloud Academy.

In previous courses, the core principles and foundations of Data and Machine Learning have been covered and best practices explained. 

This course gives an informative introduction to deep learning and neural networks.

This course is made up of 12 expertly instructed lectures along with 4 exercises and their respective solutions.

Please note: the Pima Indians Diabetes dataset can be found at this GitHub repository or at the Kaggle page mentioned throughout the course.

Learning Objectives

  • Understand the core principles of deep learning
  • Be able to execute all factors of the framework of neural nets

Intended Audience





Hey guys, welcome back. In this class we are talking about multiclass classification. I thought it'd be cool to show you how we actually deal with a dataset where we have more than two possible classes in our target. So off we go: we load a new dataset, the iris flower dataset, and this is what it looks like. We use seaborn's pairplot to show it. We have three species of flowers, and for each of them we have four features: sepal length, sepal width, petal length, and petal width. As you can see, the three flowers form three groups, where one group is perfectly well separated from the other two, whereas the other two overlap in pretty much every dimension. The model we're going to build takes the four features as input, so it has four inputs, and it has three outputs, because it has to predict which of these three species the flower belongs to. Okay, so let's check the dataset. We have four numerical columns, which are the input features, and we have the species column, which is our target.

First of all, let's get the features. We drop the species column from the dataset to get x, the four features, and then we get the target names, which are the three species. Then we enumerate the target names and build a dictionary that maps each name to its index. So the target dictionary is going to be setosa: zero, versicolor: one, and virginica: two, which is also alphabetical order. Then we take the species column and map it through the target dictionary, so now our column y contains zeros, ones, and twos. Next, we want to create dummy columns. We totally could use pd.get_dummies for that, but I just want to show you a different way of doing the same thing: Keras has a utility function called to_categorical that does exactly the same thing.
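The label-encoding steps described above can be sketched as follows. This is a minimal illustration, not the notebook from the lecture: the tiny DataFrame is a made-up stand-in for the iris data, and the one-hot step uses a NumPy identity matrix as an assumed equivalent of what Keras's to_categorical produces.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the species column of the iris DataFrame.
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa", "virginica"]
})

# Unique class names, sorted so that indices follow alphabetical order.
target_names = sorted(df["species"].unique())

# Map each name to its index: setosa -> 0, versicolor -> 1, virginica -> 2.
target_dict = {name: i for i, name in enumerate(target_names)}

# Replace each species name with its integer label.
y = df["species"].map(target_dict).to_numpy()

# One-hot encode: row i of the identity matrix is the one-hot vector for class i.
y_cat = np.eye(len(target_names))[y]
```

Each row of y_cat has a single one in the column of its class, which is the shape of target a three-node output layer expects.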

If we pass the column y with the numbers, we obtain the categorical array: the first column has ones for the flowers that are setosa and zeros otherwise, and the ones for the other species are in the other two columns. Okay, train/test split of x.values, our features, and y_cat, the categorical output, with a test size of 20%, and we assign the result to the usual x_train, x_test, y_train, y_test. Then we build a shallow model, just one layer, so it's kind of a logistic regression. The only difference is that now our output has three output nodes. We have three possible values of which only one is active: only one is one and the other two are zero, by construction. When we have three or more mutually exclusive classes, like in this case, we're not going to use a sigmoid; we're going to use a function called softmax. As we've seen in class, softmax constrains the output probabilities to sum to one, so it pushes the other outputs towards zero when one of them is high. We also need to choose a different loss: the one we're going to use is the categorical cross-entropy, which always goes in tandem with the softmax. So, perfect. We build our model with our two changes: softmax activation in the output layer, with three output nodes, and categorical cross-entropy as the loss function. Then we train our model. Notice that I've introduced a new argument here in the fit function: the validation split.
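To make the softmax and categorical cross-entropy pairing concrete, here is a small NumPy sketch. It is not the Keras implementation; the function names, the logits, and the targets are all made up for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # guard against log(0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())

# Made-up raw outputs of the three output nodes for two flowers.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
probs = softmax(logits)

# Made-up one-hot targets: the first flower is class 0, the second is class 2.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
loss = categorical_crossentropy(y_true, probs)
```

Because each row of probs sums to one, raising one class's probability necessarily lowers the others, and the loss only looks at the probability assigned to the true class.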

What this argument does is take 10% of the training data, leave it out of the actual training, and check the loss and the accuracy on the 10% that was left out. This has nothing to do with the test set, which we never touch during training. It's basically another internal test set, internal to the training set, that we use for checking how our model is generalizing, that is, whether its ability to generalize is improving. We see that in the beginning the validation accuracy fluctuates, but then we get to higher values, and at some point we should converge to something close to one. Okay, we've trained for 20 epochs. If we're not happy with it, we can train for another 20 epochs, and we pretty much get to an accuracy of one on the internal validation set. So now we check the predictions on the test set. As you can see, each prediction has three numbers, because we have three output nodes, so we are predicting the probability for each of the three classes. For example, this first element of the test set has about a 10% probability of being in the first class, an 88% probability of being in the second class, and less than 1% probability of being in the third class. So it is most likely in the second class.
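The idea of an internal validation set can be sketched in plain NumPy. The helper below is hypothetical and only mimics the behaviour described above; it assumes, as the Keras documentation for fit describes, that the held-out fraction is taken from the end of the training arrays before any shuffling. The data is random and stands in for the real training set.

```python
import numpy as np

def holdout_split(x, y, validation_split=0.1):
    """Hold out the last fraction of the samples as a validation set."""
    n_fit = int(np.floor(len(x) * (1.0 - validation_split)))
    return x[:n_fit], y[:n_fit], x[n_fit:], y[n_fit:]

# Made-up training arrays: 120 samples (80% of 150 flowers), 4 features each.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(120, 4))
y_train = np.eye(3)[rng.integers(0, 3, size=120)]

# 10% of the training data is set aside; the model would only be fit on the rest.
x_fit, y_fit, x_val, y_val = holdout_split(x_train, y_train, validation_split=0.1)
```

The test set never appears here: the validation samples come out of the training data only, so the test set stays untouched until the final evaluation.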

The way we go from these probability predictions to actual class predictions is to take the index of the maximum value along each row. For each row we take the index (0, 1, or 2), that is, the column index of the class with the highest probability. We do this for both the test targets and the predictions. Notice that for the test targets this gives us back essentially the same values that were in y, but for the test set only. So we do this and then compare: we run a classification report on our test classes and predicted classes.
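As a small sketch of this step, np.argmax with axis=1 returns the column index of the largest value in each row. The probabilities and targets below are made up for illustration and are not the lecture's actual outputs.

```python
import numpy as np

# Made-up predicted probabilities for three test flowers (one row per flower).
y_pred = np.array([[0.115, 0.880, 0.005],
                   [0.970, 0.020, 0.010],
                   [0.050, 0.300, 0.650]])

# Made-up one-hot test targets for the same three flowers.
y_test = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0]])

# Column index of the maximum in each row gives the class label.
y_pred_class = np.argmax(y_pred, axis=1)
y_test_class = np.argmax(y_test, axis=1)

# Fraction of flowers where the predicted class matches the true class.
accuracy = float((y_pred_class == y_test_class).mean())
```

Applying argmax to the one-hot targets simply undoes the one-hot encoding, which is why it recovers the original integer labels.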

So what do we see? Well, first of all, we have very few data points, so these scores are not to be trusted very much. But we see that the model does really well on class zero, which by the way is the class that is completely separate from the others: it's the blue points here, the setosa. It makes sense that the model learned to completely distinguish this class, because it actually is completely separate. So the model is doing really well on the blue points, whereas on the red-green boundary it's making some mistakes, which is expected and not surprising. In fact, if we check the confusion matrix, we'll see that there is basically one point per class that gets misclassified. So we built a shallow model with three outputs and a softmax output layer. It's pretty remarkable that the model is already so good, and going forward we'll build even deeper models. So thank you for watching, and see you in the next video.
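For readers who want to see how a confusion matrix and the per-class scores relate, here is a minimal NumPy sketch. The helper function and the labels are made up for illustration; scikit-learn's confusion_matrix and classification_report would normally be used for this.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i, j] counts samples of true class i that were predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up class labels, not the actual results from the lecture.
y_test_class = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_class = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])

cm = confusion_matrix(y_test_class, y_pred_class, n_classes=3)

# Precision divides the diagonal by column sums (all predictions of a class);
# recall divides it by row sums (all true members of a class).
precision = cm.diagonal() / cm.sum(axis=0)
recall = cm.diagonal() / cm.sum(axis=1)
```

In this made-up example class 0 is classified perfectly, while classes 1 and 2 each lose one point to the other, which is the kind of pattern described above for the two overlapping species.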

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.