# Multiple Outputs

###### Introduction to Deep Learning

Continue the journey to data and machine learning, with this course from Cloud Academy.

In previous courses, the core principles and foundations of Data and Machine Learning have been covered and best practices explained.

This course gives an introduction to deep learning and neural networks.

This course is made up of 12 expertly instructed lectures along with 4 exercises and their respective solutions.

**Please note**: the Pima Indians Diabetes dataset can be found at this GitHub repository or on the Kaggle page mentioned throughout the course.

**Learning Objectives**

- Understand the core principles of deep learning
- Be able to work with all the components of the neural network framework

**Intended Audience**

- It would be advisable to complete the Intro to Data and Machine Learning course before starting.

Hello and welcome to this video on multiple outputs. In this video, you will learn to extend the fully connected architecture to deal with cases that have multiple output values. You will also learn about an activation function called Softmax. Earlier in this course, we introduced the multi-class classification problem. Neural networks can be easily extended to cases where the output is not a binary class, but instead multiple classes. They can also be easily extended to regression problems where the output is not a single value, which is very important for applications like self-driving cars, where the network has to predict the direction and the speed of the car at the same time.

To do regression with multiple outputs, we group the outputs in a single vector, and the extension of our framework is pretty simple. We just add as many output nodes as there are components in the output vector and we're basically done. Each of these nodes will generate an independent output value that we will assign to the corresponding coordinate of the output vector.

The case of classification requires a little more discussion because we need to carefully choose the activation function. In fact, when we are predicting discrete outputs, we could be in one of two cases: we could be predicting mutually exclusive classes, or each class could be independent from the others.
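The multi-output regression setup described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `dense_linear`, the two output names (direction and speed), and all weights and inputs are made up for the example and do not come from the course material.

```python
# Hypothetical example: a final fully connected layer with two linear output
# nodes, e.g. (direction, speed). All numbers are invented for illustration.

def dense_linear(inputs, weights, biases):
    """Return one value per output node: weighted sum of inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]

# Activations coming from the last hidden layer (assumed values).
hidden = [0.5, -1.0, 2.0]

# One row of weights per output node, so two rows give two outputs.
weights = [
    [0.2, 0.4, -0.1],  # feeds output node 0 ("direction")
    [1.0, 0.0, 0.5],   # feeds output node 1 ("speed")
]
biases = [0.0, 1.0]

# The prediction is a vector with one coordinate per output node.
prediction = dense_linear(hidden, weights, biases)
print(prediction)
```

No activation function is applied to the output nodes here, since regression targets are unbounded real values; adding a third output would only require a third row in `weights` and a third entry in `biases`.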

Think, for example, of the difference between folders and tags when you are organizing files. Each file can only be in one folder: if it is in folder A, it cannot be in folder B. Conversely, we can add multiple tags to a file. For example, a document can be tagged both work and document, while a photo could be tagged work and photo. Folders are mutually exclusive, tags are not.

The first step to treat this problem with a neural network is to transform the output into a series of dummy binary columns. In the case of mutually exclusive classes, each row will only host one non-zero value, corresponding to the class the record belongs to. So, in this example, the first row has a one in column A and zeros in all the other columns. In the case of tags, we allow as many ones in each row as there are tags for the corresponding document. So, in the first row in this case, there is a one for personal and also a one for document.

At this point, our output is a vector of zeros and ones and we just need to apply the correct activation function. If we are predicting independent tags, each output needs to be normalized to the interval from zero to one, but there can be multiple output nodes with values close to one. In this case, we will use a simple sigmoid function on each node of the last layer, so each output value will be an independent probability of having that tag.
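As a rough sketch of the folders-versus-tags encoding and the per-node sigmoid described above, the following plain Python example may help. The helper names (`one_hot`, `multi_hot`, `sigmoid`), the folder and tag vocabularies, and the raw output values are assumptions chosen for illustration, not values from the lecture slides.

```python
import math

def one_hot(label, classes):
    """Dummy columns for mutually exclusive classes: exactly one 1 per row."""
    return [1 if c == label else 0 for c in classes]

def multi_hot(tags, vocabulary):
    """Dummy columns for independent tags: as many 1s as the item has tags."""
    return [1 if t in tags else 0 for t in vocabulary]

def sigmoid(z):
    """Squash a real number into (0, 1); branches avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical vocabularies.
folders = ["A", "B", "C"]
tag_names = ["work", "personal", "document", "photo"]

# A file in folder A, and a file tagged both "personal" and "document".
folder_target = one_hot("A", folders)
tag_target = multi_hot({"personal", "document"}, tag_names)

# Made-up raw outputs of the last layer, one per tag.
raw_outputs = [-2.0, 3.0, 2.5, -1.0]

# Each node gets its own sigmoid, so each value is an independent probability.
tag_probs = [sigmoid(z) for z in raw_outputs]

print(folder_target)
print(tag_target)
print([round(p, 3) for p in tag_probs])
```

Because each sigmoid is computed independently, nothing forces the tag probabilities to sum to one, which is why several tags can be close to one at the same time.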

The case of mutually exclusive classes is a bit different. Since we want to interpret the output of each node as the probability of being in the corresponding class, we need to choose an activation function that forces the outputs to sum to one. In this way, if a document is likely to be in class A, it will automatically be unlikely to be in any other class. The Softmax is a function that does just that. It takes the output vector, Z, of the last layer and transforms it into a vector of the same length whose components are the exponentials of the original components, normalized by the sum of those exponentials. In this way, the components of the Softmax output always add up to one. The Softmax is always applied to the output of the last layer when we deal with mutually exclusive classes.

In conclusion, in this video, we've seen how to extend regression to the case where there are many outputs. We've seen how to extend classification to the case where there are many mutually exclusive classes, using the Softmax activation function. We've also seen how to treat the case where the classes are not exclusive by simply using a sigmoid on each output node. Thank you for watching and see you in the next video.
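The Softmax normalization described in this video can be sketched in plain Python as follows. The input vector `z` is a made-up example, and subtracting the maximum before exponentiating is a common numerical-stability trick added here as an assumption; it is not mentioned in the video and does not change the result.

```python
import math

def softmax(z):
    """Exponentiate each component and divide by the sum of the exponentials."""
    # Shifting by the max leaves the result unchanged but avoids overflow
    # when components are large (an addition not covered in the lecture).
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for three exclusive classes.
z = [2.0, 1.0, 0.1]
probs = softmax(z)

print([round(p, 3) for p in probs])
print(sum(probs))
```

Because every component is divided by the same total, raising the raw output of one class necessarily lowers the probabilities of all the others, which is the behavior wanted for mutually exclusive classes.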

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.