Continue your journey into data and machine learning with this course from Cloud Academy.

In previous courses, the core principles and foundations of Data and Machine Learning have been covered and best practices explained.

This course gives an informative introduction to deep learning and neural networks.

This course is made up of 12 expertly instructed lectures along with 4 exercises and their respective solutions.

**Please note**: the Pima Indians Diabetes dataset can be found at this GitHub repository or on the Kaggle page mentioned throughout the course.

**Learning Objectives**

- Understand the core principles of deep learning
- Be able to apply each component of the neural network framework

**Intended Audience**

- It would be advisable to complete the Intro to Data and Machine Learning course before starting.

Hello and welcome to this video on Feed Forward. In this video, we will return to the fully connected architecture and explain the feed forward calculation in more detail. We can think of the neural network as a big mathematical function, F. This function takes an input value from the feature space and outputs a value in the target space. This calculation is called Feed Forward and, as we've seen, it's a composition of linear and non-linear steps. Let's see how it's done in more detail. The input is a matrix, X, of size P by N. P is the number of rows in the matrix, and each row corresponds to a data point. N is the number of columns, and each column corresponds to a feature. So basically, X is our usual tabular data.

The inputs are passed through the nodes by multiplication with the weights. The first feature is passed to the first node by multiplication with the weight W one one, and to the second node through the weight W one two, and so on and so forth for all the other nodes. So we can arrange the weights of the first layer in an N by M matrix, W one, where N is the number of rows and corresponds to the number of input features, while M is the number of columns, which is the number of nodes in the first layer. Similarly, we can arrange the biases in a vector, B one, of size M, where M is the number of nodes in the first layer. So the first layer performs the linear transformation defined by W one and B one, followed by the non-linear transformation due to the non-linear activation function. In this case, we chose the sigmoid.
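As a rough illustration of the first layer described here, the following NumPy sketch builds a small random input and applies the linear step followed by the sigmoid. The sizes, the random values, and the variable names (`X`, `W1`, `b1`, `Z1`) are assumptions chosen for the example; they do not come from the course material.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): data points, input features, first-layer nodes.
P, N, M = 5, 3, 4

X = rng.normal(size=(P, N))   # input: one row per data point, one column per feature
W1 = rng.normal(size=(N, M))  # weights: row i, column j connects feature i to node j
b1 = np.zeros(M)              # one bias per node in the first layer

linear = X @ W1 + b1          # linear step; b1 is broadcast across the P rows
Z1 = sigmoid(linear)          # non-linear step

print(Z1.shape)               # (5, 4), i.e. P by M
```

The result has one row per data point and one column per node, so it can be treated as a new table of features for the next layer.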

Z one is the output of the sigmoid, and it's a matrix. The size of Z one, if you've been following the calculations so far, is P by M, with as many rows as the points in input and as many columns as the nodes in the first layer. So, in a way, we're back to square one. We have an input matrix again, this time called Z one, with as many features as the number of nodes in the first layer and as many data points as the original data points. Each of the features is a linear combination of the input features, deformed by the non-linear action of the sigmoid. So we can start our calculation again by multiplying Z one by the weight matrix W two.

W two is an M by K matrix, where M is the number of input features, which is also the number of nodes in the previous layer, and K is the number of nodes in the current layer. Then we add the biases and pass the output through the activation function, and we are ready to start again. We can repeat this process at each new layer. Each layer will take the outputs of the previous layer, make a linear combination of them, deform them with the non-linear activation function, and pass the new values as input to the next layer. At each step, we only do a matrix multiplication and a non-linear transformation. And that's it. This is why we can think of a neural network as a single function, because it is, in fact, a single function. It's a function composed of many nested functions. One last comment for the last layer. The last layer will have as many nodes as the values we are trying to predict. So, if we are using the neural network for a regression problem, we will have a single node in the last layer.
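The layer-by-layer repetition can be sketched as a short loop. This is a minimal example under assumed sizes and random weights; `feed_forward` and the `layers` list are hypothetical names introduced for illustration, and every layer here uses the sigmoid for simplicity.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def feed_forward(X, layers):
    """Apply each (W, b) pair in turn: matrix multiplication, bias, sigmoid."""
    Z = X
    for W, b in layers:
        Z = sigmoid(Z @ W + b)  # the output of one layer is the input of the next
    return Z

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): P points, N features, M and K nodes per layer.
P, N, M, K = 5, 3, 4, 2
X = rng.normal(size=(P, N))
layers = [
    (rng.normal(size=(N, M)), np.zeros(M)),  # first layer: N by M weights
    (rng.normal(size=(M, K)), np.zeros(K)),  # second layer: M by K weights
]

Z2 = feed_forward(X, layers)

# The same calculation written as one nested function call.
(W1, b1), (W2, b2) = layers
nested = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

print(Z2.shape)                 # (5, 2), i.e. P by K
print(np.allclose(Z2, nested))  # True
```

The loop and the nested expression give the same result, which is the sense in which the whole network is a single function composed of nested functions.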

This node will take all the outputs from the previous layer, and combine them to give the final answer. If the problem is a binary classification, we will just apply one last sigmoid function at the end, to constrain the output of the final node between zero and one, and we're done. So, to conclude, in this video we've explained the Feed Forward calculation in greater detail, and highlighted the fact that the neural network is a function. Thank you for watching, and see you in the next video.
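The difference between the two kinds of final layer can be sketched as follows. This is only an illustration: `Z_prev` stands in for the outputs of the last hidden layer, the weights are random, and the 0.5 threshold used to turn probabilities into class labels is an assumption not stated in the video.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): P data points, K nodes in the last hidden layer.
P, K = 5, 2
Z_prev = rng.uniform(size=(P, K))  # stand-in for the previous layer's outputs
W_out = rng.normal(size=(K, 1))    # a single output node, so a single column
b_out = np.zeros(1)

# Regression: the single node combines the previous outputs linearly.
y_regression = Z_prev @ W_out + b_out

# Binary classification: one last sigmoid constrains the output to (0, 1).
p_binary = sigmoid(Z_prev @ W_out + b_out)

# Assumed decision rule: predict class 1 when the output is at least 0.5.
labels = (p_binary >= 0.5).astype(int)

print(y_regression.shape, p_binary.shape)  # (5, 1) (5, 1)
```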

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.