Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural network to using neural networks to solve real problems, this course covers the essentials needed to succeed in machine learning.

**Learning Objectives**

- Understand the importance of gradient descent and backpropagation
- Be able to build your own neural network by the end of the course

**Prerequisites**

- It is recommended to complete the Introduction to Data and Machine Learning course before starting.

Hello, and welcome to this video on derivative calculation. In this video, we will calculate the weight corrections for the simple network we introduced, and we will also understand why the procedure is called back-propagation. Let's go back to that network: a simple neural network formed by only two nodes, where we wanted to calculate the derivative of the cost with respect to the weight w₂. Using the chain rule, we see that the derivative of the cost with respect to w₂ is the product of three terms. Let's calculate each one individually. The first term is just the derivative of the cost function with respect to the output of the network, ŷ.
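Written out, the chain-rule decomposition described above is (assuming the usual notation z₂ = w₂a₁ + b₂ for the input sum of the second node):

```latex
\frac{\partial C}{\partial w_2}
  = \frac{\partial C}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial z_2}
    \cdot \frac{\partial z_2}{\partial w_2},
\qquad \hat{y} = \sigma(z_2), \quad z_2 = w_2 a_1 + b_2 .
```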

This will depend on the exact form of the cost function, but it's well defined, and it can be calculated for a given training set. The second term is the derivative of the activation function. In order to calculate it, we need the derivative of the sigmoid, which turns out to be a product of two sigmoid functions: σ(z)·σ(−z) = σ(z)(1 − σ(z)). The third term is the derivative of a linear function, so it corresponds to a₁, the activation coming out of the previous layer.
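The sigmoid derivative described above can be sketched in a few lines of Python (a minimal illustration, not code from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: the product sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0, where sigma(0) = 0.5, so sigma'(0) = 0.25.
print(sigmoid_prime(0.0))  # 0.25
```

Because the derivative reuses the sigmoid itself, the forward-pass activations can be cached and reused during the backward pass.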

So, we've calculated the three terms, and we can recompose them to obtain this expression. If you plug in the current values for z₂, a₁, and ŷ, this expression is well defined, and it will yield a number. This is the number we're going to subtract from the value of w₂ in order to update it and decrease the cost. Notice also that this term is proportional to the input a₁ into the second node. The correction to the weight w₂ is therefore obtained by subtracting a quantity that is proportional to the input through a factor called δ₂.
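Recomposing the three terms can be sketched numerically. The transcript does not fix a cost function, so this sketch assumes squared error, C = ½(ŷ − y)²; all the concrete values are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical current values for one training example (not from the course).
a1, w2, b2 = 0.8, 0.5, 0.1    # input to the second node and its parameters
y = 1.0                        # target output

z2 = w2 * a1 + b2              # input sum of the output node
y_hat = sigmoid(z2)            # network output

# The three chain-rule terms:
dC_dyhat = y_hat - y                          # derivative of the assumed cost
dyhat_dz2 = sigmoid(z2) * (1.0 - sigmoid(z2)) # sigmoid derivative
dz2_dw2 = a1                                  # derivative of the linear term

delta2 = dC_dyhat * dyhat_dz2  # the factor the transcript calls delta two
grad_w2 = delta2 * dz2_dw2     # proportional to the input a1

learning_rate = 0.1
w2 = w2 - learning_rate * grad_w2  # update that decreases the cost
```

Note how the gradient factors cleanly into δ₂ times the input a₁, which is exactly the structure the transcript points out.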

δ₂ is calculated using parts of the network that are downstream of w₂, and it corresponds to the derivative of the cost with respect to the input sum z₂. In a similar way, we can use the chain rule to calculate the correction to the weight in the first layer, w₁. This correction is also proportional to the input value x, through a factor that we call δ₁. The interesting fact is that δ₁ is itself proportional to δ₂.
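The relation between δ₁ and δ₂ can be made concrete with another short sketch. Here δ₂ is taken as given (it would come from the output-layer calculation above), and all numeric values are again hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical values; delta2 is the downstream error signal (assumed).
x, w1, b1 = 0.3, -0.2, 0.05
w2 = 0.5
delta2 = -0.04

z1 = w1 * x + b1                           # input sum of the first node
delta1 = delta2 * w2 * sigmoid_prime(z1)   # delta1 proportional to delta2
grad_w1 = delta1 * x                       # correction proportional to input x
```

Only local quantities (z₁, x) plus the downstream δ₂ and w₂ are needed, which is what lets the corrections be computed layer by layer.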

This is why the procedure is called back-propagation: we start from the error at the output, and we propagate the weight corrections upstream, layer by layer, starting from the layer closest to the output and moving towards the input, walking back through the inner layers. So, in conclusion, in this video we've applied the chain rule to a simple network to calculate the weight updates, and we've learned that this technique is called back-propagation because it proceeds backwards, starting from the output and moving towards the input. Thank you for watching, and see you in the next video.
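Putting the whole video together, the two-node network can be trained end to end. This is a minimal sketch assuming a squared-error cost and a single hypothetical training pair (x, y); it is not the course's own code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small random initial parameters for the two-node network.
rng = np.random.default_rng(0)
w1, b1, w2, b2 = rng.normal(size=4) * 0.5
x, y = 0.7, 1.0   # one assumed training example
lr = 0.5          # learning rate

for step in range(200):
    # Forward pass
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    y_hat = sigmoid(z2)

    # Backward pass: deltas flow from the output towards the input
    delta2 = (y_hat - y) * y_hat * (1.0 - y_hat)
    delta1 = delta2 * w2 * a1 * (1.0 - a1)

    # Gradient-descent updates, each proportional to the node's input
    w2 -= lr * delta2 * a1
    b2 -= lr * delta2
    w1 -= lr * delta1 * x
    b1 -= lr * delta1

print(y_hat)  # moves towards the target y = 1.0 as training proceeds
```

The backward pass mirrors the derivation in the video: δ₂ is computed first from the output error, then δ₁ is obtained from δ₂, and each weight correction is the corresponding delta times that node's input.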

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program in 2011.