# Chain Rule

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

**Learning Objectives**

- Understand the importance of gradient descent and backpropagation
- Be able to build your own neural network by the end of the course

**Prerequisites**

- It is recommended to complete the Introduction to Data and Machine Learning course before starting.

Hello, and welcome to this video on the chain rule. In this video, we will calculate the derivative of a cost function and we will learn about the chain rule of derivatives. Let's work through the gradient calculation for a very simple neural network. Let's start with a network with only one input, one inner node, and one output. This will make our calculation much easier to follow. The following relations are true. z1, the input sum of the first node, is the weighted sum of the inputs plus the bias. Notice that, in this case, we only have one input, so there is really no sum. The activation, a1, is obtained by applying the sigmoid function to the input sum, z1, and this is indicated by the letter sigma here. A similar set of equations holds for the second node, with input sum z2 and activation a2, which is also equal to the output of the network, y-hat. The cost function, J, is a function of the true labels, y, and the predicted values, y-hat, which in turn depend on all the parameters of the network.
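The relations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `w1`, `b1`, and `b2` for the remaining weight and biases, the numeric values, and the squared-error form of the cost are all assumptions made here, since the video only says that J depends on y and y-hat.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the function written as sigma in the video."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Forward pass through the one-input, one-inner-node, one-output network."""
    z1 = w1 * x + b1    # input sum of the first node (one input, so no real sum)
    a1 = sigmoid(z1)    # activation of the first node
    z2 = w2 * a1 + b2   # input sum of the second node
    a2 = sigmoid(z2)    # activation of the second node
    y_hat = a2          # the activation a2 is the network output
    return z1, a1, z2, y_hat

def cost(y, y_hat):
    """Assumed squared-error cost; the video does not name a specific J."""
    return 0.5 * (y - y_hat) ** 2

# Arbitrary illustrative values, not taken from the course.
z1, a1, z2, y_hat = forward(x=1.5, w1=0.8, b1=-0.2, w2=-1.1, b2=0.3)
print(y_hat, cost(1.0, y_hat))
```

Written this way, it is easy to see that `w2` only reaches the cost through `z2` and then through the sigmoid, which is the nesting the rest of the video deals with.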

So, let us start by calculating the derivative of the cost function with respect to w2. w2 appears inside z2, which is, itself, inside the sigmoid function. So we need a way to calculate the derivative of a nested function. The technique is actually pretty easy, and it's called the chain rule. Let's see how it works. The chain rule is a rule to calculate the derivative of nested functions. Let's say we had a function like this one: h of x equals the logarithm of two plus the cosine of x. How do we calculate the derivative of this function with respect to x? This function is a composition of the function f of g, that is, the logarithm of g, and the function g of x, which equals two plus the cosine of x. So h of x is really a function, f, of a function, g, of x. We can calculate the derivative of h with respect to x by applying the chain rule.
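In symbols, the example can be written as follows. The notation is a reconstruction from the spoken description, and the logarithm is read as the natural logarithm, which is an assumption.

$$
h(x) = \log\bigl(2 + \cos x\bigr), \qquad f(g) = \log g, \qquad g(x) = 2 + \cos x
$$

$$
\frac{dh}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}
$$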

First, we calculate the derivative of g with respect to x, and then we calculate the derivative of f with respect to g. Finally, we multiply the two. f of g is the logarithm function, so we can calculate its derivative using the derivative table we showed before, and we find that it is equal to one over g. g of x is two plus the cosine of x, and we can also calculate its derivative to be minus the sine of x. So, finally, the derivative of our nested function, h of x, is the product of the two derivatives, which is minus the sine of x, divided by two plus the cosine of x. Notice that we substituted g with two plus the cosine of x. Congratulations, you've learned how to take the derivative of nested functions, and in the next video, we will use this technique to calculate the derivative of the cost function in a neural network. So in conclusion, in this video, we've introduced the chain rule, which is nothing more than the product of the derivatives of the chained functions. We also explained why we need it to calculate the weight updates in backpropagation. So thank you for watching, and see you in the next video.
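As a quick sanity check on the worked example, the chain-rule result can be compared against a numerical derivative. This is a minimal sketch: the helper names and the sample points are chosen here for illustration, and the finite-difference check is not part of the video.

```python
import math

def h(x):
    """The nested function from the video: log(2 + cos(x))."""
    return math.log(2.0 + math.cos(x))

def dh_dx(x):
    """Chain-rule derivative: (df/dg) * (dg/dx) = (1 / g) * (-sin(x))."""
    g = 2.0 + math.cos(x)   # inner function g(x)
    df_dg = 1.0 / g         # derivative of log(g) with respect to g
    dg_dx = -math.sin(x)    # derivative of 2 + cos(x) with respect to x
    return df_dg * dg_dx

def numerical_derivative(func, x, eps=1e-6):
    """Central finite difference, used only as an independent check."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

# Arbitrary sample points; the two columns should agree closely.
for x in (0.0, 0.7, 2.0):
    print(x, dh_dx(x), numerical_derivative(h, x))
```

The two values printed for each point should match to several decimal places, which is a useful habit for checking hand-derived gradients before relying on them.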

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.