Course: Getting Started With Deep Learning: Working With Data: Gradient Descent

# Derivative Calculation


Overview

• Difficulty: Beginner
• Duration: 1h 45m
• Students: 233
• Rating: 5/5

### Description

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural network to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

• Understand the importance of gradient descent and backpropagation
• Be able to build your own neural network by the end of the course

Prerequisites

### Transcript

Hello, and welcome to this video on derivative calculation. In this video, we will calculate the weight corrections for the simple network we introduced, and we will also understand why the procedure is called back-propagation. Let's go back to that network. We described a simple neural network formed by only two nodes, and we wanted to calculate the derivative of the cost with respect to the weight w₂. Using the chain rule, we see that the derivative of the cost with respect to w₂ is the product of three terms. Let's calculate each one individually. The first term is just the derivative of the cost function with respect to the output of the network, ŷ.
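Written out, the chain rule described above looks like the following. The symbol names are assumptions reconstructed from the spoken transcript: J for the cost, z₂ for the input sum of the second node, a₁ for the activation of the first node, and a sigmoid activation on the output node. Biases are omitted for brevity.

```latex
\frac{\partial J}{\partial w_2}
  = \underbrace{\frac{\partial J}{\partial \hat{y}}}_{\text{first term}}
    \cdot
    \underbrace{\frac{\partial \hat{y}}{\partial z_2}}_{\text{second term}}
    \cdot
    \underbrace{\frac{\partial z_2}{\partial w_2}}_{\text{third term}},
\qquad
\hat{y} = \sigma(z_2), \quad z_2 = w_2 \, a_1
```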

This will depend on the exact form of the cost function, but it's well defined, and it can be calculated for a given training set. The second term is the derivative of the activation function, so we need the derivative of the sigmoid. This is easy to calculate, and it turns out to be equal to a product of two sigmoid terms: σ(z₂)(1 − σ(z₂)). The third term is the derivative of a linear function, so it is simply a₁, the activation coming out of the previous layer.
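As a quick sanity check on the sigmoid derivative, the sketch below compares the analytic form σ(z)(1 − σ(z)) against a numerical estimate. The function names are hypothetical and chosen only for illustration. Since 1 − σ(z) equals σ(−z), the derivative can also be written as a product of two sigmoid values.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    """Analytic derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def central_difference(f, z: float, eps: float = 1e-6) -> float:
    """Numerical derivative, used only as an independent check."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


for z in (-2.0, 0.0, 1.5):
    analytic = sigmoid_prime(z)
    numeric = central_difference(sigmoid, z)
    # 1 - sigma(z) equals sigma(-z), so this is the same quantity
    # written as a product of two sigmoid values.
    product_form = sigmoid(z) * sigmoid(-z)
    print(f"z={z:+.1f}  analytic={analytic:.6f}  "
          f"numeric={numeric:.6f}  product={product_form:.6f}")
```

The three columns should agree to within the precision of the numerical estimate.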

So, we've calculated the three terms, and we can recombine them into a single expression. If you plug in the current values for z₂, a₁, and ŷ, this expression is well defined, and it will yield a number. This is the number we're going to subtract from the value of w₂ in order to update it and decrease the cost. Notice also that this term is proportional to a₁, the input into the second node. The correction to the weight w₂ is therefore obtained by subtracting a quantity that is proportional to the input through a factor called δ₂.
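To make the update concrete, here is a minimal sketch of the three-term product for a two-node network. Several details are assumptions that the transcript does not specify: a squared-error cost J = ½(ŷ − y)², no bias terms, a sigmoid on both nodes, the sample values, and a learning rate that scales the correction before it is subtracted.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: float, w1: float, w2: float):
    """Forward pass: x -> z1 -> a1 -> z2 -> y_hat (biases omitted)."""
    z1 = w1 * x
    a1 = sigmoid(z1)
    z2 = w2 * a1
    y_hat = sigmoid(z2)
    return z1, a1, z2, y_hat


def cost(y_hat: float, y: float) -> float:
    """Assumed cost: half squared error."""
    return 0.5 * (y_hat - y) ** 2


def grad_w2(x: float, y: float, w1: float, w2: float) -> float:
    """dJ/dw2 as the product of the three chain-rule terms."""
    _, a1, _, y_hat = forward(x, w1, w2)
    dJ_dyhat = y_hat - y                # first term (depends on the cost)
    dyhat_dz2 = y_hat * (1.0 - y_hat)   # second term (sigmoid derivative)
    dz2_dw2 = a1                        # third term (linear function)
    return dJ_dyhat * dyhat_dz2 * dz2_dw2


# Illustrative values only, not taken from the course.
x, y = 1.0, 0.0
w1, w2 = 0.5, -0.3
learning_rate = 0.1

gradient = grad_w2(x, y, w1, w2)
w2_new = w2 - learning_rate * gradient  # subtract the correction

cost_before = cost(forward(x, w1, w2)[3], y)
cost_after = cost(forward(x, w1, w2_new)[3], y)
print(f"dJ/dw2 = {gradient:.6f}")
print(f"cost before = {cost_before:.6f}, cost after = {cost_after:.6f}")
```

With a small enough learning rate, the cost after the update should be lower than before.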

δ₂ is calculated using parts of the network that are downstream with respect to w₂, and it corresponds to the derivative of the cost with respect to the input sum z₂. In a similar way, we can use the chain rule to calculate the correction to the weight in the first layer, w₁. We can see that this is also proportional to the input value x through a factor that we call δ₁. The interesting fact is that δ₁ is also proportional to δ₂.
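The relationship between the two delta factors can be sketched in code. Under the same assumptions as before (squared-error cost, sigmoid activations, no biases, hypothetical function names), δ₂ = ∂J/∂z₂ and δ₁ = δ₂ · w₂ · σ′(z₁), so each weight gradient is a delta multiplied by the input to that node.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cost_for_weights(x: float, y: float, w1: float, w2: float) -> float:
    """Assumed half-squared-error cost of the two-node network."""
    y_hat = sigmoid(w2 * sigmoid(w1 * x))
    return 0.5 * (y_hat - y) ** 2


def backprop(x: float, y: float, w1: float, w2: float):
    """Return (dJ/dw1, dJ/dw2, delta1, delta2) for one training example."""
    # Forward pass
    a1 = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * a1)

    # Backward pass: start at the output and move towards the input.
    delta2 = (y_hat - y) * y_hat * (1.0 - y_hat)  # dJ/dz2
    delta1 = delta2 * w2 * a1 * (1.0 - a1)        # dJ/dz1, proportional to delta2

    grad_w2 = delta2 * a1  # proportional to the input of the second node
    grad_w1 = delta1 * x   # proportional to the input of the first node
    return grad_w1, grad_w2, delta1, delta2


# Illustrative values only, not taken from the course.
grad_w1, grad_w2, delta1, delta2 = backprop(x=0.7, y=1.0, w1=0.5, w2=-0.3)
print(f"delta2 = {delta2:.6f}, delta1 = {delta1:.6f}")
print(f"dJ/dw2 = {grad_w2:.6f}, dJ/dw1 = {grad_w1:.6f}")
```

Note that δ₁ reuses δ₂ instead of recomputing the downstream derivatives, which is what makes the backward sweep cheap.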

This is why the procedure is called back-propagation: we start from the error in the output, and we propagate weight corrections upstream, layer by layer, starting from the one closest to the output and then moving towards the input by walking through the inner layers. So, in conclusion, in this video, we've applied the chain rule to a simple network to calculate the weight updates, and we've learned that this technique is called back-propagation because it proceeds backwards, starting from the output and then moving towards the input. Thank you for watching and see you in the next video.