# Fully Connected Backpropagation

**Difficulty:** Beginner

**Duration:** 1h 45m

**Students:** 337

### Description

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural network to solving problems with one, this course covers the essentials needed to succeed in machine learning.

**Learning Objectives**

- Understand the importance of gradient descent and backpropagation
- Be able to build your own neural network by the end of the course

**Prerequisites**

- It is recommended to complete the Introduction to Data and Machine Learning course before starting.

### Transcript

Hello and welcome to this video on fully connected neural networks. In this video, we will go through the backpropagation calculation for a fully connected neural network. In the last lecture, we calculated backpropagation for a network with only two nodes in series, so let's see how we can extend that calculation to a fully connected network. In a fully connected network, each layer contains several nodes, and each node is connected to all of the nodes in the previous and next layers. The weights in a layer are identified by two indices, k and j, where k indicates the receiving node and j indicates the emitting node.

The input sum at layer l and node k, which we will denote $z^{(l)}_k$, is the weighted sum of the activations of layer l-1 plus the bias term of layer l. As before, we can expand the derivative of the cost function with respect to the weight $w^{(l)}_{kj}$ in layer l using the chain rule, and we notice that the last two terms are the same as before. The derivative of the activation a with respect to the input sum is just the derivative of the sigmoid, and the derivative of the input sum z with respect to the weight is the activation of the previous layer.
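Written out, the definitions and the chain rule expansion described here might look as follows. This is a sketch in notation assumed for illustration rather than taken from the slides: $J$ for the cost, $\sigma$ for the sigmoid, a superscript $(l)$ for the layer, and a per-node bias $b^{(l)}_k$.

$$
z^{(l)}_k = \sum_j w^{(l)}_{kj}\, a^{(l-1)}_j + b^{(l)}_k,
\qquad
a^{(l)}_k = \sigma\!\left(z^{(l)}_k\right)
$$

$$
\frac{\partial J}{\partial w^{(l)}_{kj}}
= \frac{\partial J}{\partial a^{(l)}_k}
\cdot \frac{\partial a^{(l)}_k}{\partial z^{(l)}_k}
\cdot \frac{\partial z^{(l)}_k}{\partial w^{(l)}_{kj}}
= \frac{\partial J}{\partial a^{(l)}_k}\;
\sigma'\!\left(z^{(l)}_k\right)\;
a^{(l-1)}_j
$$

The last two factors are the ones that carry over unchanged from the two-node case; only the first factor needs new work.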

The only different term in this case is the derivative of the cost with respect to the activation $a^{(l)}_k$. Since this activation is part of the input to every node in the next layer, l+1, we have to apply the chain rule to each of those nodes and sum all the contributions together. Notice that in the last term, we can replace the derivative of the input sum with the weights of layer l+1, and the derivative of the cost function with the delta, as we did in the one-dimensional case. So, we notice that the deltas are connected by a recursive formula.
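The sum over the next layer can be sketched as below. The notation is assumed for illustration: $J$ is the cost, $z$ and $a$ are input sums and activations, $w^{(l+1)}_{mk}$ is the weight from node $k$ in layer $l$ to node $m$ in layer $l+1$, and delta is taken to mean $\delta^{(l)}_k = \partial J / \partial z^{(l)}_k$, which is an assumption about the convention used in the one-dimensional case.

$$
\frac{\partial J}{\partial a^{(l)}_k}
= \sum_m \frac{\partial J}{\partial z^{(l+1)}_m}\,
\frac{\partial z^{(l+1)}_m}{\partial a^{(l)}_k}
= \sum_m \delta^{(l+1)}_m\, w^{(l+1)}_{mk}
$$

Multiplying by the sigmoid derivative then gives the recursion between deltas, and the weight gradient follows from it:

$$
\delta^{(l)}_k = \sigma'\!\left(z^{(l)}_k\right) \sum_m w^{(l+1)}_{mk}\, \delta^{(l+1)}_m,
\qquad
\frac{\partial J}{\partial w^{(l)}_{kj}} = \delta^{(l)}_k\, a^{(l-1)}_j
$$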

This means we can obtain the deltas at layer l as a function of the deltas at layer l+1. We have already seen this in the one-dimensional case, where delta one was a function of delta two. This is why it's called backpropagation, and this is how it works with many nodes. Again, we start from the last layer, the one closest to the output, calculate the deltas, and propagate them back through the layers using the recursive formula that is displayed here. I know your head may be spinning a little bit after all these indices and formulas, so in the next video, we will see how to simplify the notation a little bit.
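To make the backward recursion concrete, here is a minimal NumPy sketch for a single training example. It is not the course's own code: the function names (`forward`, `backward`), the quadratic cost $J = \tfrac{1}{2}\lVert a^{(L)} - y \rVert^2$, and the use of a sigmoid in every layer are assumptions made for illustration. Each weight matrix is stored with rows indexed by the receiving node k and columns by the emitting node j, and the sums over nodes are written as matrix products.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(x, weights, biases):
    """Return the input sums z and the activations a of every layer."""
    activations = [x]          # activations[0] is the network input
    zs = []
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b   # weighted sum of previous activations plus bias
        zs.append(z)
        activations.append(sigmoid(z))
    return zs, activations

def backward(y, zs, activations, weights):
    """Return dJ/dW and dJ/db per layer, assuming J = 0.5 * ||a_L - y||^2."""
    n_layers = len(weights)
    grads_W = [None] * n_layers
    grads_b = [None] * n_layers
    # Delta of the last layer: dJ/da times the sigmoid derivative.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    for l in reversed(range(n_layers)):
        # dJ/dW[k, j] = delta[k] * (activation j of the previous layer)
        grads_W[l] = np.outer(delta, activations[l])
        grads_b[l] = delta
        if l > 0:
            # Recursion: deltas of the previous layer from the deltas of this one.
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W, grads_b

# Usage on a small random network with layer sizes 3 -> 4 -> 2.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
x, y = rng.normal(size=3), rng.normal(size=2)

zs, activations = forward(x, weights, biases)
grads_W, grads_b = backward(y, zs, activations, weights)
print([g.shape for g in grads_W])  # [(4, 3), (2, 4)]
```

Each gradient has the same shape as the weight matrix it belongs to, so it can be used directly in a gradient descent update. A quick way to gain confidence in a sketch like this is to compare one entry of `grads_W` against a finite-difference estimate of the cost.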

But, first of all, I want to compliment you for making it this far. In this video, we went through one of the hardest parts of neural networks, which is calculating backpropagation for a fully connected network. You've gone through the backpropagation algorithm, which is the core of how neural networks are trained. So, give yourself a big round of applause and take a quick break. Neural networks hold no more mysteries for you. Thank you for watching and see you in the next video.

# About the Author

**Students:** 2908

**Courses:** 8

**Learning paths:** 3

I am a Data Science consultant and trainer. With Catalit, I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends, I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program in 2011.