Backpropagation Intuition
1h 45m

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hello and welcome to this video on back-propagation intuition. In this video, we will talk about what back-propagation means and how it works. We defined the gradient in the previous video, so now let's talk about back-propagation.

Let's say we have a function of only one variable, called w. For every value on the horizontal axis, the function associates a value on the vertical axis. Let's say we're sitting at a particular point, like in the figure. Let's also assume that we do not know the function f of w at every possible point; we only know it near where we are. We want to move in the direction of decreasing f of w, but we can only use local information. How do we decide where to go?

As we saw when we talked about descending from a hill, the derivative indicates the slope of the function at each point, so we can calculate the derivative where we are and then change our position by subtracting the value of the derivative from our starting position w. In other words, we can take one step following the rule: w becomes w minus the derivative of f with respect to w. Let's check that this does move us towards lower values on the vertical axis. If we are sitting at a point w where the slope of the curve is negative, the quantity minus df/dw is positive, so the value of w will increase, moving us towards the right on the horizontal axis. As long as the step is not too large, the corresponding value on the vertical axis will decrease, so we have successfully moved towards a lower value of the function f of w.
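As a minimal sketch of this single step, the Python snippet below applies the rule to a hypothetical function f(w) = (w - 3)^2, which is not from the course. It also scales the derivative by a small step size (0.1, an assumption added here): the rule as stated in the video corresponds to a step size of 1, which can overshoot the minimum on a steep curve.

```python
def f(w):
    # Hypothetical example function (not from the course); its minimum is at w = 3.
    return (w - 3.0) ** 2


def df_dw(w):
    # Derivative of the example function with respect to w.
    return 2.0 * (w - 3.0)


def gradient_step(w, step_size=0.1):
    # Update rule: w becomes w minus (step_size * derivative).
    # step_size is an assumption; the video's rule uses the raw derivative.
    return w - step_size * df_dw(w)


w_start = 1.0                  # the slope is negative here (df/dw = -4)
w_new = gradient_step(w_start)

print("w:", w_start, "->", w_new)
print("f(w):", f(w_start), "->", f(w_new))
```

Starting from a point with a negative slope, w moves to the right and f(w) gets smaller, which matches the check described above.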

Vice versa, if we were to start at a point w where the slope is positive, we would subtract the positive quantity df/dw from w. This would move w to the left, and the corresponding value on the vertical axis would still be lower than it was when we started. This way of looking for the minimum of a function is called gradient descent, and it's the idea behind back-propagation. Given a function, we can move towards a local minimum by following the path indicated by its derivative or, in the case of multiple variables, by its gradient. As you know by now, for a neural network we define a cost function that depends on the values of the parameters, and we find the values of the parameters by minimizing the cost with gradient descent. All we are really doing is taking the cost function, calculating its partial derivatives with respect to each parameter, and then using the update rule we just described to decrease the cost by updating the parameters. We do this by subtracting the gradient from the parameters: each parameter is decreased by the partial derivative of the cost with respect to that parameter.
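The same idea with more than one parameter can be sketched as follows. This is an illustration under stated assumptions, not the course's own code: the model is a hypothetical one-input linear model with parameters w and b, the cost is the mean squared error, the toy data is made up, and the step size of 0.05 is chosen by hand so that the updates stay small.

```python
# Hypothetical toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def cost(w, b):
    # Mean squared error of the linear model y_hat = w * x + b.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    # Partial derivatives of the cost with respect to w and b.
    n = len(xs)
    d_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    d_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return d_w, d_b


def train(w=0.0, b=0.0, step_size=0.05, steps=2000):
    # Repeatedly apply the update rule to every parameter.
    history = [cost(w, b)]
    for _ in range(steps):
        d_w, d_b = gradient(w, b)
        w = w - step_size * d_w
        b = b - step_size * d_b
        history.append(cost(w, b))
    return w, b, history


w, b, history = train()
print("w:", w, "b:", b)
print("cost:", history[0], "->", history[-1])
```

Each pass of the loop computes the partial derivatives of the cost and subtracts them, scaled by the step size, from the corresponding parameters.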

This is what's called a parameter update. In conclusion, in this video, you've learned about gradient descent, and how to use it to minimize the cost. We've also formed an intuition about what back-propagation is. It's essentially the rule to update the parameters in order to minimize the cost. Thank you for watching and see you in the next video.

About the Author

I am a Data Science consultant and trainer. With Catalit, I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends, I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program in 2011.