Getting Started With Deep Learning: Working With Data: Gradient Descent

Derivatives and Gradient

Duration: 1h 45m


Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural network to solving real problems with one, this course covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hello, and welcome to this video on derivatives and gradient. As the name suggests, a derivative is a function that derives from another function. Let's start with an example. Imagine you are driving on the highway; as time goes by, you mark your position along the highway, filling a table of values as a function of time. If your speed is 60 miles an hour, every minute your position will increase by one mile. Let's define the function x of t to indicate your position as a function of time. The derivative of this function is the rate of change in position with respect to time. In this example, it corresponds to the speed of your car, indicated by the speedometer. In general, the derivative dx over dt is itself a function of t that tells us the rate of change of the original function, x of t, at each value of time. This is why it's called a derivative: because it derives from another function.

Here's a graphical explanation. At each point along the curve, the derivative is the value of the slope of the curve itself. We can calculate an approximate value of the slope by the method of finite differences. The value of the derivative is negative when the slope of the original curve is downhill, and it is positive when the slope is uphill.
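The method of finite differences mentioned above is easy to sketch in code. Here is a minimal Python sketch of the driving example; the `position` function is a hypothetical constant-speed car at 60 miles per hour, so its derivative should come out close to 60 everywhere:

```python
def finite_difference(f, t, h=1e-5):
    """Approximate the derivative df/dt at t with a central finite difference."""
    return (f(t + h) - f(t - h)) / (2 * h)

def position(t):
    """Hypothetical position in miles after t hours at a constant 60 mph."""
    return 60.0 * t

# The derivative of position with respect to time is the speed on the speedometer.
speed = finite_difference(position, 0.5)
print(speed)  # approximately 60.0
```

Because the function is linear, the finite-difference estimate matches the true slope up to floating-point rounding; for curved functions, smaller `h` gives a better approximation of the slope at that point.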

Finally, if we are at the minimum or at the maximum, the derivative is zero because the slope is flat. Derivatives of simple functions are known. Here are a few common cases: the derivative of a constant is zero, because a constant doesn't change; the derivative of a linear function is a constant; and a very important one is the derivative of the exponential, which is the exponential itself.

When our function has more than one input, we need to specify which variable we are using for derivation. For example, let's say we are measuring our elevation on a mountain as a function of our position. Our GPS position is defined by two variables, longitude and latitude, and therefore the elevation depends on two variables. Let's change the variable names to shorter ones: let's call the elevation y, and the two variables x one and x two.
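The simple derivative rules above can be checked numerically with the same finite-difference idea. This is a small Python sketch; the particular constant 7 and slope 3 are arbitrary choices for illustration:

```python
import math

def finite_difference(f, x, h=1e-6):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The derivative of a constant is zero: a constant doesn't change.
d_const = finite_difference(lambda x: 7.0, 1.0)

# The derivative of a linear function is its constant slope.
d_linear = finite_difference(lambda x: 3.0 * x, 1.0)

# The derivative of the exponential is the exponential itself.
d_exp = finite_difference(math.exp, 1.0)
print(d_const, d_linear, d_exp)
```

Each numerical estimate lands very close to the known analytic value: 0, 3, and e to the power of 1, respectively.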

We can calculate the rate of change in elevation with respect to x one, and the rate of change with respect to x two. These are called partial derivatives, because we only consider the change with respect to one variable at a time. Notice also that we use a different symbol to indicate these derivatives, because they are partial derivatives. If we are on top of a hill, the fastest route downhill will not necessarily be along the north-south or east-west directions; it will be in whatever direction the hill descends most steeply. In the two-dimensional plane of x one and x two, the direction of the most abrupt change is a two-dimensional vector whose components are the partial derivatives with respect to each variable. We call this vector the gradient, and we indicate it with an inverted triangle, which is also called del or nabla. The gradient is an operation that takes a function of multiple variables and returns a vector. The components of this vector are all the partial derivatives of the function. Since the partial derivatives are functions of all the variables, the gradient, too, is a function of all the variables.

To be precise, it's a vector function. For each point x one, x two, the gradient returns the vector in the direction of maximum steepness in the graph of the original function. If we want to go downhill, all we have to do is walk in the direction opposite to the gradient. This will be our strategy for minimizing cost functions later on. In conclusion, in this video we've encountered derivatives and explained how they correspond to the slope, or the rate of change, of a function. We've also introduced the gradient, which extends the derivative to functions of many variables and returns the vector of steepest change. Thank you for watching and see you in the next video.
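Both ideas, the gradient as a vector of partial derivatives and walking opposite to it, can be sketched in a few lines of Python. The `elevation` function below is a hypothetical bowl-shaped hill with its minimum at (0, 0), not something from the course; the gradient is estimated with partial finite differences, one variable at a time:

```python
def gradient(f, x1, x2, h=1e-5):
    """Numerical gradient of f at (x1, x2): the vector of partial derivatives,
    each estimated by a central finite difference in one variable."""
    d_x1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d_x2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d_x1, d_x2

def elevation(x1, x2):
    """Hypothetical hill: a bowl whose lowest point is at (0, 0)."""
    return x1 ** 2 + 3 * x2 ** 2

# Walk downhill: repeatedly take a small step opposite to the gradient.
x1, x2 = 2.0, 1.0   # arbitrary starting point
step = 0.1          # step size (the "learning rate" in later videos)
for _ in range(200):
    g1, g2 = gradient(elevation, x1, x2)
    x1 -= step * g1
    x2 -= step * g2

print(x1, x2)  # both very close to 0: we have reached the bottom of the bowl
```

Note that each downhill step moves against both partial derivatives at once, so the path cuts diagonally toward the minimum rather than following the x one or x two axis, which is exactly the "steepest descent" picture described above.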

About the Author


I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.