1h 45m

Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks to understanding how they work internally, this course expertly covers the essentials needed to succeed in machine learning.

Learning Objectives

  • Understand the importance of gradient descent and backpropagation
  • Be able to build your own neural network by the end of the course




Hello, and welcome to this video on the Exponentially Weighted Moving Average, or EWMA. In this video, you will learn the most important algorithm of your life. This algorithm crops up everywhere, from financial time series to signal processing to neural networks. I'm always amazed by the many different names people give it, but it's actually the same thing over and over again. It's called the Exponentially Weighted Moving Average, or EWMA for short.

Let's say we have a sequence of ordered data points. This could be the values of a stock on the stock market, or measurements of temperature or atmospheric pressure, or anything that is measured in a sequence. If this data is noisy, we may want to reduce the noise with a smoothing technique. One easy way to remove noise from a time series is to perform a moving average. You wait to accumulate a certain number of observations and use their average as the estimate of the current value. This method works, but it's not easy to implement in practice. It requires holding many past values in a memory buffer and constantly updating that buffer when a new data point of the sequence arrives. The Exponentially Weighted Moving Average solves this problem with a recursive formula, so that we only need to keep track of the last value of the average itself.
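To make the memory cost of the plain moving average concrete, here is a minimal Python sketch. The function name `moving_average`, the `window` parameter, and the sample values are assumptions made for illustration; they are not from the video. Unlike the description above, this sketch does not wait for the buffer to fill: it averages whatever it has collected so far.

```python
from collections import deque


def moving_average(values, window):
    """Simple moving average that keeps the last `window` raw values in a buffer."""
    buffer = deque(maxlen=window)  # the oldest value is dropped automatically
    averages = []
    for x in values:
        buffer.append(x)  # the buffer must be updated at every new point
        averages.append(sum(buffer) / len(buffer))
    return averages


# Hypothetical noisy sequence, smoothed over a window of 3 points.
smoothed = moving_average([1.0, 2.0, 5.0, 4.0], window=3)
print(smoothed)
```

The buffer holds up to `window` raw values at all times, which is the storage that the recursive approach avoids.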

This formula says that the value of the moving average at time T is a mix between the value of the raw signal at time T, X sub t, and the previous value of the moving average itself, S sub t minus 1. The degree of mixing is controlled by the parameter A, which takes values between zero and one. If A is small, say 10%, most of the contribution will come from the previous values of the signal. In this case, the smoothing will be very strong.
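The formula itself is not reproduced in the transcript. Based on the spoken description above, it can be reconstructed as follows, writing $a$ for the mixing parameter, $x_t$ for the raw signal, and $S_t$ for the moving average:

```latex
S_t = a \, x_t + (1 - a) \, S_{t-1}, \qquad 0 \le a \le 1
```

With $a = 0.1$, each new value of the average is 10% raw signal and 90% previous average.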

If A is large, on the other hand, say 90%, most of the contribution will come from the raw signal and smoothing will be minimal. Let's walk through the calculation for a simple example, with A equal to 10%, or 0.1. When the first point comes in, the EWMA is set equal to the raw data. This is true only for the first point, and it's called the initial condition. When the second raw value comes in, we take 10% of it, which is 0.2 in this case, and add it to 90% of the previous value of the moving average, which was one. So we add it to 0.9. The result is 1.1, slightly bigger than our starting value of one.

Then the third point comes in. Again, we take 10% of its value, which is 0.5, and add it to 90% of the current EWMA value, which is 0.99. We can continue playing this game at each new point. All we need to keep in memory is the previous value of the EWMA, until we've mixed it with the current raw value of the signal. This algorithm is very efficient and very useful for smoothing signals. I'm sure you will encounter it on many other occasions.
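The walkthrough above can be sketched in a few lines of Python. The raw values 1, 2, and 5 are inferred from the numbers quoted in the transcript (10% of the second and third points being 0.2 and 0.5); the function name `ewma` is an assumption made for illustration.

```python
def ewma(values, a):
    """Exponentially weighted moving average with mixing parameter `a` in [0, 1]."""
    smoothed = []
    s = None  # the only state we keep: the previous value of the average
    for x in values:
        if s is None:
            s = x  # initial condition: the first EWMA value equals the raw data
        else:
            s = a * x + (1 - a) * s  # mix the new raw value with the previous average
        smoothed.append(s)
    return smoothed


# Raw values assumed from the transcript's arithmetic, with a = 0.1.
result = ewma([1.0, 2.0, 5.0], a=0.1)
print(result)  # approximately 1.0, 1.1, 1.49
```

Only the single variable `s` is carried from one step to the next, which is why no buffer of past values is needed.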

And in the next video, we'll apply it to neural networks. So, to summarize, in this video we introduced the most important algorithm of your life. It's called the Exponentially Weighted Moving Average. We've seen that it's obtained with a recursive formula that mixes the raw signal with the current value of the moving average itself. And it's an algorithm that is used for smoothing a signal that is very noisy. So, thank you for watching, and see you in the next video.

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.