Cost Function: Screenflow
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have taken advantage of the resource for their products. This revolution of machine learning has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hello and welcome to this video on cost functions. In this video you will learn what a cost function is. In the last video we saw that our linear hypothesis y-hat is controlled by two parameters: weight and bias. In order to find the best possible model to describe our data, we need to define a way to decide how good a model is. Remember that we are solving a supervised learning task, so we know the true value of the labels. That is to say, we know the actual price of a house for each given house size in the data set. Therefore, we can compare the value predicted by the hypothesis with the actual value of the label and calculate the error for each data point. These errors are called residuals and are calculated by subtracting the value of the prediction from the true value of the label. Note that in this definition, a residual carries a sign. It will be positive if our hypothesis underestimates the true price and negative if it overestimates it.
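As a rough illustration of the residual definition above, here is a minimal Python sketch. The house sizes, prices, and the parameter values w and b are made-up numbers chosen for easy arithmetic, and the function names predict and residuals are hypothetical; none of them come from the course materials.

```python
# Hypothetical toy data: house sizes (square feet) and true prices (thousands).
sizes = [1000.0, 1500.0, 2000.0]
prices = [280.0, 385.0, 500.0]


def predict(size, w, b):
    """Linear hypothesis: y_hat = w * size + b."""
    return w * size + b


def residuals(sizes, prices, w, b):
    """Residual per data point: true label minus predicted value."""
    return [y - predict(x, w, b) for x, y in zip(sizes, prices)]


# Assumed parameter values, picked only for illustration.
res = residuals(sizes, prices, w=0.25, b=10.0)
print(res)  # [20.0, 0.0, -10.0]
```

The first residual is positive because the hypothesis underestimates that price, the second is zero because the prediction is exact, and the third is negative because the hypothesis overestimates.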

However, we don't really care about the direction in which our hypothesis is wrong. We only care about the total amount by which it is wrong. We can therefore define the total error as the sum of the absolute values of the residuals. The total error is one possible example of what is called a cost function. A cost function returns a value once the hypothesis y-hat, the parameters b and w, and the training data are set to fixed values. For reasons that will become clear later in the course, it's often preferable to use another cost function called the mean squared error. The mean squared error is calculated by taking the square of each residual, summing all these squares, and dividing by the total number of data points. Notice that since the square of a number is never negative, the mean squared error will be large when the total error is large and small when the total error is small. In that sense, they are somewhat equivalent. However, the mean squared error will be much larger when the individual differences are large, because of the square operation. Also, the mean squared error is preferable because it's smooth and, for a linear hypothesis like ours, it's guaranteed to have a global minimum, which is exactly what we are looking for.

In conclusion, in this lecture you've learned about cost functions, and you've learned two different types of cost function: the total error and the mean squared error. You've also learned that for given values of training data and hypothesis, the cost only depends on the value of the parameters. Thank you for watching and see you in the next video.
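The two cost functions from this lecture can be sketched in a few lines of Python. This is only an illustrative sketch: the function names total_error, mean_squared_error, and cost, and all the numbers, are assumptions made for this example and are not taken from the course repo.

```python
def total_error(y_true, y_pred):
    """Sum of the absolute values of the residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    """Mean of the squared residuals."""
    squares = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squares) / len(squares)


def cost(w, b, sizes, prices):
    """With data and hypothesis fixed, the cost depends only on w and b."""
    predictions = [w * x + b for x in sizes]
    return mean_squared_error(prices, predictions)


# Made-up labels and two made-up sets of predictions with the same total error.
y_true = [100.0, 100.0, 100.0]
spread_out = [90.0, 90.0, 90.0]      # three residuals of 10
one_big_miss = [70.0, 100.0, 100.0]  # one residual of 30

print(total_error(y_true, spread_out), total_error(y_true, one_big_miss))
# 30.0 30.0
print(mean_squared_error(y_true, spread_out),
      mean_squared_error(y_true, one_big_miss))
# 100.0 300.0
```

Both sets of predictions have the same total error, but the mean squared error is three times larger when the error is concentrated in one large residual, which is the effect of the square operation described above.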

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.