Best Model
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have taken advantage of the resource for their products. This revolution of machine learning has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hello, and welcome to this video on finding the best model. In this video, you will learn about cost minimization and how to find the best parameters. Now that we have both a hypothesis (our linear model) and a cost function (the mean squared error), we need to find the combination of parameters B and W that minimizes this cost. Let's do this step by step. Let's start with an obviously wrong hypothesis: a small bias and zero weight. We calculate our predictions, we calculate the differences with the real labels, we take the squares of the residuals, sum them up, and obtain a certain value for the cost. Now, let's increase the values of both bias and weight by a small amount. Our linear hypothesis gets closer to our data and the mean squared error decreases. If we keep doing that, changing B and W by small amounts, we eventually reach a point where the mean squared error starts increasing.
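The step-by-step walk described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the starting values, the step size, and the function names are hypothetical and not taken from the course repo. Note that moving B and W together explores only one diagonal line in the parameter plane, so the stopping point is the best value along that line, not necessarily the best model overall.

```python
# Hypothetical toy data (not from the course): labels follow y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def predict(b, w, x):
    """Linear hypothesis: bias b plus weight w times the feature."""
    return b + w * x


def mse(b, w, xs, ys):
    """Mean squared error: average of the squared residuals."""
    residuals = [y - predict(b, w, x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


def walk_until_cost_rises(b0, w0, step, xs, ys, max_steps=1000):
    """Increase b and w together by `step` until the cost stops decreasing."""
    history = [(b0, w0, mse(b0, w0, xs, ys))]
    for k in range(1, max_steps + 1):
        b = b0 + k * step
        w = w0 + k * step
        cost = mse(b, w, xs, ys)
        if cost >= history[-1][2]:
            break  # the cost started increasing: we just passed the minimum
        history.append((b, w, cost))
    return history


# Start from an obviously wrong hypothesis: small bias, zero weight.
history = walk_until_cost_rises(b0=0.1, w0=0.0, step=0.1, xs=xs, ys=ys)
best_b, best_w, best_cost = history[-1]
print(f"stopped at b={best_b:.1f}, w={best_w:.1f}, cost={best_cost:.4f}")
```

Printing each entry of `history` shows the cost shrinking at every step until the last one recorded, which mirrors the behaviour described in the video.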

Before that, there must have been values of the parameters for which the cost was at its minimum. So if we stop there, we can say our algorithm has converged to the optimal values of the parameters for the given hypothesis-cost combination. If we represented all possible values of B and W on a plane, we could calculate the value of the cost for each pair of B and W. These values would form a convex, bowl-shaped profile with a minimum value corresponding to a particular choice of B and W. If we started from a random choice of B and W, we could imagine stepping along this profile towards the minimum value by changing the values of B and W. The process of finding the best parameter combination is called training. We have fed our model with known pairs of features and labels, and the model has found the values of its parameters that minimize the mean squared error cost. So to summarize, we have learned that training corresponds to minimizing a cost function, and the minimum cost corresponds to the best model we can find for a given training set and a given hypothesis. Thank you for watching, and see you in the next video.
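To make the picture of a cost value for every pair of B and W more concrete, here is a minimal sketch in plain Python. It first evaluates the cost on a coarse grid of (B, W) pairs, then steps towards the minimum from a fixed starting point. The stepping rule used here is gradient descent, which this video does not name; it is swapped in as one common way to do the stepping. The toy data, grid range, learning rate, and step count are all hypothetical choices.

```python
# Hypothetical toy data (not from the course): labels follow y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def mse(b, w):
    """Mean squared error of the linear hypothesis b + w * x on the toy data."""
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# 1) Evaluate the cost for each (b, w) pair on a coarse grid over the plane.
grid = [(i * 0.5, j * 0.5) for i in range(7) for j in range(7)]
grid_best = min(grid, key=lambda pair: mse(*pair))


# 2) Step along the cost profile towards the minimum (gradient descent,
#    an assumed method; the video only speaks of "stepping").
def gradients(b, w):
    """Partial derivatives of the mean squared error with respect to b and w."""
    n = len(xs)
    residuals = [y - (b + w * x) for x, y in zip(xs, ys)]
    grad_b = -2.0 * sum(residuals) / n
    grad_w = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return grad_b, grad_w


def train(b, w, learning_rate=0.05, steps=5000):
    """Repeatedly move b and w a small amount against the gradient."""
    for _ in range(steps):
        grad_b, grad_w = gradients(b, w)
        b -= learning_rate * grad_b
        w -= learning_rate * grad_w
    return b, w


b_fit, w_fit = train(b=0.0, w=0.0)
print("grid minimum at (b, w) =", grid_best)
print(f"trained parameters: b={b_fit:.4f}, w={w_fit:.4f}")
```

The grid search only works here because there are two parameters and a small range; stepping downhill scales better, which is why training is usually framed as iterative cost minimization.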

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.