Linear Regression
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have taken advantage of the resource for their products. This revolution of machine learning has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hello, and welcome to this video on linear regression. In this video, you will learn what linear regression is, and what it means to formulate a hypothesis that depends on parameters. Let's take a second look at the plot you drew in section two. It represents a population of individuals. Each dot is a person, and the position of the dot on the chart is defined by two coordinates, height and weight. Now, would you say there is a pattern in how the dots are laid out, or do they seem completely random? I'm sure your visual brain, which is a great pattern recognizer, notified you that there is a pattern: the dots are roughly spread around a straight, diagonal line.

This line seems to indicate the obvious: taller people are also heavier, on average. The question is, how can we specify this relationship more precisely? What we're saying is that the weight, our target variable or label, is a linear function of the height, our only feature. This is true in general: when we say the relationship between features and target is like a straight line or a flat plane, we are saying that there is a linear relationship connecting them. So now we're going to learn about a technique called linear regression.

Let's consider another, similar example. Let's say that we had a dataset of sizes and prices of houses. The only feature in this case is the size of the house, and the target or label is the price of the house. Let's assign variable names to our quantities: we indicate the label, or price, with the letter y, and the feature, or size, with the letter x. We're looking for a linear relationship that connects x to y. Another way to say this is that we're formulating a hypothesis, and this hypothesis says that y is a linear function of x, plus some small error. If we indicate the hypothesis with the letter y hat, we're saying that y hat is b plus x times w. The parameters b and w control our linear hypothesis. If we choose both b and w to be zero, the value of y hat is going to be zero for all possible values of x.
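As a minimal sketch, the hypothesis "y hat is b plus x times w" can be written as a small Python function. The function name predict and the house sizes below are hypothetical, chosen only for illustration; they are not taken from the course repository.

```python
def predict(x, b, w):
    """Linear hypothesis: y_hat = b + x * w."""
    return b + x * w


# Hypothetical house sizes (the feature x), for illustration only.
sizes = [500, 1000, 1500]

# With both parameters set to zero, the prediction is zero for every size.
all_zero = [predict(x, b=0, w=0) for x in sizes]
print(all_zero)
```

Whatever sizes are passed in, the b = 0, w = 0 case returns zero for each of them.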

This corresponds to a horizontal line passing through zero. In other words, we're saying that every house is free. Now, while this would be great, it's certainly not true, and not useful as a predictive model. So we need to let b and w vary. If we let b vary, the horizontal line starts to move up, indicating a constant price regardless of the value of x. This is like saying the price is fixed to the value of b, regardless of the size of the house. I don't know about you, but I would buy a gigantic house in that case, to stuff with all my things. This is also why we call the parameter b a bias or an offset: it's like a fixed value in our prediction, regardless of the value of the feature x. Finally, if we let w vary, the line starts to tilt, with w indicating the increment in price corresponding to an increment of one unit of size. For example, if the price is measured in thousands of dollars and the size in square feet, w equals one would imply that one additional square foot corresponds to a thousand-dollar increase in price.

To summarize, we have defined a linear hypothesis that connects our input feature to our output label, and that hypothesis depends on only two parameters: b, the bias, and w, the weight. The bias controls the fixed offset in our prediction, while the weight controls the proportional component of our prediction. Now we need to find a way to optimize the values of these parameters, so that the model best describes our data. Thank you for watching, and see you in the next video.
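As a supplement to the transcript, the roles of the two parameters can be sketched in Python. The sizes, the bias value of 150, and the choice to measure price in thousands of dollars are all assumptions made for this illustration, not values from the course.

```python
def predict(size, b, w):
    """Linear hypothesis: price_hat = b + size * w."""
    return b + size * w


# Hypothetical sizes in square feet; prices assumed to be in thousands of dollars.
sizes = [1000, 1001, 2000]

# Only the bias varies (w = 0): the line is horizontal, so every house
# gets the same predicted price b, whatever its size.
flat = [predict(s, b=150, w=0) for s in sizes]

# With w = 1 the line tilts: each extra square foot adds one unit of price.
tilted = [predict(s, b=150, w=1) for s in sizes]
increment = tilted[1] - tilted[0]

print(flat)
print(increment)
```

Here flat holds the same value for every size, while increment is the price change for one extra square foot, which equals w.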

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.