Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have increasingly taken advantage of it in their products. This machine learning revolution has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

**Learning Objectives**

- Learn about the foundations and history of machine learning
- Learn and understand the principles of memory storage, computing power, and phone/web applications

**Intended Audience**

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.

### Resources

The datasets and code used throughout this course can be found in the GitHub repo here.

Hello and welcome to this video on overfitting. We will start this video with a summary of supervised learning, then we will define what overfitting is, and we will list a few common mistakes that lead to overfitting. Let us recap what we have learned so far in supervised learning. We have seen two supervised learning techniques: linear regression and logistic regression. Linear regression deals with problems where the target variable takes continuous values on the real axis, while logistic regression deals with problems where the target variable is binary and can only be zero or one. Linear regression is defined by the linear hypothesis that connects the features x to the outcome y. Logistic regression connects the features to the probability of the outcome through a non-linear function called the sigmoid.
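As a minimal sketch of the two hypotheses recapped above, the snippet below uses plain Python. The function names and the example values of `x`, `w`, and `b` are assumptions made for illustration and do not come from the course repository.

```python
import math


def dot(x, w):
    """Sum of products of the components of x and w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def sigmoid(z):
    """Map any real number into the interval (0, 1).

    The two branches avoid overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_hypothesis(x, w, b):
    """Linear regression: predicted outcome is b + x . w."""
    return b + dot(x, w)


def logistic_hypothesis(x, w, b):
    """Logistic regression: predicted probability is sigmoid(b + x . w)."""
    return sigmoid(b + dot(x, w))


# Hypothetical record with two features, chosen only for illustration.
x = [2.0, -1.0]
w = [0.5, 1.5]
b = 0.5

y_linear = linear_hypothesis(x, w, b)      # 0.5 + (1.0 - 1.5) = 0.0
p_logistic = logistic_hypothesis(x, w, b)  # sigmoid(0.0) = 0.5
```

The linear output can be any real number, while the logistic output always falls between zero and one, which is why it can be read as a probability.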

The net effect of the sigmoid is to map all the possible values of b plus X dot w into the interval zero to one. Finally, we have defined a cost for each of them, using the mean squared error for linear regression and the cross entropy, or log loss, for logistic regression. Notice that here we have introduced a vector notation, where X dot w is equal to the sum of products of the components of x and w, to extend the model to datasets with multiple features. All this means is that now w is a vector of size M, while X is a matrix of size N by M, where N is the number of records in our dataset and M is the number of features. For the case of only one feature, the sum only has one term and we're back to the case we previously discussed. We have also learned to split our data in two parts, a training set and a test set.

Now, let's talk about one thing to watch out for: overfitting. Overfitting happens when our model learns the probability distribution of the training set too well, and it's not able to generalize to the test set with the same performance. Think of this as learning things by heart without really understanding them. In a new situation, you would probably be lost and probably underperform. A very simple way to check for overfitting is to compare the cost and the performance metrics on the training and the test set. If our training score is much better than the test score, that is a strong sign that we are overfitting. For example, let's say we're performing a classification and we measure the accuracy to be 99%. If this is the score on the training set, and on the test set we only obtain 85%, the performance of our model is worse on the test set than it is on the training set, and therefore, we are overfitting. How can we avoid overfitting? There are several actions we can take.
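The two costs and the train-versus-test comparison described above can be sketched in plain Python as follows. The helper names, the clipping constant `eps`, and the 0.5 decision threshold are assumptions made for illustration. The sketch does not fix a cutoff for how large a gap counts as overfitting, because that depends on the problem.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Cost for linear regression: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Cost for logistic regression: average cross entropy.

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of records whose thresholded prediction matches the label."""
    hits = sum(
        1 for t, p in zip(y_true, p_pred) if (p >= threshold) == (t == 1)
    )
    return hits / len(y_true)


def score_gap(train_score, test_score):
    """Positive when the model scores better on training data than on test data."""
    return train_score - test_score


# The accuracies from the example in the text: 99% on training, 85% on test.
gap = score_gap(0.99, 0.85)  # roughly 0.14, a large gap for an accuracy score
```

In practice, `accuracy` or `log_loss` would be computed once on the training set and once on the test set, and the two numbers compared as in the last line.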

The first simple check is to make sure that our train/test split is performed correctly and that both the training and test sets are representative of the whole population of features and labels. Second, make sure that you're sampling the training and test sets randomly, so that the order of the data doesn't affect your sampling. Third, make sure not to choose too small a test set, which could simply not contain enough data to be a good test. And finally, don't choose too small a training set, which could cause your model not to learn rules that are general enough.

If the train/test split seems correct, it could be the case that our model simply has too much freedom and therefore learns the training set by heart. This is usually the case if the number of parameters in the model is much greater than the number of data points, and it's definitely a very common problem in neural networks. In order to mitigate this, we can either reduce the complexity of the model or use a technique called regularization, which we will learn later in this course.

So, in this video, we've reviewed the main ingredients of supervised learning, and these are data, labels, hypothesis, and cost. We have also discovered that overfitting corresponds to the inability of our model to generalize to previously unseen data. Finally, we have described a few common mistakes to avoid. Thank you for watching and see you in the next video.
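As an illustration of the random sampling check mentioned in the video, here is a minimal sketch of a shuffled train/test split using only the Python standard library. The function name `shuffled_split`, its parameters, and the toy records are hypothetical and are not the course's own code.

```python
import random


def shuffled_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records, then cut it into train and test parts.

    Shuffling first means the original order of the data cannot
    influence which records end up in which set.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (feature, label) records, stored in a deliberately ordered way.
records = [(i, i % 2) for i in range(100)]
train, test = shuffled_split(records, test_fraction=0.2, seed=42)
```

With 100 records and a test fraction of 0.2, this gives 80 training records and 20 test records with no overlap. Libraries such as scikit-learn provide a ready-made `train_test_split` in `sklearn.model_selection`, which may be preferable in real projects.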

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.