Exercise 1: Solution
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn about its structure and history. Its origins date back to the middle of the last century, but only in the last decade have companies taken full advantage of it in their products. This machine learning revolution has been enabled by three factors.

First, memory storage has become cheap and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced large amounts of data, which is used to train these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to understand and be able to explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hey guys, welcome back. Here we are to look at the solutions to the exercises, so let's start with the first one. Okay, we begin by running the cells that load the standard packages. In this exercise we were asked to load a dataset of housing data; plot the histogram of each feature; create two variables called X and y, where X is a matrix with three columns (square feet, bedrooms, and age) and y is a vector with one column for the price; create a linear regression model in Keras with the appropriate number of inputs and outputs; split the data with a 20% test size; train the model on the training set and check its performance on the training and test sets; and finally answer the question: how's the model doing? Then there were some experiments to try: normalizing the features, trying different values for the learning rate, using a different optimizer, and checking the R2 score. Alright, let's get started. The first thing we're gonna do is load the data. We use the pd.read_csv function; we just need to pass the file path, and when we execute the cell we can look at the head of the data frame, which shows us the first five rows. Alright, moving on to the histograms. There are various ways of plotting them, and the one I decided to use is the following. I create a for loop over the column names. df.columns here is just the list of columns: square feet, bedrooms, age, and price. The loop iterates over the columns and also counts them, because enumerate counts in order starting from zero, so for i, feature, where feature is one of these columns, I generate a subplot in a grid with one row and four columns. Notice that I'm using the count plus one, because subplot numbering starts from one, not zero.
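
A minimal sketch of the histogram loop described above, assuming pandas and matplotlib are installed. The column names sqft, bdrms, age, and price and the five rows of values are made up for illustration; with the course data you would load the CSV from the GitHub repo with pd.read_csv instead. The Agg backend is set only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the course CSV; column names are assumptions.
# With the real file: df = pd.read_csv("path/to/the/housing/csv")
df = pd.DataFrame({
    "sqft": [2104, 1600, 2400, 1416, 3000],
    "bdrms": [3, 3, 3, 2, 4],
    "age": [70, 28, 44, 49, 75],
    "price": [399900, 329900, 369000, 232000, 539900],
})
print(df.head())  # first five rows

plt.figure(figsize=(15, 5))
for i, feature in enumerate(df.columns):
    # subplot positions are 1-based, so shift the 0-based counter by one
    plt.subplot(1, 4, i + 1)
    # a Series plot draws on the current axes (the subplot just created)
    df[feature].plot(kind="hist", title=feature)
    plt.xlabel(feature)
```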

Then I call df[feature].plot with kind='hist' and the title set to the feature name, and I use the feature name for the x label too. This is what the four plots look like: square feet, bedrooms, age, and price. You could have done it in other ways, but hopefully you got to a similar plot. So why is it important to draw a histogram? Because we can see which values are common, and whether any value is more prevalent than the others. We also get a visual check of the range of each feature, which hints at whether it needs rescaling. For example, here we see that square feet, as expected, is in the thousands, bedrooms are in units, age is in the tens, and price is really big. So we'd probably do better to rescale the features at some point. Let's get to the X and y variables. We extract the three columns we care about and take their values, which is the NumPy array behind this pandas DataFrame. We do that for the three feature columns, and we also do it for the price column. So let's see what we got.
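
The range check and the X/y extraction might look like this, again on made-up rows with assumed column names. .values is what the video uses; newer pandas code often prefers the equivalent .to_numpy().

```python
import pandas as pd

# Made-up rows with the same assumed column names as before
df = pd.DataFrame({
    "sqft": [2104, 1600, 2400, 1416, 3000],
    "bdrms": [3, 3, 3, 2, 4],
    "age": [70, 28, 44, 49, 75],
    "price": [399900, 329900, 369000, 232000, 539900],
})

# Min and max per column: wildly different scales hint that rescaling will help
ranges = df.agg(["min", "max"])
print(ranges)

# .values returns the NumPy array behind the selected columns
X = df[["sqft", "bdrms", "age"]].values  # matrix, shape (n_rows, 3)
y = df["price"].values                    # vector, shape (n_rows,)
print(X.shape, y.shape)
```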

Inserting a couple of cells here, we see that X is a NumPy array with three columns and however many rows we have, and y is a one-dimensional vector. Okay, so we've got our inputs and our outputs. Now let's create the Keras model: we import Keras and the usual things, and we build a regression model. The only thing that is different from what we've done before is the input shape. We need to set the input shape to accept three values instead of one, because now we have three input features: one, two, three. So we add a Dense layer with one output, because we are predicting price, which is a single variable, but with three inputs. Then we compile the model with an optimizer and a mean squared error loss. Next, we split the data into train and test sets with a 20% test size. We've seen this in class, so I'll go pretty fast: we import train_test_split from scikit-learn's model_selection module and apply it to both X and y, and what we get back is X_train, X_test, y_train, and y_test. To double-check, the length of X_train is 37 (it's a small dataset) and the length of X is 47, so the training set is about 80% of the data. Great, next step: train the model. The training part is pretty easy; it's just calling the fit function on X_train and y_train. We train for 10 epochs, that is, 10 passes over the training data, with Keras's default batch size of 32 (Keras itself defaults to a single epoch, so the number of epochs has to be passed explicitly). Now, one thing I want to show you that I think is pretty important. Look at the loss, and think for a second about why it's such a big number. First of all, we notice that the loss is going down, which is good; it's what we want. We want the loss to go down over time. But why is it so huge?
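
To make the shapes concrete, here is a sketch of the split and of what the model computes, using random stand-in arrays with the course's dimensions (47 rows, 3 features) rather than the real data. As described in the video, the Keras model is a single Dense layer with one output and an input shape of three, compiled with a mean squared error loss; the NumPy lines below only mirror its forward pass and loss and are not Keras code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data with the course's dimensions: 47 houses, 3 features
rng = np.random.default_rng(0)
X = rng.uniform(size=(47, 3))
y = rng.uniform(size=47)

# 20% test size: scikit-learn rounds the test count up, ceil(0.2 * 47) = 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 37 10

# A Dense(1) layer on 3 inputs computes y_hat = X @ w + b:
# 3 weights + 1 bias = 4 trainable parameters
w = rng.normal(size=3)
b = 0.0
y_train_hat = X_train @ w + b

# The mean squared error loss the model is compiled with
mse = np.mean((y_train - y_train_hat) ** 2)
print(y_train_hat.shape, mse)
```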

And the answer is that the loss is calculated from the differences between our predicted values and our actual values, and remember, we are predicting the price. If we look at the price column, the minimum value is around 170,000 and the maximum is just under 700,000. So even a relatively small error in the price prediction is a large number in dollars, and squaring those errors and averaging them gives a huge number. That's why rescaling the price variable is probably the first thing we should do. So now we're going to normalize all the features, and we'll do it by dividing by convenient round numbers so that the ranges stay easy to interpret. For square feet we create a new feature, sqft1000, where we just divide by a thousand. Square feet range between 852 and 4,478, so after dividing by a thousand the new feature ranges from about 0.85 to 4.5, which is in the single-digit range. Age we divide by 10, so it goes from 0.5 to 7.9, and price we divide by 100K, so it goes from about 1.7 to 7. By the way, we don't rescale the bedrooms, because they are already in a reasonable single-digit range. Our new features are these three columns, so we take their values and assign them to X, do the same for the rescaled price in y, and again do the train_test_split with a test size of 20%. We build a new model and train it, and notice that now the loss is in a reasonable range, zero point something, which is what you would want: if your features are well normalized, you would expect the loss to be below one. Good, so we've trained our model for 20 epochs, and the loss stays more or less constant towards the end, so we can check the R2 score.
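
The sketch below illustrates both points on made-up rows: dividing the price by 100k divides every error by 10^5 and every squared error by 10^10, so the mean squared error drops by the same factor. sqft1000 is the column name from the video; age10 and price100k are hypothetical names, and the 10% prediction error is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Made-up rows using the same assumed column names as the course CSV
df = pd.DataFrame({
    "sqft": [852, 1600, 2400, 4478],
    "bdrms": [2, 3, 3, 5],
    "age": [5, 28, 44, 79],
    "price": [169_900, 329_900, 369_000, 699_900],
})

# Divide by round constants so every feature lands in the single-digit range.
# sqft1000 is the video's name; age10 and price100k are hypothetical names.
df["sqft1000"] = df["sqft"] / 1000.0
df["age10"] = df["age"] / 10.0
df["price100k"] = df["price"] / 1e5
# bdrms is already small, so it is left as-is

X = df[["sqft1000", "bdrms", "age10"]].values
y = df["price100k"].values

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

# Assume every prediction overshoots the true price by 10%
raw_pred = df["price"].values * 1.1
loss_dollars = mse(df["price"].values, raw_pred)
loss_scaled = mse(y, raw_pred / 1e5)

# Same relative error, but the loss shrinks by a factor of 1e10
print(f"loss in dollars^2: {loss_dollars:.3e}")
print(f"loss in (100k dollars)^2: {loss_scaled:.4f}")
```
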
Let's see: we generate the predictions for the training set and store them in y_train_pred, we generate the predictions for the test set, and then we calculate the R2 score for the training set and the R2 score for the test set. So what do we get? The training set score is 0.54, which is not great, but it's better than nothing.
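
For reference, R2 compares the model's squared errors with those of a baseline that always predicts the mean: R2 = 1 - SS_res / SS_tot, so 1 is a perfect fit and 0 is no better than predicting the mean. The sketch below computes it by hand and checks it against scikit-learn's r2_score (assumed here to be the function used in the video); the four targets and predictions are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions, in units of 100k dollars
y_test = np.array([2.0, 3.5, 4.0, 6.0])
y_test_pred = np.array([2.5, 3.0, 4.5, 5.0])

print(r2_manual(y_test, y_test_pred), r2_score(y_test, y_test_pred))
```

Computing the score separately on the training and test predictions, as in the video, is what lets you compare how well the model fits the data it has seen against data it has not.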

And the R2 score for the test set is 0.64, which is even better than what we got on the training set. So overall, this model is not doing that badly. We can try running another 20 epochs by just calling the fit function again in a new cell. It keeps training from where it left off; it does not reset the model. We also set verbose to zero to hide the training output. Okay, after running it for 20 more epochs, let's see whether it improved. We generate the predictions again. Yes, it improved a bit on the training set, but it stayed roughly constant on the test set, which indicates that our model is starting to learn the training set really well and beginning to overfit, without really improving its ability to generalize. Keep in mind that the dataset in this case is really small, so we're doing this mostly for practice: with 47 data points it's not really worth using machine learning to solve the problem. Good, I hope you enjoyed working through this little problem, I hope you've learned something interesting, and see you in the next video for exercise two.
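
Calling fit a second time in Keras (for example model.fit(X_train, y_train, epochs=20, verbose=0)) resumes from the current weights. The sketch below shows the same behaviour with a hypothetical NumPy stand-in, LinearRegressionGD, that trains a linear model with plain gradient descent on the mean squared error; it is not Keras, and the synthetic data and learning rate are assumptions chosen so the example converges.

```python
import numpy as np

class LinearRegressionGD:
    """Hypothetical NumPy stand-in for a Dense(1) model trained on MSE.

    Like Keras, calling fit() again resumes from the current weights
    instead of re-initializing them.
    """

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, X):
        return X @ self.w + self.b

    def loss(self, X, y):
        return np.mean((y - self.predict(X)) ** 2)

    def fit(self, X, y, epochs=20):
        n = len(y)
        for _ in range(epochs):
            err = self.predict(X) - y
            # gradients of the mean squared error with respect to w and b
            self.w -= self.lr * (2.0 / n) * (X.T @ err)
            self.b -= self.lr * (2.0 / n) * err.sum()
        return self.loss(X, y)

# Synthetic, already-rescaled data: 47 rows, 3 features
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(47, 3))
y = X @ np.array([1.0, 0.5, -0.3]) + 1.0 + rng.normal(scale=0.3, size=47)

model = LinearRegressionGD(n_features=3, lr=0.02)
loss_after_20 = model.fit(X, y, epochs=20)
loss_after_40 = model.fit(X, y, epochs=20)  # 20 more epochs from the same weights
print(loss_after_20, loss_after_40)
```

A freshly created model would start again from zero weights, which is why the second call above keeps lowering the training loss rather than repeating the first run.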


About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.