Exercise 2
2h 4m

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but in the last decade, companies have taken advantage of the resource for their products. This revolution of machine learning has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

Learning Objectives

  • Learn about the foundations and history of machine learning
  • Learn and understand the principles of memory storage, computing power, and phone/web applications

Intended Audience

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.


The datasets and code used throughout this course can be found in the GitHub repo here.



Hey guys, welcome back. This video is about exercise two in section three. In this video, we will step up from the house pricing model and perform a classification on a slightly more complex data set, where we have information about employees in a firm. This data set contains labels indicating whether or not the employee has left the company, together with a lot of information about the employee, including the last evaluation, the number of projects, the average monthly hours, and so on. These are all the features you have. So your goal is to predict whether the employee left or not using the rest of the data. Here, too, you're guided through nine steps. First, you will have to load and inspect the data set. The next step is to establish a benchmark and see what the accuracy would be if you predicted that everyone stayed at the company. Then, check if the features need rescaling, maybe plotting a histogram to decide which rescaling method is more appropriate.
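As a rough illustration of the benchmark and rescaling steps, here is a minimal sketch. It does not use the course's actual data: the small DataFrame, its column names, and its values are all made up for the example, and min-max scaling is just one of the rescaling methods you might pick after looking at the histogram.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's employee data set.
# Column names and values are assumptions, not the real file.
df = pd.DataFrame({
    "last_evaluation": [0.53, 0.86, 0.88, 0.87, 0.52, 0.50, 0.77, 0.92],
    "number_project": [2, 5, 7, 5, 2, 2, 4, 5],
    "average_monthly_hours": [157, 262, 272, 223, 159, 153, 180, 259],
    "left": [1, 0, 1, 0, 0, 0, 0, 1],
})

# Benchmark: predict that everyone stayed (left == 0).
# The accuracy of that guess is simply the fraction of employees who stayed.
benchmark_accuracy = (df["left"] == 0).mean()
print("benchmark accuracy:", benchmark_accuracy)

# Inspect the range of a feature to decide whether it needs rescaling.
# With matplotlib installed, df["average_monthly_hours"].plot(kind="hist")
# would show the distribution as a histogram.
print(df["average_monthly_hours"].describe())

# Min-max rescaling: map the feature onto the interval [0, 1].
hours = df["average_monthly_hours"]
df["hours_scaled"] = (hours - hours.min()) / (hours.max() - hours.min())
```

Any model you train later has to beat this benchmark accuracy to be worth anything.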

Then, you have to convert the categorical features into binary dummy columns. This applies only to the few columns that are categorical. You'll have to use pd.get_dummies or, equivalently, the np_utils.to_categorical function from Keras to convert the categorical features into dummy columns. Then, concatenate these dummy columns with the numerical columns. Do a train/test split, and check how well your model is doing. You can always try to optimize it by changing the learning rate and the optimizer. Check the confusion matrix, precision, and recall, and see if you still get the same results using a five-fold cross-validation.
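The dummy-column and concatenation steps might look roughly like the sketch below. Again, the DataFrame is a made-up stand-in, and the column names ("salary", "department", "number_project", "left") are assumptions chosen for illustration rather than the names in the course's file.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({
    "number_project": [2, 5, 7, 5],
    "salary": ["low", "medium", "high", "low"],
    "department": ["sales", "support", "sales", "technical"],
    "left": [1, 0, 1, 0],
})

categorical_cols = ["salary", "department"]
numerical_cols = ["number_project"]

# One binary column per category value, e.g. salary_low, salary_medium, ...
dummies = pd.get_dummies(df[categorical_cols])

# Put the numerical columns and the dummy columns side by side.
X = pd.concat([df[numerical_cols], dummies], axis=1)
y = df["left"]

print(X.columns.tolist())
```

The resulting X and y are what you would then pass to the train/test split.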

The last question is: is the model good enough for your boss? That means comparing the results you get with the results from your initial benchmark. We also already tell you that a logistic regression is not good enough in this case, but it's nice to go through all the steps and see where a model fails, so that in the future we'll be able to build a better model. So as usual, try to do this exercise first and then feel free to watch the next video with the solution. Good luck!
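To make the comparison with the benchmark concrete, here is a small sketch that computes accuracy, precision, and recall from the confusion matrix counts by hand. The labels and the "model" predictions are invented for the example, and the helper names confusion_counts and summarize are hypothetical, not part of any library.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means 'left'."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tn, fp, fn, tp

def summarize(y_true, y_pred):
    """Accuracy, precision, and recall; a ratio with a zero denominator is reported as 0.0."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 0, 0, 0, 0, 1]
benchmark_pred = [0] * len(y_true)          # "everyone stayed"
model_pred = [1, 0, 0, 0, 0, 1, 0, 1]       # hypothetical model output

benchmark = summarize(y_true, benchmark_pred)
model = summarize(y_true, model_pred)
print("benchmark:", benchmark)
print("model:    ", model)
```

Note that the benchmark can reach a decent accuracy while its recall is zero, because it never flags anyone as leaving. That is why accuracy alone is not enough to answer the question for your boss.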


The datasets used in exercise 2 of this course can be found at the following link:

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.