Getting Started with Deep Learning: Introduction To Machine Learning

# Cross Validation


## Machine Learning

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn all about its structure and history. Its origins date back to the middle of the last century, but only in the last decade have companies begun taking advantage of it in their products. This machine learning revolution has been enabled by three factors.

First, memory storage has become economical and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. It is advisable to understand and be able to explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

**Learning Objectives**

- Learn about the foundations and history of machine learning
- Understand the three enablers of modern machine learning: affordable memory storage, readily available computing power, and abundant data from sensors, phones, and web applications

**Intended Audience**

It is recommended to complete the Introduction to Data and Machine Learning course before taking this course.

### Resources

The dataset used in exercise 2 of this course can be found at the following link: https://www.kaggle.com/liujiaqi/hr-comma-sepcsv/version/1

Hello and welcome back. In this video we will perform cross-validation on our model. First of all, we need to import a wrapper that will allow us to use Keras models in scikit-learn. We need to do this because the cross-validation function actually lives in scikit-learn. So let's import the KerasClassifier wrapper. Then let's define a helper function called build_logistic_regression_model, which just builds the model like we did before: it defines the Sequential model, adds the Dense layer with one input, one output, and the sigmoid activation function, and then compiles the model using a certain loss, binary_crossentropy, while also requesting the accuracy metric. And this is the optimizer; we'll see later what it is. Then we return the compiled model from our function. We've defined this function because the KerasClassifier wrapper needs a build function to work, so we need this helper function for KerasClassifier. Notice that in KerasClassifier we also set the number of epochs and whether or not it should be verbose. So now this model, which is the KerasClassifier wrapper, will behave not just like a Keras model but like a scikit-learn model, which means it exposes all the methods that the cross_val_score function from scikit-learn requires. So, let's do cross-validation: we import cross_val_score, and we also import the KFold cross-validation class, which by now you are familiar with.

And we define a three-fold cross-validation. Let's check the signature: this is the number of splits, so the number of folds is three, made explicit here, meaning we are splitting our data into three equally sized subsets; and we are setting shuffle equal to True, which means we randomly shuffle the data before we do our splits. Okay, so that's our cross-validation iterator. Then we pass the model, which is our KerasClassifier, the X and the y (the features and the labels), and the cross-validation iterator to the cross_val_score function. The cross_val_score function needs an estimator, which we've created here, plus the features and labels; it has a bunch of other parameters that we're not setting, but most importantly it has a cv argument. So we execute this line; it will take a little bit of time. Okay, great. We have obtained three scores for our three-fold cross-validation. Notice how similar they are, which means the shuffling did its job: we didn't end up with really biased samples. Now we can print the mean and the standard deviation of the scores: the cross-validation accuracy is 82 percent, plus or minus 0.25 percent. This is actually good accuracy; the dataset is small, but still. So, we're pretty satisfied with this result, and see you in the next video.
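The cross-validation step can be sketched like this. To keep the sketch self-contained and quick to run, a plain scikit-learn LogisticRegression and a small synthetic dataset stand in for the wrapped Keras model and the exercise data; any scikit-learn-compatible estimator, including the KerasClassifier wrapper, is used the same way.

```python
# Three-fold shuffled cross-validation, as described in the video.
# The estimator and data here are stand-ins (assumptions), not the
# course's wrapped Keras model or HR dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 1))       # synthetic stand-in features
y = (X[:, 0] > 0).astype(int)      # synthetic stand-in labels

# n_splits=3: three equally sized folds; shuffle=True: randomly
# shuffle the data before splitting.
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# cross_val_score takes the estimator, the features, the labels,
# and the cross-validation iterator via the cv argument.
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)

print("Cross-validation accuracy: {:.4f} +/- {:.4f}".format(
    scores.mean(), scores.std()))
```

`scores` holds one accuracy per fold; similar values across folds suggest the shuffling avoided biased splits, and the mean with its standard deviation summarizes the model's performance.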

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning, and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.