Exercise 3: Solution
1h 13m

Continue the journey to data and machine learning with this course from Cloud Academy.

In previous courses, the core principles and foundations of Data and Machine Learning have been covered and best practices explained. 

This course gives an informative introduction to deep learning and neural networks.

This course is made up of 12 expertly instructed lectures along with 4 exercises and their respective solutions.

Please note: the Pima Indians Diabetes dataset can be found at this GitHub repository or at the Kaggle page mentioned throughout the course.

Learning Objectives

  • Understand the core principles of deep learning
  • Be able to execute all factors of the framework of neural nets

Intended Audience





Hello, and welcome back. In this video, we're gonna do exercise number three, which is a benchmarking exercise. We're given a notebook from Kaggle to look at, where someone else has tackled the same dataset and done some analysis with pretty much standard machine learning techniques and models in scikit-learn. I encourage you to look through this notebook and see what it's doing. There's a part about feature selection that you may find interesting, then standardization, and then he tests a bunch of different models. What we're gonna do is train three classifiers: the random forest classifier, the support vector machine, and the Gaussian Naive Bayes. We're gonna train each of these on just the positive class, predict on the test set, compare the accuracy score of each of the three, and print the confusion matrix. So the random forest gives us 66% accuracy, so not much better than the benchmark.
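To make the two metrics mentioned here concrete, the following is a minimal sketch in plain Python of an accuracy score, a binary confusion matrix, and a trivial benchmark. The transcript does not say how the benchmark was computed; one common choice, assumed here, is to always predict the most frequent class. The labels are hypothetical toy values, not the course dataset, and the helper names `accuracy` and `confusion_counts` are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Binary confusion matrix as ((tn, fp), (fn, tp)).

    Rows are the true class (0, then 1); columns are the predicted class.
    """
    counts = Counter(zip(y_true, y_pred))
    return ((counts[(0, 0)], counts[(0, 1)]),
            (counts[(1, 0)], counts[(1, 1)]))


# Hypothetical toy labels (assumption: 0 = negative, 1 = positive).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

# Assumed benchmark: always predict the most frequent true class.
majority = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority] * len(y_true)

model_acc = accuracy(y_true, y_pred)
baseline_acc = accuracy(y_true, baseline_pred)
matrix = confusion_counts(y_true, y_pred)

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
print("confusion matrix (rows = true, columns = predicted):", matrix)
```

A model is only worth keeping if its accuracy clearly beats that kind of baseline, which is why a score close to the benchmark is not very impressive.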

The support vector machine seems to be doing a little better, with a 72% accuracy score, and the Gaussian Naive Bayes is at 70%. So, seeing how fast the training was, if you have a small or medium-sized dataset, it's always good to compare your neural network with the results of standard classification techniques, just to have a benchmark and see: am I overcomplicating things by using neural networks? Sometimes other techniques work just as well. This is exercise three, the benchmarking exercise. If you've never seen Kaggle before, I strongly encourage you to visit the Kaggle website. It's a website where you can compete for money in solving data science problems, and it's a great way to both learn and get better at doing data science. Right now there are learning competitions, but there are also competitions for money. For example, if you compete in this Intel competition, the prize is $100,000. So yeah, get involved in Kaggle, get involved in the community. It's a great way of learning machine learning and actually getting better at what you do. Cool, so thank you for watching exercise three, and see you in the next video.
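The benchmarking loop described in this exercise could look roughly like the sketch below, using scikit-learn. This is not the notebook from the video: the data is a synthetic stand-in generated with `make_classification` (768 rows and 8 features, the same shape as the Pima Indians Diabetes dataset), the 80/20 split and the standardization step are assumptions, and all models use default hyperparameters. With the real data the scores will differ from whatever this prints.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data; replace X and y with the real
# feature matrix and 0/1 labels from the course dataset.
X, y = make_classification(n_samples=768, n_features=8,
                           n_informative=5, random_state=0)

# Assumption: hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Standardize using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

models = [RandomForestClassifier(random_state=0), SVC(), GaussianNB()]

results = {}
for model in models:
    model.fit(X_train, y_train)        # train on the training set
    y_pred = model.predict(X_test)     # predict on the held-out test set
    results[type(model).__name__] = (
        accuracy_score(y_test, y_pred),
        confusion_matrix(y_test, y_pred),
    )

for name, (acc, cm) in results.items():
    print(f"{name}: accuracy = {acc:.3f}")
    print(cm)  # rows = true class, columns = predicted class
```

All three models train in well under a second on a dataset this size, so running them next to a neural network is a cheap sanity check.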

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.