
# Exercise 1: Solution


Learn about the importance of gradient descent and backpropagation, under the umbrella of Data and Machine Learning, from Cloud Academy.

From the internals of a neural net to solving problems with neural networks, this course covers the essentials needed to succeed in machine learning.

**Learning Objectives**

- Understand the importance of gradient descent and backpropagation
- Be able to build your own neural network by the end of the course

**Prerequisites**

- It is recommended to complete the Introduction to Data and Machine Learning course before starting.

Hey guys, welcome back. In this video, we're going to go through a solution to exercise one of section five. Exercise one asked us to predict the class of a wine based on a few attributes of the wine. Some things were standard, like scaling the features; we've done that a few times. For other things, we probably had to think a bit more about our choices: what model we're building, what loss function (cost function) we're gonna use, what optimizer, what value for the learning rate, and so on. One important thing was that we had to validate our training with a validation split of 20%. And the question is, can we converge to 100% validation accuracy? And how quickly can we converge to it? So let's see what a possible solution to the exercise looks like. First, we load the data. It's in the wines file, and you see there is a class.

It is the first column, and then we have a bunch of numerical features related to measurements done on the wine. So we assign the class to the variable y, and we check how many classes we have. We have three classes: class one, two, and three. Now, we don't know whether these classes have any meaningful order, so although they are numbers and we could think it's a regression problem, it's actually better to treat this as a multi-class classification with three distinct categories. And that's what we're gonna do. We create an auxiliary y categories target where we've dummified our classes into three binary columns, one each for class one, two, and three. Perfect. The other thing we're going to do is create our features: we have 178 points with 13 features. So let's see what they look like.
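The "dummified" target described above can be sketched with NumPy. This is only an illustration: the label array is a short synthetic stand-in for the class column (the contents of the wines file are not shown here), and the names `y`, `classes`, and `y_cat` are assumptions.

```python
import numpy as np

# Synthetic stand-in for the class column; the real labels come from the wines file.
y = np.array([1, 1, 2, 3, 2, 3, 1])

# classes holds the sorted distinct labels; idx is each label's position in classes.
classes, idx = np.unique(y, return_inverse=True)

# One binary column per class: each row contains a single 1.
y_cat = np.eye(len(classes), dtype=int)[idx]

print(classes)  # [1 2 3]
print(y_cat)
```

On a pandas DataFrame column, `pd.get_dummies` would produce the same kind of table with the class values as column names.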

I'm going to plot them using the pair plot from seaborn. As you can see in the pair plot, our features manage to separate the three classes quite well. You see that the three clouds of points are actually quite distinguishable in several of the feature pair combinations, like this one for example. So I'm pretty confident we'll be able to separate our data almost perfectly into the three classes. Let's see. First thing we're gonna do is rescale our features. So we'll take the standard scaler and rescale the X table into a new, scaled X table. Then we'll load from Keras the usual Sequential model and the Dense layer for the fully connected layers, plus a few optimization algorithms in case we want to try several of them, and then we build our model. The model is actually quite simple. It only has one inner layer with five nodes and an input shape of 13, and then the output layer with a softmax activation. I used the RMSprop algorithm for optimization. I'm curious to hear if you tried others. How did it go? And given it's a multi-class classification, the correct loss function is categorical_crossentropy.
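To make the pieces named above concrete, here is a minimal NumPy sketch of what standard scaling, a 13 → 5 → 3 network with a softmax output, and the categorical cross-entropy loss compute. It is not the Keras code from the video: the data are random stand-ins with the same shape as the wine features, the weights are untrained, and the ReLU activation on the inner layer is an assumption, since the transcript does not name it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 178 samples, 13 features, 3 classes.
X = rng.normal(loc=5.0, scale=2.0, size=(178, 13))
y_cat = np.eye(3)[rng.integers(0, 3, size=178)]

# Standard scaling: zero mean and unit variance for each feature column.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Untrained random weights for a 13 -> 5 -> 3 network.
W1, b1 = rng.normal(scale=0.1, size=(13, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.1, size=(5, 3)), np.zeros(3)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(inputs):
    hidden = np.maximum(0.0, inputs @ W1 + b1)  # ReLU is an assumption
    return softmax(hidden @ W2 + b2)            # one probability per class

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean over samples of -sum_k y_k * log(p_k).
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

probs = forward(X_scaled)
loss = categorical_crossentropy(y_cat, probs)
print(probs.shape, round(loss, 3))
```

Before any training the predicted probabilities are close to 1/3 for every class, so the loss sits near ln 3 ≈ 1.1. Training with an optimizer such as RMSprop pushes it down from there.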

Then, when we fit the model, it's important to choose a batch size. Given how small the data set is, I chose a pretty small batch size of eight. I run the training for 10 epochs, with verbose equal to one and validating on 20% of the data. So let's run it and see what happens. The model seems to find the solution very quickly. As you can see, my validation accuracy is 100% already at the first epoch. The fact that it fluctuates a bit is because, from one epoch to the next, Keras shuffles the training data, so the batches, and therefore the weight updates, are different each time. We could choose not to shuffle the data if we wanted; in that case, the batches would come in the same order in every epoch. The validation split itself stays the same through all the epochs, because Keras takes it from the last 20% of the rows before any shuffling. So I'm curious to hear what you tried, so please post it in the forum, and thank you for watching. See you in the next video.
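The numbers behind this training run can be worked out in a few lines. The sketch below assumes the behavior the Keras documentation describes for `validation_split`, namely that the validation rows are the last fraction of the samples, selected before shuffling. The variable names are illustrative and no model is trained here.

```python
import numpy as np

n_samples, validation_split, batch_size = 178, 0.2, 8

# Hold out the last 20% of the rows; this happens once, before any shuffling.
split_at = int(n_samples * (1 - validation_split))
train_idx = np.arange(split_at)
val_idx = np.arange(split_at, n_samples)

# Number of weight updates per epoch (the last batch may be smaller).
batches_per_epoch = int(np.ceil(len(train_idx) / batch_size))

rng = np.random.default_rng(0)
for epoch in range(2):
    # Only the training rows are reshuffled at each epoch.
    order = rng.permutation(train_idx)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    print(epoch, len(batches), batches[0])

print(len(train_idx), len(val_idx), batches_per_epoch)  # 142 36 18
```

With 142 training rows and a batch size of eight, each epoch performs 18 weight updates, so 10 epochs already amount to 180 updates on a small, well-separated data set.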

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator and I was Chief Data Officer and co-founder at Spire, a Y-Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from Singularity University summer program of 2011.