Getting Started with Deep Learning: Introduction To Machine Learning

# Classification

## Contents

- Introduction (1m 45s)
- Linear Regression (4m 46s)
- Difficulty: Beginner
- Duration: 2h 4m
- Students: 1926
- Rating: 4.1/5
### Description

Machine learning is a branch of artificial intelligence that deals with learning patterns and rules from training data. In this course from Cloud Academy, you will learn about its structure and history. Its origins date back to the middle of the last century, but it is only in the last decade that companies have taken full advantage of it in their products. This machine learning revolution has been enabled by three factors.

First, memory storage has become cheap and accessible. Second, computing power has also become readily available. Third, sensors, phones, and web applications have produced a lot of data, which has contributed to training these machine learning models. This course will guide you through the basic principles, foundations, and best practices of machine learning. You should be able to understand and explain these basics before diving into deep learning and neural nets. This course is made up of 10 lectures and two accompanying exercises with solutions. This Cloud Academy course is part of the wider Data and Machine Learning learning path.

### Learning Objectives

• Learn about the foundations and history of machine learning
• Understand the three factors that enabled the machine learning revolution: cheap memory storage, readily available computing power, and the data produced by sensors, phones, and web applications

### Intended Audience

It is recommended that you complete the Introduction to Data and Machine Learning course before taking this one.

### Resources

The datasets and code used throughout this course can be found in the GitHub repo here.

### Transcript

Here, the function f is called the sigmoid, and it is given by the formula f(z) = 1 / (1 + e^(−z)). I know this is a bit mathsy, but don't worry: all you need to know is that the graph of the sigmoid looks like the figure on the right. Now that we have defined the hypothesis, we need to define a cost. We cannot use the mean squared error as in the linear regression case, because in the classification case the mean squared error is not convex, which would make it hard to find the global minimum. A better cost in this case is the log loss, or cross-entropy cost, defined as follows. Let's start by defining the cost for a single point as the sum of two terms: cost_i = −[y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]. Since the label y_i can only be zero or one, only one of these two terms will be present for each data point. Another way to read this expression is to say that the cost equals the negative logarithm of one minus the predicted probability when y_i is zero, and the negative logarithm of the predicted probability when y_i, the label, is one. Let's look at each term individually, starting from the second. Remember that ŷ, the predicted probability, contains the sigmoid function, so its negative logarithm evaluates to the logarithm of 1 + e^(−x). If x is really big, this quantity goes to zero.
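The sigmoid and the per-point cross-entropy described above can be sketched in a few lines of Python. This is a minimal illustration written for this summary, not the course's own code; the function names are chosen for clarity:

```python
import math

def sigmoid(z):
    """Map any real number z onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss_point(y, y_hat):
    """Cross-entropy cost for a single point.

    y is the label (0 or 1); y_hat is the predicted probability,
    assumed strictly between 0 and 1. Only one of the two terms
    of the log loss survives, depending on the label.
    """
    return -math.log(y_hat) if y == 1 else -math.log(1.0 - y_hat)
```

As the transcript argues, a confident correct prediction (say y = 1 with ŷ close to 1) yields a cost near zero, while a confident wrong one makes the cost blow up.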

While if x is negative, this quantity goes to infinity in a linear way. In other words, when the label is one, we expect ŷ, the probability predicted by our model, to approach one, which happens for large values of x on the sigmoid curve. So if x is positive and large, we make our cost very small, while if x is negative we make the cost larger and larger. The same logic applies to the first term when the label is zero: the contribution of that term to the cost will be low when x is pushed towards negative values, which makes ŷ approach zero in this case. Now that we have defined a cost for a single point, we can define the total cost as the average of the costs for the individual points. This cost function goes by the name of average cross-entropy, or binary log loss. We have defined a hypothesis and a cost for our classification problem, so now we can go ahead and look for the best parameters that minimize this cost, in a similar way to what we did in the linear regression case. One final point: notice that our logistic regression model predicts a probability. If we want a binary prediction, we need to decide how to convert that probability into a binary outcome. One way to do this is to set a threshold: for example, we could say that all points predicted to be one with probability greater than 0.5 are set to one, and all others are set to zero.
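The two ideas in this passage, averaging the per-point losses and thresholding the predicted probability, can be sketched as follows. Again this is an illustrative snippet under the same assumptions as before (labels are 0/1, probabilities strictly between 0 and 1), not the course's own implementation:

```python
import math

def average_cross_entropy(ys, y_hats):
    """Binary log loss: the average per-point cross-entropy
    over the whole dataset."""
    total = 0.0
    for y, y_hat in zip(ys, y_hats):
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))
    return total / len(ys)

def to_label(y_hat, threshold=0.5):
    """Convert a predicted probability into a binary prediction."""
    return 1 if y_hat > threshold else 0
```

Note that the threshold of 0.5 is only the default choice mentioned in the lecture; in practice it can be tuned depending on how the two kinds of error are weighted.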

With this definition we can also calculate a score for our model. This is the accuracy score, and it's defined as the number of correct predictions over the total number of points. So, for example, in this table we have three correct predictions out of a total of five attempts, which corresponds to an accuracy score of 60%. Similarly to the regression case, we can compare the accuracy on the training set with the accuracy on the test set, and judge how well our classification model generalizes to unknown data. In conclusion, in this video we learned that classification problems can be handled in a similar way to regression problems by asking the model to predict the probability that a data point belongs to a certain class. We learned how to use the sigmoid function to map all the numbers predicted by a linear function onto the interval [0, 1] of probabilities, and we learned to define the log loss as the preferred cost for binary classification. Finally, we learned about accuracy, which is the score we will use to judge how good a classification model is. So, thank you for watching, and see you in the next video.
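The accuracy score is simple enough to state in code. The labels below are made up to reproduce the 3-out-of-5 example from the lecture's table (the actual table values are not given in the transcript):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 3 correct predictions out of 5 attempts.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 1]))  # prints 0.6
```

Computed on the training set and the test set separately, this is the comparison the lecture describes for judging generalization.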