# Supervised Learning

In this course, you'll learn about Machine Learning and where it fits within the wider field of Artificial Intelligence (AI). The course begins with a formal definition of Machine Learning and then explains the various machine learning and training techniques. We review both Supervised and Unsupervised learning, highlighting the main differences between these two learning methods. We also review Classification and Regression models, highlighting the main differences between these two types of model.

We provide a basic review of several of the most popular and commonly used machine learning algorithms including:

- Linear Regression
- Logistic Regression
- K Nearest Neighbour (KNN)
- K-Means
- Decision Tree
- Random Forest
- Support Vector Machines (SVM)
- Naïve Bayes

Finally, we’ll provide a basic-level introduction to Deep Learning and Deep Neural Networks, as a more specialised form of Machine Learning.

**Intended Audience**

The intended audience for this course includes:

- Beginners starting out in the field of Machine Learning
- Anyone interested in understanding how Machine Learning works

**Learning Objectives**

By completing this course, you will:

- Understand what Machine Learning is and what it offers
- Understand the benefits of using Machine Learning
- Understand business use cases and scenarios that can benefit from using Machine Learning
- Understand the different Machine Learning training techniques
- Understand the difference between Supervised and Unsupervised training
- Understand the difference between Classification and Regression
- Become familiar with several of the commonly used and popular Machine Learning algorithms discussed
- Understand the basic principles behind Deep Learning and Deep Neural Networks

**Pre-requisites**

The following prerequisites will be helpful for this course:

- A background in statistics or probability
- Familiarity and understanding of computer algorithms
- Basic understanding of data analytics

**Course Agenda**

The agenda for the remainder of this course is as follows:

- We’ll discuss what Machine Learning is and when and why you might consider using it
- We’ll discuss benefits and business use cases that have been empowered by leveraging Machine Learning
- We’ll break down machine learning into supervised and unsupervised training models
- We’ll discuss the differences between classification and regression techniques
- We’ll examine a set of commonly used and popular machine learning algorithms
- Finally, we’ll take an introductory look at deep learning and the concept of deep neural networks.

**Feedback**

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Welcome back. In this lecture, we'll start diving into Supervised Learning and how you use it to train machine learning models. Let's start with the following question: I have a dataset that contains the right answers. I want to learn a pattern from it and use that pattern on new data points for which I don't have answers. How can I use machine learning to predict the answers I seek?

Supervised Learning involves using an algorithm to analyze and learn from past observations, enabling you to predict future events. The goal of Supervised Learning is to come up with, or infer, an approximate mapping function that can be applied to one or more input variables to produce an output variable, or result. The training process involves taking a supervised training dataset with known features and a label.
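
As a rough illustration of inferring a mapping function, here is a minimal sketch that fits a straight line f(x) ≈ w·x + b to a handful of labeled examples using ordinary least squares. The single input variable, the toy data, and the names `fit_line` and `predict` are assumptions made for illustration; the course does not prescribe this particular method.

```python
def fit_line(xs, ys):
    """Fit y ≈ w*x + b by ordinary least squares; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training examples: each input x is paired with a known label y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

w, b = fit_line(xs, ys)


def predict(x):
    """The inferred mapping function, applied to a new input."""
    return w * x + b


print(w, b, predict(5.0))  # 2.0 1.0 11.0
```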

The training dataset consists of many training examples, or instances, where each instance consists of an input vector of features and a single output value, or label. A supervised training dataset is used to train a machine learning model so that future predictions can be made on new inputs. The training dataset is typically split into two parts: the first part is used to train the model, and the second part is used to test it. An example approach is to use an 80/20 split; that is, to set aside 80% of the dataset for training the model and the remaining 20% for testing its accuracy, by comparing the predicted results against the actual recorded, or labeled, results.
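
The 80/20 split described above can be sketched in a few lines of standard-library Python. This is a minimal sketch: `split_dataset`, the fixed seed, and the toy dataset are hypothetical. Libraries such as scikit-learn provide an equivalent helper, `sklearn.model_selection.train_test_split`.

```python
import random


def split_dataset(instances, train_fraction=0.8, seed=0):
    """Shuffle a copy of the labeled instances and split it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset: each instance is (feature_vector, label).
dataset = [([float(i), float(i % 7)], i % 2) for i in range(100)]

train, test = split_dataset(dataset)            # 80/20 split
train75, test25 = split_dataset(dataset, 0.75)  # 75/25 split
print(len(train), len(test), len(train75), len(test25))  # 80 20 75 25
```

Shuffling before cutting matters: if the records were stored in some order (for example, by date or by label), taking the first 80% without shuffling could give a training set that is not representative of the test set.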

A Supervised Learning algorithm analyzes the training data and produces a mapping function, which is called a classifier if the output is discrete, or a regression function if the output is continuous. For example, a classification problem is one where the output variable is a category, such as male or female, or smoker or non-smoker. A regression problem, on the other hand, is one where the output variable is a real value, such as temperature or length. Common applications built on classification and regression include fraud detection and temperature forecasting, respectively, as shown on this slide.
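
To make the classifier side concrete, here is a minimal sketch of a very simple classifier, a nearest-centroid rule (chosen for brevity; it is not one of the algorithms named in the course). It learns a mapping from feature vectors to a discrete label, in contrast to a regression function, whose output is a continuous number. The data and the labels "A" and "B" are hypothetical.

```python
def fit_centroids(X, y):
    """Learn one centroid (mean feature vector) per discrete class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical two-feature instances with categorical labels.
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
y = ["A", "A", "B", "B"]

centroids = fit_centroids(X, y)
print(classify(centroids, [4.9, 5.1]))  # B
```

Whatever the input, `classify` can only ever return one of the labels seen during training, which is exactly what makes it a classifier rather than a regression function.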

Supervised Learning algorithms fall into two categories: those used to solve classification problems and those used to solve regression problems. For example, we can use either the Linear Regression algorithm or the Decision Tree algorithm for regression problems, and either the Support Vector Machine algorithm or Naive Bayes for classification problems. When performing supervised training, the process starts with the training phase.

The training phase requires you to perform feature extraction to establish feature vectors. The chosen supervised training algorithm takes both the feature vectors and the labels and builds a predictive model. The predictive model is then tested for accuracy. The separate training and testing phases rely on different subsets of the initial dataset; typically, either an 80/20 or a 75/25 split in favor of the training subset is used. Walking through a complete example, let's say we take an initial dataset and randomly sample 80% of its records. This becomes our training dataset, and the remaining 20% becomes our testing dataset. We then train our machine learning model on the training dataset.
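
Feature extraction, the first step of the training phase, can be sketched as a function that turns each raw record into a numeric feature vector, paired with its label. This is a minimal sketch under assumed inputs: the record fields (`sqft`, `bedrooms`, `city`, `sold_price`), the values, and `extract_features` are all hypothetical. The categorical field is one-hot encoded, which is one common choice among several.

```python
# Hypothetical raw records; the field names and values are illustrative only.
raw_records = [
    {"sqft": 1400, "bedrooms": 3, "city": "Springfield", "sold_price": 250000},
    {"sqft": 2100, "bedrooms": 4, "city": "Shelbyville", "sold_price": 390000},
    {"sqft": 900, "bedrooms": 2, "city": "Springfield", "sold_price": 160000},
]

# Fixed category order so every record maps to the same vector layout.
CITIES = sorted({r["city"] for r in raw_records})


def extract_features(record):
    """Turn one raw record into a numeric feature vector.

    Numeric fields are copied as floats; the categorical "city" field is
    one-hot encoded (one 0/1 slot per known city).
    """
    one_hot = [1.0 if record["city"] == city else 0.0 for city in CITIES]
    return [float(record["sqft"]), float(record["bedrooms"])] + one_hot


# Feature vectors (X) and labels (y): what a supervised algorithm consumes.
X = [extract_features(r) for r in raw_records]
y = [r["sold_price"] for r in raw_records]
print(X[0], y[0])  # [1400.0, 3.0, 0.0, 1.0] 250000
```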

Once the predictive model has been built, we can test it for accuracy by running the testing dataset through it, then comparing the predicted results against the actual label recorded for each record in the testing dataset. If the model falls short of our expectations, we can tune the parameters of the supervised training algorithm and retrain the model, or choose an entirely different supervised training algorithm.
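
The test-and-tune loop can be sketched using K Nearest Neighbours (one of the algorithms covered in this course), with `k` as the parameter being tuned. This is a minimal sketch: the data, the deliberately noisy training point, the candidate values of `k`, and the acceptance threshold `TARGET_ACCURACY` are all hypothetical.

```python
from collections import Counter


def knn_predict(train, features, k):
    """Majority vote among the k training instances nearest to `features`."""
    def sq_dist(instance):
        return sum((a - b) ** 2 for a, b in zip(instance[0], features))
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test instances whose predicted label matches the actual label."""
    correct = sum(knn_predict(train, feats, k) == label for feats, label in test)
    return correct / len(test)


# Hypothetical data; ([2.2], "high") is a deliberately noisy training point.
train = [([1.0], "low"), ([2.0], "low"), ([2.2], "high"), ([3.0], "low"),
         ([7.0], "high"), ([8.0], "high"), ([9.0], "high")]
test = [([1.5], "low"), ([2.5], "low"), ([7.5], "high"), ([8.5], "high")]

TARGET_ACCURACY = 0.9  # hypothetical acceptance threshold

# Tune the parameter k: re-evaluate for each candidate and keep the best.
best_k, best_acc = None, 0.0
for k in (1, 3, 5):
    acc = accuracy(train, test, k)
    print(f"k={k}: accuracy={acc:.2f}")
    if acc > best_acc:
        best_k, best_acc = k, acc

if best_acc < TARGET_ACCURACY:
    print("Still below target: consider a different algorithm.")
else:
    print(f"Selected k={best_k}")
```

With `k=1` the noisy point flips one test prediction (accuracy 0.75), while `k=3` outvotes it (accuracy 1.00). Because k-NN simply stores the training data, "retraining" here just means predicting again with a new `k`. Choosing `k` by scoring against the test set, as this sketch does for brevity, can give an optimistic accuracy estimate; a separate validation subset is commonly held out for tuning.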

Finally, if our model lives up to our expectations in terms of providing accurate predictions, we can deploy it and begin feeding new data points through it: first performing feature extraction, then passing the resulting feature vectors into the model. The end result is a predicted label for each data point. Supervised machine learning can be used to answer many business questions. The important thing to consider when attempting to answer them is what data sources are available for training, and where they are. Some examples follow. How much is this home worth? Here, we train our model on a dataset of previous home sales. How many customers will watch season 2? Here, we train our model on the previous year's viewing statistics. Is this cancer malignant? Here, we train our model on a dataset of previously diagnosed cancers, each labeled as malignant or not.
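
The deployment flow (extract features from a new, unlabeled data point, then pass the resulting vector to the trained model) can be sketched for the "How much is this home worth?" example. The model here is a K Nearest Neighbours regressor that averages the prices of the most similar past sales, chosen only for brevity; the field names, prices, and `k=2` are hypothetical.

```python
def extract_features(record):
    """Hypothetical feature extraction: [floor area in sq ft, bedrooms]."""
    return [float(record["sqft"]), float(record["bedrooms"])]


def knn_regress(train, features, k=2):
    """Predict a numeric label as the mean label of the k nearest instances."""
    def sq_dist(instance):
        return sum((a - b) ** 2 for a, b in zip(instance[0], features))
    nearest = sorted(train, key=sq_dist)[:k]
    return sum(label for _, label in nearest) / k


# Training data: previous home sales with known (labeled) prices.
past_sales = [
    {"sqft": 1000, "bedrooms": 2, "price": 200000},
    {"sqft": 1100, "bedrooms": 2, "price": 220000},
    {"sqft": 2000, "bedrooms": 4, "price": 400000},
    {"sqft": 2100, "bedrooms": 4, "price": 420000},
]
train = [(extract_features(s), s["price"]) for s in past_sales]

# Deployment: a new data point arrives without a label.
new_home = {"sqft": 1050, "bedrooms": 2}
estimate = knn_regress(train, extract_features(new_home))
print(f"Predicted price: {estimate:,.0f}")  # Predicted price: 210,000
```

The same `extract_features` function is used at training time and at prediction time; if the two ever diverged, the model would receive vectors laid out differently from those it learned from. Features are left unscaled here, so floor area dominates the distance; real pipelines usually normalize features first.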

And finally, will this customer default on a loan? Here, we would train our model on a dataset of previous loans that were either paid off or defaulted. Okay, let's quickly summarize supervised training: its important characteristics, how it is used, and some example algorithms. Most importantly, each instance within the training and testing datasets must have a label. The label is sometimes referred to as the outcome, objective, or result.

The goal is to build a machine learning model that can accurately predict future outcomes. If the label is categorical, the model is considered a classification model; if the label is numeric, the model is considered a regression model. Because every instance in the supervised training dataset has a known label, we can evaluate and test the accuracy of the model. To do so, we split the data into a training set and a test set, for example using an 80/20 or a 75/25 split.
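
The two rules above (every instance must carry a label; a categorical label means classification and a numeric label means regression) can be sketched as a small check. The helper `problem_type` is hypothetical and deliberately naive: integer-coded classes (for example, 0/1 for "paid off"/"defaulted") would be reported as regression, so in practice the problem type is usually declared rather than inferred.

```python
from numbers import Number


def problem_type(labels):
    """Heuristically decide between classification and regression.

    Raises ValueError if any instance is missing its label, since
    supervised training requires every instance to be labeled.
    """
    if any(label is None for label in labels):
        raise ValueError("every instance must have a label")
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(label, Number) and not isinstance(label, bool)
                  for label in labels)
    return "regression" if numeric else "classification"


print(problem_type(["smoker", "non-smoker", "smoker"]))  # classification
print(problem_type([21.5, 19.0, 23.25]))                 # regression
```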

Next, we build the model using the training set. Finally, we test the model using the test set, comparing the predictions against the known labels. There are many supervised training algorithms; some examples are Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees, Naive Bayes, and K Nearest Neighbours. Later in the course, we'll review each of these algorithms in more detail. That concludes our lecture on Supervised Training. In the next lecture, we'll drill down into the details of unsupervised training. Go ahead and close this lecture, and we'll see you shortly in the next one.

Jeremy is a Content Lead Architect and DevOps SME here at Cloud Academy where he specializes in developing DevOps technical training documentation.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 25+ years. In recent times, Jeremy has been focused on DevOps, Cloud (AWS, GCP, Azure), Security, Kubernetes, and Machine Learning.

Jeremy holds professional certifications for AWS, GCP, Terraform, Kubernetes (CKA, CKAD, CKS).