Difficulty: Intermediate
AVG Duration: 6h
Students: 653
Ratings: 5/5
Content
3223

Description

This learning path explores a range of techniques for building machine learning pipelines with Python. You'll learn how to preprocess data with the Python library scikit-learn and then build a machine learning model on the preprocessed data. We'll also cover PyTorch, another Python machine learning library: you'll learn how to create tensors and datasets, and how to use the autograd module and backpropagation.

You'll then put your newly acquired skills to the test with two lab challenges that focus on the data preprocessing, fitting, and evaluation of both a classification model and a regression model.

Learning Objectives

  • Learn how to carry out the different preprocessing methods in scikit-learn
  • Understand the pros and cons of transforming original data within a machine learning pipeline
  • Deal with categorical variables inside a pipeline
  • Manage the imputation of missing values
  • Explore supervised learning techniques for training a model in scikit-learn, using a simple regression model as an example
  • Understand the concept of the bias-variance trade-off and regularized ML models
  • Explore linear models for classification and how to evaluate them 
  • Create a tensor in PyTorch
  • Understand when to use the autograd module
  • Create a dataset in PyTorch
  • Understand what backpropagation is and why it is important
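
As a rough preview of the scikit-learn objectives above, the sketch below combines imputation of missing values, one-hot encoding of a categorical variable, and a regularized linear model in a single pipeline. It is a minimal illustration, not material from the courses: the toy data, the column names (`rooms`, `area`, `district`), and the choice of `Ridge` with `alpha=1.0` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names and values are invented for illustration.
df = pd.DataFrame({
    "rooms": [3.0, 4.0, np.nan, 5.0, 2.0, 4.0],      # numeric, one missing value
    "area": [70.0, 95.0, 80.0, np.nan, 50.0, 100.0],  # numeric, one missing value
    "district": ["north", "south", "south", "east", "north", "east"],
})
y = np.array([210.0, 300.0, 250.0, 340.0, 150.0, 310.0])

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each group of columns to its own preprocessing.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["rooms", "area"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["district"]),
])

# Preprocessing and a regularized (ridge) regression in one estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", Ridge(alpha=1.0)),
])

model.fit(df, y)
predictions = model.predict(df)
```

Wrapping the preprocessing and the model in one `Pipeline` means the imputation and scaling statistics are learned only from the data passed to `fit`, which helps avoid leaking information from validation data.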

Intended Audience

This course is intended for anyone interested in machine learning, and especially for data scientists and data engineers.

Prerequisites

To get the most out of this course, you should be familiar with Python, as well as with the basics of machine learning. It's recommended that you take our Introduction to Machine Learning Concepts course before taking this one. To follow along with the PyTorch 101 course, you should have PyTorch version 1.5 or later installed.
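
One possible way to check the PyTorch prerequisite, and to preview the tensor and autograd topics, is sketched below. It assumes PyTorch is already installed; the variable names and the toy function y = sum(x_i^2) are chosen purely for illustration and are not taken from the course.

```python
import torch

# Parse the major and minor version, ignoring suffixes such as "+cu118".
major, minor = (int(part) for part in torch.__version__.split(".")[:2])
meets_requirement = (major, minor) >= (1, 5)

# Create a tensor and ask autograd to track operations on it.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = sum(x_i^2), so the gradient dy/dx_i is 2 * x_i.
y = (x ** 2).sum()

# Backpropagation: compute dy/dx and store it in x.grad.
y.backward()
grad = x.grad
```

Here `grad` holds the gradient of `y` with respect to each element of `x`, which is the quantity that backpropagation computes for every trainable parameter of a model.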

Certificate

Your certificate for this learning path

Training Content

1
Course - Intermediate - 53m
Building Machine Learning Pipelines with scikit-learn - Part One
This course is the first in a two-part series that covers how to build machine learning pipelines using scikit-learn, a library for the Python programming language.
2
Exam - 30m
Knowledge Check: Building Machine Learning Pipelines with scikit-learn - Part One
3
Course - Intermediate - 1h 9m
Building Machine Learning Pipelines with scikit-learn - Part Two
This course is the second in a two-part series that covers how to build machine learning pipelines using scikit-learn, a library for the Python programming language.
4
Exam - 35m
Knowledge Check: Building Machine Learning Pipelines with scikit-learn - Part Two
5
Hands-on Lab - Beginner - 1h
Machine Learning with scikit-learn
The aim of this lab is to challenge you to build a supervised machine learning pipeline that predicts the median value of owner-occupied homes (in thousands of USD) in the Boston dataset.
6
Course - Intermediate - 1h 1m
PyTorch 101
This course introduces you to PyTorch and focuses on two main concepts: PyTorch tensors and the autograd module.
7
Hands-on Lab - Beginner - 1h
PySpark - Preprocessing
In this lab, you will learn how to create a dataset using the PySpark library, and to manipulate it using standard filtering and slicing techniques.
8
Hands-on Lab - Beginner - 1h
PySpark - How to build a Machine Learning Pipeline
In this hands-on lab, you will consolidate your knowledge of PySpark, a very popular Python library for big data analysis and modeling.
9
Hands-on Lab Challenge - Advanced - 1h 30m
Machine Learning Python Challenge: Regression
In this lab challenge, you will be tested on your ability to use scikit-learn to build a machine learning pipeline that predicts the price of a stock.
10
Hands-on Lab Challenge - Advanced - 1h
Machine Learning Python Challenge: Classification
The aim of this lab is to challenge you on building a supervised machine learning pipeline to predict the probability that a subject will suffer from a heart stroke.
About the Author
Students: 3775
Labs: 13
Courses: 8
Learning paths: 4

Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.

He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.