Building Machine Learning Pipelines with scikit-learn - Part One

Conclusion

Overview

Difficulty: Intermediate
Duration: 53m
Students: 153
Ratings: 5/5
Description

This course is the first in a two-part series that covers how to build machine learning pipelines using scikit-learn, a library for the Python programming language. This is a hands-on course containing demonstrations that you can follow along with to build your own machine learning models.

Learning Objectives

  • Understand the different preprocessing methods in scikit-learn
  • Perform preprocessing in a machine learning pipeline
  • Understand the importance of preprocessing
  • Understand the pros and cons of transforming the original data within a machine learning pipeline
  • Deal with categorical variables inside a pipeline
  • Manage the imputation of missing values

Intended Audience

This course is intended for anyone interested in machine learning with Python.

Prerequisites

To get the most out of this course, you should be familiar with Python, as well as with the basics of machine learning. It's recommended that you take our Introduction to Machine Learning Concepts course before taking this one.

Resources

The resources related to this course can be found in the following GitHub repo: https://github.com/cloudacademy/ca-machine-learning-with-scikit-learn

Transcript

Congratulations, you've reached the end of this course. We've gone through quite a few things here, so let's have a quick recap of what you've learned. We covered the family of transformers that are typically used to preprocess data, and that are therefore applied in a machine learning pipeline before a model is fit.
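As a minimal sketch of that idea, the snippet below chains a transformer and a model so that preprocessing always runs before the fit. The toy data and the choice of LogisticRegression are assumptions made for illustration; they are not taken from the course notebooks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only): two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

# Calling fit on the pipeline fits the scaler first,
# then fits the model on the scaled output.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipe.steps]
predictions = pipe.predict(X)
```

Because the scaler lives inside the pipeline, the same scaling learned during fit is reapplied automatically whenever predict is called.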

We learned how to apply the StandardScaler, which serves as a benchmark for preprocessing heterogeneous variables. We covered alternative techniques, such as the RobustScaler, and understood when they should be used. We applied techniques for encoding categorical variables as dummy variables, and understood how to select the columns to encode via make_column_transformer.
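The scaling and encoding steps recapped above can be sketched roughly as follows. The dataframe, its column names ("income", "city") and its values are hypothetical, chosen only to show one numeric column with an outlier next to one categorical column.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler, StandardScaler

# Hypothetical toy frame: a numeric column with one outlier, plus a categorical column.
df = pd.DataFrame({
    "income": [30.0, 32.0, 35.0, 31.0, 500.0],
    "city": ["Rome", "Milan", "Rome", "Turin", "Milan"],
})

# StandardScaler centres on the mean and divides by the standard deviation.
standard = StandardScaler().fit_transform(df[["income"]])

# RobustScaler centres on the median and divides by the interquartile range,
# so a single extreme value influences the scaling far less.
robust = RobustScaler().fit_transform(df[["income"]])

# Select columns by name: scale the numeric one, one-hot encode the categorical one.
preprocess = make_column_transformer(
    (RobustScaler(), ["income"]),
    (OneHotEncoder(handle_unknown="ignore"), ["city"]),
)
encoded = preprocess.fit_transform(df)
```

Here the encoded output has one scaled column plus one dummy column per distinct city. The resulting transformer can be placed as the first step of a pipeline in the same way as a single scaler.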

Finally, we covered the family of imputers, which can be used to fill in missing values in the data. In particular, we covered the univariate SimpleImputer and two multivariate techniques, namely the IterativeImputer and the KNNImputer. Those two multivariate methods should be used when you assume there might be some dependence among the variables in your dataframe. I hope you enjoyed this course and found it useful. If you have any feedback on it at all, please feel free to reach out to us at support@cloudacademy.com. Thanks for watching.
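To make the univariate versus multivariate distinction concrete, here is a small sketch on made-up data in which the second column is exactly twice the first. The array and the parameter choices (mean strategy, two neighbours, a fixed random_state) are assumptions for illustration. Note that IterativeImputer is still marked experimental in scikit-learn and needs the explicit enabling import shown below.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Toy data (illustrative): the second column is twice the first,
# and its value is missing in the last row.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, np.nan],
])

# Univariate: fill the gap with the column mean, ignoring the other column.
simple = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate: model the incomplete feature as a function of the other feature.
iterative = IterativeImputer(random_state=0).fit_transform(X)

# Multivariate: average the feature over the two nearest rows.
knn = KNNImputer(n_neighbors=2).fit_transform(X)

filled = {
    "simple": simple[4, 1],
    "iterative": iterative[4, 1],
    "knn": knn[4, 1],
}
```

On this toy data the mean imputer fills in 5.0 and the KNN imputer fills in 7.0, while the iterative imputer can exploit the linear dependence between the two columns and lands much closer to the value the pattern suggests.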

About the Author
Andrea Giussani
Data Scientist
Students: 1343
Labs: 11
Courses: 7
Learning Paths: 2

Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.

He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.