
Conclusion

Overview

Difficulty: Intermediate
Duration: 38m
Students: 39
Ratings: 5/5
Description

This course covers the basic techniques you need to know to fit a natural language processing (NLP) machine learning pipeline using scikit-learn, a machine learning library for Python.

Learning Objectives

  • Learn about the two main scikit-learn classes for natural language processing: CountVectorizer and TfidfVectorizer
  • Learn how to create Bag-of-Words (BoW) representations and TF-IDF representations
  • Learn how to create a machine learning pipeline to classify BBC news articles into different categories

Intended Audience

This course is intended for anyone who wishes to understand how NLP works and, more specifically, how to implement it using scikit-learn.

Prerequisites

To get the most out of this course, you should already have an understanding of the Python programming language.

Transcript

Congratulations! You have reached the end of this course, where we introduced you to the basic techniques you need to fit a natural language processing (NLP) machine learning pipeline using scikit-learn.

In this course, you have seen how to use the two main scikit-learn classes for natural language processing, CountVectorizer and TfidfVectorizer, which create a Bag-of-Words (BoW) representation and a TF-IDF representation of a given corpus of texts, respectively.

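You have learned that a BoW matrix is less effective than a TF-IDF matrix at highlighting the words that characterize a document, because TF-IDF down-weights terms that appear in many documents of the corpus, and that the TF-IDF matrix also lets you compare documents by computing a cosine similarity matrix.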
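To illustrate the document comparison, here is a minimal sketch that computes pairwise cosine similarities between the rows of a TF-IDF matrix with scikit-learn's `cosine_similarity`. The corpus is again hypothetical: the two finance-themed sentences should score higher with each other than either does with the sports sentence.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: documents 0 and 2 share several finance terms
corpus = [
    "the market rallied as stocks rose",
    "the team won the match",
    "stocks fell as the market slipped",
]

tfidf = TfidfVectorizer().fit_transform(corpus)

# Dense (n_docs, n_docs) array; entry [i, j] is the cosine of the angle
# between the TF-IDF vectors of documents i and j
sim = cosine_similarity(tfidf)
print(sim.round(2))
```

Because `TfidfVectorizer` L2-normalizes each row by default (`norm="l2"`), the cosine similarity here equals the plain dot product of the rows, so `sklearn.metrics.pairwise.linear_kernel` would give the same matrix.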

Then, we applied that knowledge to a machine learning pipeline that was created to classify BBC news articles into five different categories. We used the scikit-learn objects LabelEncoder and Pipeline, and we obtained some great results.
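As a hedged sketch of how LabelEncoder and Pipeline fit together, the snippet below trains a tiny text classifier. The six sentences and their labels are invented stand-ins for BBC articles, and `LogisticRegression` is an assumed classifier choice, not necessarily the one used in the course.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy stand-ins for news articles and their category labels
texts = [
    "shares rose after strong quarterly profits",
    "the central bank raised interest rates",
    "the striker scored twice in the final",
    "the club signed a new goalkeeper",
    "the new smartphone has a faster chip",
    "software update fixes security flaw",
]
labels = ["business", "business", "sport", "sport", "tech", "tech"]

# Map string categories to integers 0..n_classes-1 (classes sorted alphabetically)
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Chain vectorization and classification so they are fitted as one estimator
# (LogisticRegression is an assumed choice of classifier)
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(texts, y)

# Predict an integer class, then map it back to its category name
pred = pipe.predict(["the goalkeeper saved a penalty in the final"])
print(encoder.inverse_transform(pred))
```

Wrapping the vectorizer and the classifier in one Pipeline means that, under cross-validation or a train/test split, the vocabulary and IDF weights are learned only from the training portion. scikit-learn classifiers also accept string labels directly; LabelEncoder makes the integer mapping explicit and reversible through `inverse_transform`.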

That brings us to the end of the course. I hope you enjoyed it. If you have any feedback, questions, or comments, please feel free to contact us at support@cloudacademy.com. Thanks for watching!

About the Author
Students: 3026
Labs: 13
Courses: 8
Learning Paths: 4

Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.

He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.