PyTorch 101
This course introduces you to PyTorch and focuses on two main concepts: PyTorch tensors and the autograd module. We are going to get our hands dirty throughout the course, using a demo environment to explore the methodologies covered. We’ll look at the pros and cons of each method, and when they should be used.
Learning Objectives
- Create a tensor in PyTorch
- Understand when to use the autograd attribute
- Create a dataset in PyTorch
- Understand what backpropagation is and why it is important
Intended Audience
This course is intended for anyone interested in machine learning, and especially for data scientists and data engineers.
Prerequisites
To follow along with this course, you should have PyTorch version 1.5 or later.
Resources
The Python scripts used in this course can be found in the GitHub repo here: https://github.com/cloudacademy/ca-pytorch-101
Hello and welcome! My name is Andrea Giussani and I’m going to be your instructor for this course on PyTorch 101.
If you’ve ended up here, you have probably heard about PyTorch, which nowadays has become one of the most important machine learning frameworks for both developers and researchers. So in this lecture, we are going to learn what PyTorch is, and a little bit about its history.
This course is intended for anyone interested in machine learning, and especially for data scientists and data engineers.
The objective of this course is to get you exposed to the PyTorch ecosystem, and in general to tensors and the autograd module.
By the end of this course, you will be able to
- Create a tensor in PyTorch;
- Understand when to use the autograd attribute;
- Create a dataset in PyTorch;
- Understand what backpropagation is, and why it is important.
So what is PyTorch? It is an open-source machine learning library based on the Torch library, primarily developed by Facebook's AI Research lab and used for applications such as computer vision and natural language processing. It provides several modules for building complex neural networks or, more generally, for tensor computing.
In this course, we are going to focus on two main concepts: PyTorch tensors and the autograd module.
We are going to get our hands dirty throughout the course, using a demo environment to explore the methodologies covered. We’ll look at the pros and cons of each method, and when they should be used.
So now a little bit of history. PyTorch was created at the end of 2016 as an internship project by Adam Paszke, who is now a Senior Research Scientist at Google. The fun fact here is that his mentor was Soumith Chintala, a core developer of Torch, who gave him the inspiration and the motivation to go ahead with the PyTorch project.
In 2017, two other developers joined the project (Sam Gross and Gregory Chanan), and by the end of the year, the library became popular among deep learning researchers for its simplicity and flexibility.
Its popularity grew further at the end of 2018, when Facebook decided to integrate two of its most popular AI model development ecosystems, namely PyTorch and Caffe2. This decision was driven by a problem: models defined in the two frameworks were mutually incompatible, which made it hard to convert a model from one framework to the other.
Here it is worth making two remarks. First, Caffe2 was launched in mid-2017 and was used instead of PyTorch to develop AI models on iOS, Android, and Raspberry Pi devices. Second, the integration of the two ecosystems followed naturally from another effort: at the end of 2017, Facebook and Microsoft announced an open-source project called Open Neural Network Exchange (or ONNX), which was created to help researchers convert models between frameworks.
According to Wikipedia, PyTorch provides two high-level features:
- Tensor computing with strong acceleration via Graphics Processing Units (GPUs).
- Deep Neural Networks built on a tape-based automatic differentiation system.
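To make these two features concrete, here is a minimal sketch rather than anything taken from the course scripts. It assumes PyTorch is installed, and the variable names (`a`, `b`, `x`, `y`) are purely illustrative. The first part runs a small tensor computation on a GPU if one is available, and the second part lets autograd compute a derivative.

```python
import torch

# Pick a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Feature 1: tensor computing. Create two small tensors on the chosen
# device and take their matrix product.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
b = torch.ones(2, 2, device=device)
product = a @ b

# Feature 2: automatic differentiation. Setting requires_grad=True asks
# PyTorch to record the operations applied to x, so that backward()
# can compute dy/dx afterwards.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x
y.backward()
grad = x.grad  # dy/dx = 2x + 2, evaluated at x = 3

print(product)
print(grad)
```

Both tensors and autograd are covered in detail in the following lectures; the point here is only that the same few lines work unchanged on CPU and GPU.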
Needless to say, PyTorch is the deep learning ecosystem at Facebook, but several other companies have started using PyTorch for production applications. Among those, it is worth mentioning Tesla, Uber, Amazon, Spotify, and even Cloud Academy, where we use it for our internal research and development.
Also, PyTorch is well integrated with the main cloud vendors: for example, AWS provides pre-built images with PyTorch installed, where you can develop and test your model. A similar offering is also present in both Azure Machine Learning Studio and Google Cloud AutoML.
This course requires PyTorch version 1.5 or higher. The library has good documentation, which is available online. We will use the Google Colab environment to work with PyTorch. The reason for this is that Google gives you the possibility to work with a GPU for free in that environment, so it is pretty convenient for us.
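If you want to confirm that your environment meets this requirement, a quick check along the following lines should work. This is a sketch under the assumption that PyTorch is already installed (as it typically is in Colab); the names `meets_requirement` and `gpu_available` are illustrative only.

```python
import torch

# torch.__version__ is a string such as "1.5.0" or "2.1.0+cu118";
# the first two dot-separated fields are the major and minor versions.
major, minor = (int(part) for part in torch.__version__.split(".")[:2])
meets_requirement = (major, minor) >= (1, 5)

# True only when a CUDA-capable GPU is visible to PyTorch.
gpu_available = torch.cuda.is_available()

print("PyTorch version:", torch.__version__)
print("Version 1.5 or later:", meets_requirement)
print("GPU available:", gpu_available)
```

In Colab, a GPU usually shows up only after you select a GPU runtime in the notebook settings, so a result of `False` for the last line does not necessarily mean something is broken.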
If you are not familiar with Google Colab, I have written a short guide on how to set it up. Please check the GitHub repository for this course for more details.
In this course, we are going to cover the basics of PyTorch. The structure is as follows. In the next lecture, we are going to explore the concept of tensor. In Lecture 3, we will focus on the autograd module, which is a crucial part of PyTorch. In Lecture 4, we will explore the Dataset class, and finally, in Lecture 5, we will investigate the concept of backpropagation, and how it is implemented in PyTorch.
So, if you’re ready, let’s get started!
Andrea is a Data Scientist at Cloud Academy. He is passionate about statistical modeling and machine learning algorithms, especially for solving business tasks.
He holds a PhD in Statistics, and he has published in several peer-reviewed academic journals. He is also the author of the book Applied Machine Learning with Python.