Convolutional Layers
1h 19m

In this course, discover convolutions and the convolutional neural networks used in Data and Machine Learning. The course introduces the concept of a tensor, which is essential for everything that follows.

Learn to apply these networks to the right kind of data, such as images. Images store their information in pixels, but you will discover that it is not the value of each pixel that matters.

Learning Objectives

  • Understand how convolutional neural networks are essential to the fundamentals of Data and Machine Learning.

Intended Audience


Hello and welcome to this video on convolutional layers. In this video we will talk about convolutional layers, the convolution of tensors, strides, and padding. If we convolve the same image with different filters, we obtain different convolved images, and each of these images represents the locations of the matches with the corresponding filter. We can take many filters and arrange them in a convolutional layer, and when an image is fed to a convolutional layer, the output is a stack of convolved images, one per filter. Since we know tensors, and all of these images have the same size, we can arrange them in a tensor where the number of channels corresponds to the number of filters used. The filters too can be arranged in a tensor, which is going to be the tensor of weights in the convolutional layer.

Let's look at this in more detail. Our input data is an order four tensor. Yes, it's an order four, not an order three tensor. A single image is an order three tensor, as we know, because it has height, width, and channel, or color. But in this case we have many input images in a batch, and since we have many, we might as well stack them in an order four tensor.

The first axis of this tensor indicates the number of samples. So the four axes are, respectively, the number of images in the batch, the height of the image, the width of the image, and the number of color channels in the image. For example, in the MNIST training data set we have 60,000 images, each with 28 by 28 pixels and only one color channel, because they're grayscale. This gives us an order four tensor with the shape (60000, 28, 28, 1).

Similarly, we can stack the filters in the convolutional layer as an order four tensor. We will use the first two axes for the height and the width of the filter patch. The third axis will correspond to the number of color channels in the input.

The last axis is for the number of channels in the output, that is, the number of filters in the layer. In the example above, our convolutional layer has a shape of (3, 3, 1, 2), meaning we have two patches, or two filters, each three by three pixels, and each with a single input color channel. When we convolve the input image with the convolutional layer, we still obtain an order four tensor. The first axis in this tensor is the number of images in the batch. In this case we left it blank to indicate that the number can vary with the batch size. The other three axes in the input tensor are, as we said, the image height, the width, and the number of color channels.
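The tensor shapes described so far can be sketched in a few lines of NumPy. This is only an illustration: the batch of eight blank images and the variable names are made up for the example, and the axis order follows the one used in the video (images, height, width, channels).

```python
import numpy as np

# Hypothetical stand-ins for grayscale images: each one is an order three
# tensor with shape (height, width, color channels).
images = [np.zeros((28, 28, 1), dtype=np.float32) for _ in range(8)]

# Stacking along a new first axis gives an order four tensor:
# (number of images, height, width, color channels).
batch = np.stack(images, axis=0)

# Filter tensor of the layer: (filter height, filter width,
# input channels, output channels), here two 3x3 single-channel filters.
filters = np.zeros((3, 3, 1, 2), dtype=np.float32)

print(images[0].ndim, batch.ndim)  # order of a single image vs. a batch
print(batch.shape)                 # (8, 28, 28, 1)
print(filters.shape)               # (3, 3, 1, 2)
```

The full MNIST training set would have 60000 in place of 8 on the first axis, with the other three axes unchanged.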

The first two axes of the convolutional layer are the height and the width of the filter patch. The third axis in the convolutional layer must have the same size as the number of color channels in the input images. The fourth axis in the convolutional layer must be equal to the number of output channels we want to obtain after the convolution. This is also the number of filters we are going to learn in the layer. Finally, the second and third axes in the output will be the height and the width of the convolved image. Notice that since the output is an order four tensor, we can feed it again into a new convolutional layer, provided we make sure to match the number of channels correctly.

One thing you may have noticed is that the convolved images are smaller at each convolution. The size of the convolved image is controlled by a parameter called the stride.
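To make the shape rules above concrete, here is a minimal, unoptimized sketch of a convolutional layer in NumPy. The function `conv2d` is a hypothetical helper written for this example, not a library call. It has no padding and no bias, and, as deep learning libraries usually do, it slides the filter over the image without flipping it.

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Naive convolution of a batch of images with a stack of filters.

    x: input,   shape (batch, height, width, in_channels)
    w: filters, shape (f_height, f_width, in_channels, out_channels)
    """
    n, h, wd, c_in = x.shape
    fh, fw, w_in, c_out = w.shape
    assert c_in == w_in, "filter input channels must match image channels"

    out_h = (h - fh) // stride + 1
    out_w = (wd - fw) // stride + 1
    out = np.zeros((n, out_h, out_w, c_out), dtype=x.dtype)

    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Patch under the filter window: (batch, fh, fw, in_channels)
            patch = x[:, r:r + fh, c:c + fw, :]
            # Sum over height, width and input channels, once per filter.
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 9, 9, 1))    # 4 images, 9x9 pixels, 1 channel
w1 = rng.normal(size=(3, 3, 1, 2))   # 2 filters of 3x3 on 1 channel
w2 = rng.normal(size=(3, 3, 2, 5))   # 5 filters of 3x3 on 2 channels

h1 = conv2d(x, w1)    # output channels = number of filters in w1
h2 = conv2d(h1, w2)   # works because w2 expects 2 input channels
print(h1.shape, h2.shape)
```

The second call only works because the third axis of `w2` equals the last axis of `h1`, which is the channel matching mentioned above.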

The stride is the number of pixels by which we shift the convolution window. A stride of one means we slide our window by one pixel horizontally and one vertically. In this case the convolved image will have seven by seven pixels. If we stride by two in both directions, there are only four patches available in each direction, and therefore our output image will be four by four. If we stride by three in both directions, the output image will be three by three. You get the idea, I guess. One thing to mention is that we can also stride by different lengths in the two directions, which will produce a rectangular image at the output. Finally, if we don't want to lose the borders during the convolution, we can pad the image with zeros and again obtain a nine by nine convolved image. This is called a convolution that preserves the size, and it is indicated by the word "same".

In this video we've convolved images with many filters at once and showed how they can all be stored in tensors. We've learned about convolutional layers, and we've talked about strides and padding. I hope this was interesting, since it's the core of how convolutional neural networks work. Thank you for watching, and see you in the next video.
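The stride and padding arithmetic from the video can be checked with a small helper. This is a sketch under two assumptions: a nine by nine input with a three by three filter, which is what the quoted output sizes imply, and the "same" rule used by TensorFlow and Keras, where the output size is the input size divided by the stride, rounded up. The function name `conv_output_size` is made up for this example.

```python
import math

def conv_output_size(n, k, stride=1, padding="valid"):
    """Output length along one axis for input length n and filter length k."""
    if padding == "valid":
        # No padding: count the window positions that fit inside the image.
        return (n - k) // stride + 1
    if padding == "same":
        # Zero padding chosen so that a stride of one preserves the size.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

for s in (1, 2, 3):
    print(s, conv_output_size(9, 3, stride=s))  # 7, then 4, then 3

# Different strides in the two directions give a rectangular output.
print((conv_output_size(9, 3, stride=1), conv_output_size(9, 3, stride=2)))

# Padding with zeros preserves the nine by nine size.
print(conv_output_size(9, 3, padding="same"))
```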

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University summer program of 2011.