Images and Sounds in Jupyter
1h 5m

Learn about the many forms and formats that data comes in with the second course in the Data and Machine Learning series.

Traditionally, machine learning has worked very well with structured data but has been less effective on problems involving unstructured data. Deep learning works well with both structured and unstructured data, and it has had successes in fields like translation, image classification, and many others. In this course you'll learn why deep learning is so popular, study the different data types and formats you'll encounter, and look at the key libraries that allow us to explore and organize data.

This course is made up of 8 lectures, accompanied by 5 engaging exercises along with their solutions. This course is part of the Data and Machine Learning learning paths from Cloud Academy.

 Learning Objectives

  • Understand how machine learning performs on structured versus unstructured data
  • Be able to explain the importance of deep learning



The GitHub repo for this course, including code and datasets, can be found here.


Hello and welcome back. In this video we're going to see how to deal with some types of unstructured data and rich data formats, like images and sound. Let's start with images. Python offers a library called Pillow, the maintained, Python 3 compatible fork of PIL (the Python Imaging Library), from which we import the Image class. We can then use the Image class to open any image file in our data. For example, here we are opening the iss.jpg image. Notice that the IPython Notebook is smart: when we ask it to display the output of the image variable, it renders the image directly. It's a pretty nice picture of an astronaut just outside the International Space Station. If I ask for the type of the image variable, it tells me that it's a JPEG image file. Cool. Now let's transform this image into an array using the NumPy function asarray. If I check the type again, I see that the image array is of type numpy ndarray, which is what I wanted.
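A minimal sketch of the image-loading step, assuming Pillow and NumPy are installed. Because iss.jpg from the course repo isn't bundled here, the snippet builds a synthetic solid-color RGB image with the same dimensions the lecture reports and saves it to an in-memory JPEG buffer as a stand-in; the helper name load_image_as_array is hypothetical. In a notebook, evaluating img on its own line is what renders the picture inline.

```python
import io

import numpy as np
from PIL import Image  # Pillow is imported under the package name "PIL"


def load_image_as_array(fp):
    """Open an image with Pillow and return both the Image object and a NumPy array."""
    img = Image.open(fp)         # e.g. a JpegImageFile for a JPEG source
    img_array = np.asarray(img)  # (rows, columns, channels) for an RGB image
    return img, img_array


# Synthetic stand-in for iss.jpg: a solid-color RGB image. Pillow sizes are
# (width, height), so this is 640 pixels wide and 435 pixels tall.
buffer = io.BytesIO()
Image.new("RGB", (640, 435), color=(30, 60, 90)).save(buffer, format="JPEG")
buffer.seek(0)

img, img_array = load_image_as_array(buffer)
print(type(img).__name__)        # JpegImageFile
print(type(img_array).__name__)  # ndarray
print(img_array.shape)           # (435, 640, 3)
```

Note that Pillow reports sizes as (width, height) while the NumPy array's shape is (rows, columns, channels), so the two orders are swapped.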

An ndarray is a structure that we will use many times in this course, and it comes with a shape. For this image the shape has three numbers: the number of pixel rows (435), the number of columns (640), and the number of color channels. Since it's a red, green, blue (RGB) image, there are three channels. We can also ravel, or flatten, the image using the ravel function, and at this point it's just a long list of numbers. You can check that the product of the three shape values, 435 times 640 times 3, which is 835,200, is exactly the number of elements we get when we flatten the image. Pretty cool.

Now let's talk about sound. Sound can be loaded with the wavfile module from scipy.io: we call wavfile.read and pass it a file name. This loads the file into two things, the sample rate and the sound data itself. Then, using IPython's Audio display class, we can display the sound as a player and also play it.
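The flattening arithmetic and the sound-loading step can be sketched as follows, assuming NumPy and SciPy are installed. A zero-filled placeholder array with the lecture's image shape stands in for the real image, and a synthetic one-second 440 Hz tone, written to an in-memory WAV buffer, stands in for the course's sound file; neither is the original data.

```python
import io

import numpy as np
from scipy.io import wavfile

# --- Flattening an image array ---
# Placeholder with the lecture's image shape: (rows, columns, color channels).
img_array = np.zeros((435, 640, 3), dtype=np.uint8)
rows, cols, channels = img_array.shape

flat = img_array.ravel()  # 1-D array of every value, in row-major (C) order
print(flat.shape)         # (835200,)
print(rows * cols * channels == flat.size)  # True

# --- Loading a WAV file ---
# Synthetic stand-in for the course's sound file: a 1-second 440 Hz tone
# stored as 16-bit integer samples at 8000 samples per second.
rate = 8000
t = np.arange(rate) / rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

buffer = io.BytesIO()
wavfile.write(buffer, rate, tone)  # write a WAV file into memory
buffer.seek(0)

rate_read, snd = wavfile.read(buffer)   # returns (sample rate, sample array)
print(rate_read, snd.dtype, snd.shape)  # 8000 int16 (8000,)
```

In a notebook, IPython.display.Audio(data=snd, rate=rate_read) renders a player for the loaded samples.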

Let's lower the volume a little bit and play it. As you can see, it's pretty cool that the IPython Notebook is an interactive environment that can handle images and sounds. The sound, snd, is a long array of numbers: if we display it, we see an array of integers. We can plot it, and the shape you see is the waveform representing this sound. More interestingly, we can plot the spectrogram of the sound, which is computed by applying the Fast Fourier Transform (FFT) to short, successive windows of the signal. This image shows the frequencies present at different times in the sound. At the beginning we have both high and low frequencies, but the high frequencies fade as the sound evolves over time. I hope I've convinced you that Python has libraries to deal with different types of rich data, sound and images in this case, and that the IPython Notebook is a good environment to display and explore this type of data. Before we go, I'll teach you another little trick about the IPython Notebook: if you select a cell and choose Cell > Current Outputs > Clear, you can clear that cell's output and then rerun it if you want. So thank you for watching, and see you in the next video.
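To show what a spectrogram actually computes, here is a NumPy-only short-time Fourier transform sketch. It is not necessarily what the lecture's notebook calls (matplotlib's plt.specgram(snd, Fs=rate) is one common way to compute and plot a spectrogram in a single step), and the window size, hop length and Hann window are illustrative choices. The input is synthetic: a 440 Hz tone throughout plus a 3000 Hz tone only in the first half, mimicking high frequencies that fade over time.

```python
import numpy as np


def spectrogram(samples, rate, window_size=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (STFT).

    Cuts the signal into overlapping frames, multiplies each by a Hann window,
    and takes the FFT of every frame. Returns (freqs, times, mags), where mags
    has shape (n_freqs, n_frames): one column per time frame.
    """
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(window_size)
    n_frames = 1 + (len(samples) - window_size) // hop
    frames = np.stack([
        samples[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    mags = np.abs(np.fft.rfft(frames, axis=1)).T          # rows = frequencies
    freqs = np.fft.rfftfreq(window_size, d=1.0 / rate)    # Hz for each row
    times = (np.arange(n_frames) * hop + window_size / 2) / rate  # frame centers, s
    return freqs, times, mags


# Synthetic stand-in for the lecture's recording: a 440 Hz tone throughout,
# plus a 3000 Hz tone only in the first half second.
rate = 8000
t = np.arange(rate) / rate
signal = np.sin(2 * np.pi * 440 * t)
signal[: rate // 2] += np.sin(2 * np.pi * 3000 * t[: rate // 2])

freqs, times, mags = spectrogram(signal, rate)
high = np.argmin(np.abs(freqs - 3000))   # row closest to 3000 Hz
print(mags.shape)                        # (129, 61)
print(mags[high, times < 0.45].mean() > mags[high, times > 0.55].mean())  # True
```

Each column of mags is the FFT magnitude of one window, so drawing mags as an image (time on the x axis, frequency on the y axis) gives the spectrogram picture. A longer window sharpens frequency resolution at the cost of time resolution.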

About the Author

I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at University of Padua and Université de Paris VI and graduated from the Singularity University summer program in 2011.