Exercise 2: Solution
Learn how data comes in many forms and formats in this second course of the Data and Machine Learning series.
Traditionally, machine learning has worked well with structured data but has been less effective at solving problems involving unstructured data. Deep learning works well with both structured and unstructured data, and it has been successful in fields such as translation and image classification, among many others. Learn to explain why deep learning is so popular. You will also learn about the many different data types and their formats, and we'll analyze the key libraries that allow us to explore and organize data.
This course is made up of 8 lectures and 5 engaging exercises, each with its solution, and it is part of the Data and Machine Learning learning paths from Cloud Academy.
- Learn and understand how machine learning works when confronted with structured and unstructured data
- Be able to explain the importance of deep learning
- We recommend completing the Introduction to Data and Machine Learning course before starting.
The GitHub repo for this course, including code and datasets, can be found here.
Hey guys. Welcome back. Let's go to exercise two. Exercise two asked us to load the weight-height CSV file, so let's do that. We load it with read_csv and see that this time there are three columns: one is a string and two are floats, as we can verify using df.info(). Also notice that there are 10,000 points, indexed from 0 to 9,999, and no null values. Let's check the stats of the height and the weight: the mean height is about 66 inches, and the mean weight is about 161 pounds. Let's also see how many males and females there are in the data set: there are 5,000 males and 5,000 females. We used value_counts to do that. Next, plot the data using a scatter plot with weight as a function of height. This should have been easy: it's plot, with kind equal to scatter, and we want weight on the y axis and height on the x axis. Okay. Interesting. So there is an actual correlation between weight and height.
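The loading and exploration steps above can be sketched in pandas as follows. This is a minimal sketch, not the course's own notebook: it assumes the file is at the hypothetical path `data/weight-height.csv` and has the columns `Gender`, `Height` (inches) and `Weight` (pounds). The helper names `explore` and `plot_weight_vs_height` and the plot title are illustrative, and the `Agg` backend line only keeps the sketch runnable without a display.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Assumption: hypothetical path; the CSV is expected to have the columns
# Gender (string), Height (inches) and Weight (pounds).
CSV_PATH = "data/weight-height.csv"


def explore(df: pd.DataFrame) -> dict:
    """Run the quick checks from the video and return the numbers."""
    df.info()  # prints dtypes, non-null counts and the index range
    return {
        "n_rows": len(df),
        "n_nulls": int(df.isnull().sum().sum()),
        "stats": df[["Height", "Weight"]].describe(),
        "mean_height": df["Height"].mean(),
        "mean_weight": df["Weight"].mean(),
        "gender_counts": df["Gender"].value_counts(),
    }


def plot_weight_vs_height(df: pd.DataFrame):
    """Scatter plot: height on the x axis, weight on the y axis."""
    return df.plot(kind="scatter", x="Height", y="Weight",
                   title="Weight vs Height")


# Only runs when the (assumed) CSV file is actually present.
if os.path.exists(CSV_PATH):
    df = pd.read_csv(CSV_PATH)
    print(explore(df))
    ax = plot_weight_vs_height(df)
    ax.figure.savefig("weight_vs_height.png")  # hypothetical output file
```

Because `x` and `y` are column names, pandas labels the axes with them automatically; in a notebook you would normally skip the `Agg` line and the `savefig` call and let the figure render inline.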
That reflects the fact that taller people are, on average, also heavier. The next step is to plot the male and female populations in different colors. This can be done in several ways; I'll show a few here. One way is to create two new data frames, one for the males and one for the females, and then plot each data frame on the same axes, using the DataFrame plot method with kind scatter, as we did above. If we do this, we generate the requested plot. Notice that I've set alpha to 0.3, that is, 30 percent, so that the points are slightly transparent and you can see through them a little. This makes the plot look better. Notice also that I've set the title on the first plot, but I could have set it on the second plot as well. And remember to label your axes. A different way of doing this is to define a new column called gender color. Let's do this in steps. If I define a new column called gender color and check the head of the data frame, I see that I've added a new column that is blue for males and red for females. Now I can make a scatter plot by assigning the color to be the gender color column. In this case I don't need to create two plots, because my data is automatically colored using the values in the gender color column.
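Both pandas approaches described above can be sketched like this. It is a sketch under stated assumptions: the columns are assumed to be `Gender`, `Height` and `Weight` with the values `Male` and `Female`, the color column name `Gendercolor`, the helper function names, the title and the axis-label units are all hypothetical choices, and colors are passed through the `c` argument of the pandas scatter plot, which accepts either a single color string or one color string per point.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical mapping from the (assumed) Gender values to plot colors.
GENDER_COLORS = {"Male": "blue", "Female": "red"}


def plot_two_frames(df: pd.DataFrame):
    """Way 1: split into two DataFrames and draw both on the same axes."""
    males = df[df["Gender"] == "Male"]
    females = df[df["Gender"] == "Female"]
    # The title is set on the first plot; alpha=0.3 makes the points see-through.
    ax = males.plot(kind="scatter", x="Height", y="Weight", c="blue",
                    alpha=0.3, title="Male & Female Populations")
    # Passing ax= draws the second scatter on top of the first one.
    females.plot(kind="scatter", x="Height", y="Weight", c="red",
                 alpha=0.3, ax=ax)
    ax.set_xlabel("Height (in)")
    ax.set_ylabel("Weight (lbs)")
    return ax


def add_gender_color(df: pd.DataFrame) -> pd.DataFrame:
    """Way 2, step 1: return a copy with one color name per row."""
    return df.assign(Gendercolor=df["Gender"].map(GENDER_COLORS))


def plot_with_color_column(df: pd.DataFrame):
    """Way 2, step 2: a single scatter call colored by the new column."""
    ax = df.plot(kind="scatter", x="Height", y="Weight",
                 c=df["Gendercolor"].to_numpy(), alpha=0.3,
                 title="Male & Female Populations")
    ax.set_xlabel("Height (in)")
    ax.set_ylabel("Weight (lbs)")
    return ax
```

With `df` loaded from the CSV, `plot_two_frames(df)` gives the two-call version, while `plot_with_color_column(add_gender_color(df))` gives the same picture from one call. `assign` returns a new DataFrame rather than modifying `df` in place, which keeps the original data untouched; `df["Gendercolor"] = ...` works just as well if you prefer to add the column directly.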
Finally, I can also do it directly with matplotlib, by plotting the males' height versus the males' weight in blue, and the females' height versus the females' weight in red. This is the manual way of doing things: I manually set the height and weight labels, and I manually set the title. The plot is still the same. So I've shown you three ways of solving exercise two. Thank you for watching, and see you in the next video.
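The manual matplotlib version described above might look like the following minimal sketch. It assumes the same `Gender`, `Height` and `Weight` columns; the function name, title, unit labels and the legend are illustrative additions, and it uses `Axes.scatter` on an explicit figure, which may differ from the exact matplotlib calls used in the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_matplotlib(df: pd.DataFrame):
    """Way 3: call matplotlib directly and set every label by hand."""
    males = df[df["Gender"] == "Male"]
    females = df[df["Gender"] == "Female"]
    fig, ax = plt.subplots()
    ax.scatter(males["Height"], males["Weight"], color="blue", alpha=0.3,
               label="Male")
    ax.scatter(females["Height"], females["Weight"], color="red", alpha=0.3,
               label="Female")
    # Nothing is inferred from column names here, so labels are set manually.
    ax.set_xlabel("Height (in)")
    ax.set_ylabel("Weight (lbs)")
    ax.set_title("Male & Female Populations")
    ax.legend()  # optional: the labels passed to scatter become legend entries
    return fig, ax
```

Unlike `DataFrame.plot`, plain matplotlib knows nothing about column names, which is why the axis labels and title have to be set explicitly.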
I am a Data Science consultant and trainer. With Catalit I help companies acquire skills and knowledge in data science and harness machine learning and deep learning to reach their goals. With Data Weekends I train people in machine learning, deep learning and big data analytics. I served as lead instructor in Data Science at General Assembly and The Data Incubator, and I was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup that invented the first consumer wearable device capable of continuously tracking respiration and activity. I earned a joint PhD in biophysics at the University of Padua and Université de Paris VI and graduated from the Singularity University Summer Program of 2011.