
Visualizing Data Composition

The course is part of these learning paths

Introduction to Data Visualization
Overview
Difficulty: Beginner
Duration: 32m
Students: 60
Rating: 5/5

Description

This course explores how to interpret your data so that you can decide which chart type to use to visualize and convey your data analytics. Using the correct visualization techniques allows you to gain the most from your data. In this course, we will look at the importance of data visualization, and then move on to the relationships, comparisons, distribution, and composition of data.

If you have any feedback relating to this course, feel free to get in touch with us at support@cloudacademy.com.

Learning Objectives

  • Get an overview of what data visualization is and why it's important
  • Learn how to visualize relationships within data
  • Learn about comparisons, distribution, and composition of data

Intended Audience

This course has been designed for those who work with big data or data analytics and need to interpret data results in an effective way.

Prerequisites

As a prerequisite to this course, you should have a very basic understanding of the terminology used in relation to tables and graphs.

Transcript

Hello and welcome to this lecture covering how to visualize data composition. During this lecture, we will be looking at the following chart types:

  • Pie chart
  • Stacked column chart
  • 100% stacked column chart
  • Tree map

For those unfamiliar with what is meant by visualizing data composition, it’s the method of presenting the part-to-whole relationship of a data set. Perhaps the most common method of doing this is with a pie chart, which most people will be familiar with.

Although the pie chart is perhaps the most common way of doing this, the type of data composition chart you use will ultimately depend on the data set that you need to display. For example, a data set with 20 or more different entries might not be appropriate for a pie chart.

For example, here is a table showing the popularity of manufacturers of cars across a single department in an organization.

As a pie chart, this is easily represented as shown.

It’s very easy to see the clear distinctions between the data and which manufacturer is the most popular. Adding the percentage values in each slice of the pie chart also helps to quickly reinforce, in more granular detail, how each value contributes to the composition of the entire data set.
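As a rough sketch of where those slice percentages come from, the snippet below divides each count by the total. The counts are hypothetical, since the table itself is not reproduced in this transcript, and the function name `slice_percentages` is just an illustrative choice.

```python
# Hypothetical survey counts; the real table is not part of this transcript.
car_counts = {"Ford": 12, "Audi": 9, "BMW": 6, "Dodge": 3}


def slice_percentages(counts):
    """Return each category's share of the whole as a percentage."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


shares = slice_percentages(car_counts)
for name, pct in shares.items():
    # One line per pie slice, formatted like a slice label.
    print(f"{name}: {pct:.1f}%")
```

Because every slice is a fraction of the same total, the percentages always add up to 100, which is what makes the pie a part-to-whole chart.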

However, if we were to carry out this survey across the whole company rather than just a single department, we might have a data set like this, which contains a far higher number of entries.

Now, if you were to use a pie chart to represent this data, you would clearly see that it doesn’t provide as much clarity, because it is trying to show too much information.

So although pie charts are a great way to show the composition of data, they are best used when there are only a few categories involved. When you start to see data sets of 10-15 categories or more, the pie chart quickly loses its benefits.

Let’s now take a look at the stacked column chart to see when and why you might use this type of chart over a pie.  

Stacked column charts are used when you need to present data composition across a time series. To help us understand this better, let’s look at another example data set. Let’s stick with the car theme, but this time the table shows the number of sales from a second-hand car dealership over a 6-month period.

Taking this time-series data we can turn it into a stacked column chart.

We have already discussed the column chart, which looks very similar, but instead of each car make being shown as a separate column for each month, the values have been stacked on top of each other for that month. Visually, this gives a much better representation of the part-to-whole relationship of one car manufacturer to the others in a single month.

Because the values are stacked, you can show a larger data set with far more clarity than you normally could with a standard column chart.
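To make the stacking idea concrete, here is a minimal sketch of how the segments of each column could be positioned: every series starts where the previous one ended. The sales figures and the helper name `stack_segments` are assumptions for illustration, not the data from the lecture slides.

```python
# Hypothetical monthly sales per manufacturer over six months.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = {
    "Ford": [5, 7, 6, 4, 8, 9],
    "Audi": [3, 4, 8, 6, 5, 7],
    "BMW": [2, 1, 4, 5, 3, 4],
}


def stack_segments(series, n_points):
    """Return (bottom, top) positions for every segment, plus column totals."""
    running = [0] * n_points  # current height of each column
    segments = {}
    for name, values in series.items():
        # Each segment sits on top of whatever has been stacked so far.
        segments[name] = [(running[i], running[i] + values[i]) for i in range(n_points)]
        running = [running[i] + values[i] for i in range(n_points)]
    return segments, running


segments, column_totals = stack_segments(sales, len(months))
for i, month in enumerate(months):
    print(month, {name: segments[name][i] for name in sales}, "total:", column_totals[i])
```

The height of each finished column is the month's total, and each segment's height is one manufacturer's part of that total.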

There is also a variation of the stacked column chart, and this is the 100% stacked column chart. The difference here is that the values for each month are converted to a percentage of that month’s total. So the y-axis becomes a percentage instead of the number of sales for each car manufacturer. Using the same data set, a 100% stacked column looks like this:

This shows exactly the same data but represents each value as a percentage, so effectively it shows each car manufacturer’s share of sales in each month against the rest. Again, this is another way of showing the part-to-whole relationship across time-series data.
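The conversion behind a 100% stacked column can be sketched as follows: each value is divided by the total for its own month. The sales figures and the function name `to_percent_of_column` are hypothetical and only serve to illustrate the calculation.

```python
# Hypothetical monthly sales per manufacturer over six months.
sales = {
    "Ford": [5, 7, 6, 4, 8, 9],
    "Audi": [3, 4, 8, 6, 5, 7],
    "BMW": [2, 1, 4, 5, 3, 4],
}


def to_percent_of_column(series):
    """Convert each value to a percentage of its column (month) total."""
    n_points = len(next(iter(series.values())))
    totals = [sum(values[i] for values in series.values()) for i in range(n_points)]
    return {
        name: [
            # Guard against an empty month to avoid dividing by zero.
            100 * values[i] / totals[i] if totals[i] else 0.0
            for i in range(n_points)
        ]
        for name, values in series.items()
    }


percent = to_percent_of_column(sales)
for name, values in percent.items():
    print(name, [round(v, 1) for v in values])
```

Every column now reaches exactly 100%, so the chart compares shares between months rather than absolute sales volumes.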

As you’d expect, you can also use a stacked bar chart or a 100% stacked bar chart, where instead of the data being shown vertically it will be represented by horizontal bars instead.  

Let’s now move on to the final chart type I want to discuss in this lecture, the tree map.

A tree map is most effective when you are trying to visualize hierarchical data from your data set. It uses a series of rectangles, which can be nested, to show the composition of your data and demonstrate the part-to-whole relationships.

This will all become clearer by looking at an example.

Again, sticking with our car examples, here we have a table showing the number of car sales from a dealership based on different car manufacturers and models:

We have a hierarchy of data, leading from the car manufacturer, to the models relating to those manufacturers, and then the number of those vehicles sold.

Using a tree map we can visualize this hierarchical data set as seen here.

This is a very different take from most other charts we have looked at in this course. The entire tree map reflects 100% of the data. The different colors depict the ‘parent group’, which in this case is the manufacturer. Each parent group is then divided into smaller rectangles based on each of the models of that manufacturer. The size of each rectangle represents its share of the entire data set.

So very easily, by color alone we can see that Audi has been the most successful manufacturer for the car dealership from a number of sales perspective, followed by Ford, then BMW, and finally Dodge.

Within each parent group, the largest values are placed top left, with the smallest values bottom right. So, for example, looking at Audi, we can quickly see that the A3 has sold the most, whereas the Q7 has sold the least of the Audis.

This pattern is reflected throughout the entire tree map, with Audi being top left and Dodge being bottom right based on the number of sales.
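The ordering and sizing described here can be sketched in a few lines: sum the models to get each manufacturer's total, sort parents and children from largest to smallest, and express each as a share of the whole. The sales numbers and every model name other than the A3 and Q7 are hypothetical, and the sketch deliberately stops short of computing rectangle coordinates, which a charting tool would handle.

```python
# Hypothetical hierarchy: manufacturer -> model -> units sold.
sales = {
    "Ford": {"Focus": 25, "Fiesta": 15},
    "Dodge": {"Charger": 10},
    "Audi": {"A3": 30, "A4": 20, "Q7": 10},
    "BMW": {"3 Series": 18, "X5": 12},
}


def treemap_order(hierarchy):
    """Order parents and children largest first, with shares of the grand total."""
    grand_total = sum(sum(models.values()) for models in hierarchy.values())
    parents = sorted(
        hierarchy.items(), key=lambda item: sum(item[1].values()), reverse=True
    )
    ordered = []
    for parent, models in parents:
        children = sorted(models.items(), key=lambda kv: kv[1], reverse=True)
        ordered.append(
            (
                parent,
                sum(models.values()) / grand_total,  # parent's share of the whole
                [(model, units / grand_total) for model, units in children],
            )
        )
    return ordered


ordered = treemap_order(sales)
for parent, share, children in ordered:
    print(f"{parent}: {share:.1%}")
    for model, model_share in children:
        print(f"  {model}: {model_share:.1%}")
```

The first entry corresponds to the top-left group and the last to the bottom-right group, and the shares translate directly into rectangle areas.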

So tree maps are a great way to visualize data composition when you have a hierarchy of data to display.

Lectures

Course Introduction - The Importance of Data Visualization - Visualizing Data Relationships - Visualizing Data Comparisons - Visualizing Data Distribution - Course Summary

About the Author
Students: 113956
Labs: 1
Courses: 95
Learning paths: 63

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.