Data Visualization: How to Convey your Data
This course explores how to interpret your data allowing you to effectively decide which chart type you should use to visualize and convey your data analytics. Using the correct visualization techniques allows you to gain the most from your data. In this course, we will look at the importance of data visualization, and then move onto the relationships, comparisons, distribution, and composition of data.
If you have any feedback relating to this course, feel free to get in touch with us at firstname.lastname@example.org.
- Get an overview of what data visualization is and why it's important
- Learn how to visualize relationships within data
- Learn about comparisons, distribution, and composition of data
This course has been designed for those who work with big data or data analytics who need to interpret data results in an effective way.
As a prerequisite to this course, you should have a very basic understanding of the terminology used in relation to tables and graphs.
Hello and welcome to this lecture looking at data comparison through visualization.
This lecture looks at three different types of chart: Bar, Column, and Line charts. First up, the Bar and Column charts.
These two charts are probably the ones many of us are most familiar with, as they are very commonly used when visualizing data.
We have all seen them: each is simply a chart that represents values by the length of a bar, drawn either horizontally (Bar) or vertically (Column).
They offer a very easy and effective way of comparing different data sets. The rectangular bars show at a glance which data set is larger or smaller than another, and let you quickly establish which data set has the highest or lowest value.
So for example, let's suppose we asked 50 AWS engineers to vote on their favorite AWS service, and the results come back as shown in the following table.
If we were to create a Bar chart of these results it would look as shown.
However, using the same data and same table we can also generate a Column chart, which would be presented like this:
As you can see, the information is presented in exactly the same way, using a rectangular bar to represent each value in the table, and it’s very easy to see that Amazon S3 received the most votes and Amazon RDS the fewest.
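To make this comparison concrete, here is a minimal Python sketch that renders a bar chart as plain text. The vote counts are hypothetical: the table from the lecture is not reproduced here, so the numbers were chosen only so that they total 50 votes, with Amazon S3 highest and Amazon RDS lowest. The function name `render_bar_chart` is also just an illustrative choice.

```python
# Hypothetical vote counts (illustrative only); they sum to 50 voters.
votes = {
    "Amazon EC2": 12,
    "Amazon S3": 18,
    "AWS Lambda": 9,
    "Amazon DynamoDB": 7,
    "Amazon RDS": 4,
}


def render_bar_chart(data, symbol="#"):
    """Return a horizontal bar chart as text, one row per category."""
    # Pad every label to the same width so the bars start in the same column.
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        lines.append(f"{label.ljust(label_width)} | {symbol * value} {value}")
    return "\n".join(lines)


chart = render_bar_chart(votes)
print(chart)

# The longest and shortest bars correspond to these two categories.
most_voted = max(votes, key=votes.get)
least_voted = min(votes, key=votes.get)
print(f"Most voted: {most_voted}, least voted: {least_voted}")
```

Each bar's length is proportional to its value, which is all that a bar or column chart really encodes; only the orientation differs between the two.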
However, there are some use cases where one of these chart types works better than the other. Let’s take a look at some of them.
One obvious difference, which you can see in the example we just looked at, is that long data labels (in this case the names of the AWS services) can look cluttered on a Column chart.
If you compare the data labels of the column chart with those of the bar chart, the bar chart looks much neater and its labels are easier to read, which is important when displaying data. This is simply because there is less room to space out the labels along the x-axis, especially when you have a larger number of data sets to fit along it.
Also, Bar charts tend to work better visually with a larger data set. Let me show you an example. Let’s say we extend our AWS service voting to include a much larger number of engineers, and we now have the following table, which I have sorted by the ‘Total Votes’ column:
A column chart representing this table would look as shown here.
The data labels make it difficult to interpret the data quickly. However, if we represent the same data as a Bar chart, we get the following result:
Straight away we can see how the readability has improved dramatically. Also, sorting the data values gives us a very clear comparison between the votes for each service and establishes a clear order across all the data sets.
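The effect of sorting before charting can be sketched in a few lines of Python. The service names and totals below are hypothetical stand-ins for the lecture's table, and the fixed bar width of 40 characters is an arbitrary assumption made so that large counts still fit on one line.

```python
# Hypothetical (service, total votes) rows; illustrative values only.
rows = [
    ("Amazon CloudWatch", 41),
    ("Amazon S3", 120),
    ("AWS Identity and Access Management", 77),
    ("Amazon RDS", 18),
    ("AWS Lambda", 96),
]

# Sort by vote count, largest first, so the bars read top to bottom in rank order.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

# Scale every bar relative to the largest value (assumed maximum width: 40 characters).
max_width = 40
largest = ranked[0][1]
label_width = max(len(name) for name, _ in ranked)

for name, total in ranked:
    bar = "#" * round(total / largest * max_width)
    print(f"{name.ljust(label_width)} | {bar} {total}")
```

Because the labels sit to the left of each bar, even a long name such as "AWS Identity and Access Management" stays on one readable line, which is the advantage of the horizontal layout described above.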
The final point I want to make about when to use a Bar or a Column chart concerns negative values. Sometimes you will have a data set that includes both positive and negative values, for example, this simple table here.
Using a column chart we are able to view the data as shown.
Here we can easily see the months that show a positive result against the months that show a negative result. Now let me contrast this against the same data as shown in a bar chart:
This also shows the negative values against the positive ones, but visually it is easier to compare positive and negative values when they are displayed vertically than when they are displayed horizontally. So when working with data sets that contain negative values, a column chart is preferred.
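As a rough illustration of why the vertical layout suits negative values, the following Python sketch draws a text column chart around a zero baseline. The monthly figures are hypothetical whole numbers, and `render_column_chart` is an illustrative helper, not something taken from the lecture.

```python
# Hypothetical monthly profit/loss figures (whole units); illustrative only.
results = {"Jan": 3, "Feb": -2, "Mar": 4, "Apr": -1, "May": 2, "Jun": -3}


def render_column_chart(data):
    """Return a text column chart with a zero baseline.

    Positive values rise above the baseline; negative values hang below it.
    """
    top = max(max(data.values()), 0)
    bottom = min(min(data.values()), 0)
    lines = []
    # Rows above the baseline, from the highest level down to 1.
    for level in range(top, 0, -1):
        lines.append(" ".join(" # " if v >= level else "   " for v in data.values()))
    # The baseline row, labelled with the first three letters of each category.
    lines.append(" ".join(label[:3].center(3) for label in data))
    # Rows below the baseline, from -1 down to the lowest level.
    for level in range(-1, bottom - 1, -1):
        lines.append(" ".join(" # " if v <= level else "   " for v in data.values()))
    return "\n".join(lines)


column_chart = render_column_chart(results)
print(column_chart)
```

With the months along a shared baseline, gains read as "up" and losses as "down", which matches how most people already think about profit and loss.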
Let me now show you how a bar or column chart looks with multiple sets of data. So far we have only compared the values of one data set; for example, in the previous scenario we saw the profit/loss over a given year. What if we wanted to see how that year compared to the previous one? The following table shows profit for two years, 2019 and 2020.
Using a column chart allows us to easily compare the values between each month as shown here:
Here you can clearly see a close comparison between multiple data sets, in this case two different years of profit and loss data. Visually, we can distinguish which of the two years had the greater result in each month.
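The same month-by-month comparison can be sketched in Python by grouping the two years under each month. The figures are hypothetical, only four months are included to keep the output short, and the bars are drawn horizontally purely because that is simpler to print as text.

```python
# Hypothetical profit figures for two years; illustrative only.
months = ["Jan", "Feb", "Mar", "Apr"]
profit = {
    2019: [5, -2, 7, 3],
    2020: [6, 1, 4, 3],
}

# Print the two years next to each other for every month (a grouped layout).
for i, month in enumerate(months):
    for year in sorted(profit):
        value = profit[year][i]
        # Use '#' for a profit and '-' for a loss so the sign is visible in the bar.
        bar = ("#" if value >= 0 else "-") * abs(value)
        print(f"{month} {year} | {bar} {value}")
    print()


def better_year(index):
    """Return the year with the greater result for one month, or 'tie'."""
    a, b = profit[2019][index], profit[2020][index]
    if a == b:
        return "tie"
    return 2019 if a > b else 2020


comparison = {month: better_year(i) for i, month in enumerate(months)}
print(comparison)
```

Placing the two values for a month directly beside each other is what makes the grouped chart easy to read: the eye compares adjacent bars rather than searching across the whole chart.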
Let me now take a look at the last chart in this lecture, the line chart.
The Line chart is sometimes also referred to as a line plot or a line graph. Instead of using rectangular blocks to represent data, as the column and bar charts do, it uses the values of the data sets as markers that define points, which are then connected by a continuous line.
Line graphs are best used for data that contains a lot of values, where the x-axis represents a continuous variable, usually a measurement of time. That axis can be scaled as you need, for example by millisecond, by month, or even by decade; it all depends on the data itself. Line graphs provide a great way to visualize change in a single variable and, as we discussed previously, you can plot multiple data sets to compare them with visual ease.
So let’s take a look at a new example set of data. This time our data set shows someone's heart rate in beats per minute over a 60-minute period. Each data set shows the measurements taken during a different activity.
As a line graph, we can display this data as shown.
As you can see, the comparison between the different activities is clearly defined, and you can see definite trend lines over time for each activity. Trying to encapsulate this amount of data, 4 different data sets over 30 different data points, within a bar chart would look very cumbersome, as we can see here in a chart showing the same data we just presented in the line chart.
It’s much harder to see and to understand the data comparison between the different data sets.
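The rules of thumb from this lecture can be summarized as a small Python helper. This is only a sketch: the function `suggest_chart` and its thresholds (more than 12 points counts as "many", more than 10 characters counts as a "long" label) are assumptions made for illustration, not fixed rules.

```python
def suggest_chart(labels, series, x_is_time=False):
    """Suggest a chart type using the rules of thumb from this lecture.

    labels: the categories or time points along the axis.
    series: a list of data sets, each a list of values.
    The thresholds below are illustrative assumptions.
    """
    values = [v for data_set in series for v in data_set]
    many_points = len(labels) > 12
    long_labels = any(len(str(label)) > 10 for label in labels)

    # Many values measured over time: a line shows the trend most clearly.
    if x_is_time and many_points:
        return "line"
    # Negative values compare more easily against a horizontal baseline.
    if any(v < 0 for v in values):
        return "column"
    # Long labels or many categories are easier to read beside horizontal bars.
    if long_labels or many_points:
        return "bar"
    return "bar or column"


# Four activities, 30 readings each, measured over time.
print(suggest_chart(list(range(30)), [[70] * 30] * 4, x_is_time=True))
# Monthly results that include losses.
print(suggest_chart(["Jan", "Feb", "Mar"], [[3, -2, 4]]))
# Long service names as labels.
print(suggest_chart(["Amazon S3", "Amazon DynamoDB"], [[18, 7]]))
```

In practice these checks overlap, and the order in which they are applied here is a judgment call, so treat the result as a starting point rather than a verdict.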
That now brings me to the end of this lecture. In the next lecture I shall be looking at how to visualize the distribution of data sets, so let’s take a look!
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.