Data Visualization: How to Convey your Data
This course explores how to interpret your data allowing you to effectively decide which chart type you should use to visualize and convey your data analytics. Using the correct visualization techniques allows you to gain the most from your data. In this course, we will look at the importance of data visualization, and then move onto the relationships, comparisons, distribution, and composition of data.
If you have any feedback relating to this course, feel free to get in touch with us at firstname.lastname@example.org.
- Get an overview of what data visualization is and why it's important
- Learn how to visualize relationships within data
- Learn about comparisons, distribution, and composition of data
This course has been designed for those who work with big data or data analytics who need to interpret data results in an effective way.
As a prerequisite to this course, you should have a very basic understanding of the terminology used in relation to tables and graphs.
Hello and welcome to this final lecture of this course where I want to summarize some of the key points made throughout the previous lectures.
I started off by looking at some of the benefits of using different visualization techniques to show your data. These included the following:
- It becomes a fast and effective way for humans to process large amounts of data.
- It helps to drive business decisions in a very strategic way.
- It allows you to identify important trends within your data.
- It can highlight data relationships quickly and easily.
- It allows the business to increase its use of statistical information.
We also looked at the difference between charts and graphs.
- A chart itself displays information using tables, diagrams, and indeed graphs.
- A graph presents data in a visual mathematical format, usually along 2 dimensions, allowing you to see a visual correlation of the data.
I also explained that charts can be used to show data relationships, data comparisons, data distributions, and data compositions.
And this led me onto the next series of lectures covering each of these topics.
So in the next lecture, I touched on how to visualize data relationships in charts, and in this lecture, I covered the following relating to scatter charts.
- The main purpose of a scatter chart is to show the data relationship between 2 sets of data using an X and Y axis.
- They allow you to visualize a pattern or relationship between the values on the x-axis and the values on the y-axis.
- Using trend lines you can add a plot line to project a prediction of other values.
- Trend lines allow you to work with both Interpolation and Extrapolation values.
- Interpolation values are those inside the boundaries of our existing data set.
- Extrapolation values are those that extend beyond the last data point of the trend line.
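The trend line points above can be illustrated with a minimal sketch. It assumes a straight-line (least-squares) trend, which is only one of several possible trend line types, and the sample data and the function names `fit_trend_line`, `predict`, and `kind_of_estimate` are hypothetical, chosen only for this example.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Read a value off the trend line at position x."""
    return slope * x + intercept


def kind_of_estimate(x, xs):
    """Inside the existing x range is interpolation, outside is extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


# Hypothetical sample data: advertising spend (x) against units sold (y).
spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]

slope, intercept = fit_trend_line(spend, units)
for x in (2.5, 8):
    print(x, kind_of_estimate(x, spend), predict(x, slope, intercept))
```

Here 2.5 sits inside the existing data, so its estimate is an interpolation, while 8 lies beyond the last data point, so its estimate is an extrapolation and should be treated with more caution.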
I then looked at Bubble charts.
- These are very closely related to scatter plots or scatter charts.
- They allow you to show the relationship between 3 data sets, and the 3rd set is represented by the size of the bubble on the chart.
- This 3rd dimension allows you to see additional and more complex data relationships.
- When creating your bubble charts, you should add a level of transparency to your visual so that smaller bubbles are not hidden, which would skew the results.
- You should scale your bubble size appropriately to see maximum results.
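One possible way to scale bubble sizes is sketched below. It assumes that the bubble area, rather than the radius, should be proportional to the third data value, and that larger bubbles are drawn first so that smaller ones remain visible. Both are common conventions rather than rules stated in this course, and `bubble_radii`, `draw_order`, and the sample values are hypothetical.

```python
import math


def bubble_radii(values, max_radius=30.0):
    """Scale radii so that bubble AREA is proportional to each value."""
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


def draw_order(values):
    """Indices sorted largest value first, so small bubbles are drawn last."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical third data set, for example population in millions.
populations = [4, 16, 64, 100]

radii = bubble_radii(populations)
order = draw_order(populations)
for i in order:
    print(populations[i], round(radii[i], 2))
```

Scaling by the square root keeps a value that is four times larger from looking sixteen times larger, which is what would happen if the radius itself were proportional to the value.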
Following the data relationships lecture, I then looked at different methods of how to visualize data comparisons where I looked at Bar, Column, and Line Charts.
- Bar charts are represented by bars that run horizontally, whereas a column chart has its bars shown vertically.
- Bar and column charts allow for a very easy and effective method of comparing different data sets.
- A bar chart can be preferable over a column chart if you have longer data labels, because there is generally less space for labels along the x-axis of a column chart.
- Bar charts tend to visually work better when working with a larger amount of data sets for improved readability.
- When working with both positive and negative data set values, a column chart is preferred.
- Bar and column charts allow you to see a comparison of data between multiple data sets using closely adjacent bars.
Following these charts, I then covered line charts.
- The Line chart is sometimes also referred to as a line plot or a line graph.
- Values of the data sets are used as markers on the chart which are then connected via a continuous line across all data points.
- Line graphs are best used for data that contain a lot of values across multiple data sets.
- The x-axis is used to determine a solid variable, usually a measurement of time.
- They provide a great way to visualize a change in a single variable.
- They offer a clear and defined view to see trend lines over time.
In the next lecture, I moved onto how to present the visualization of data distribution through the use of histograms, and in this lecture I covered the following points:
- A data distribution shows us all the values of our data set and how often each value occurs.
- Histograms allow you to easily visualize your data that could contain thousands of data sets.
- A data distribution frequency is used to create a histogram.
- Using a frequency distribution you can see how data is distributed or grouped across a data set.
- The range of a histogram is defined by subtracting the smallest value from the largest value.
- Histogram classes can be considered ‘intervals’ of data entries. Each class will have a count of the number of values that fit within that interval.
- To get your class width, take your range and divide it by the number of classes you would like for your data set, then generally round up to the nearest whole number.
- Your frequency is a numerical value of how many values from your data set fit into each of your classes.
- A histogram is effectively the same as a bar or column chart, but it shows values based on a data distribution frequency.
- Histograms offer the benefit of being able to present huge data sets in a simplistic and readable manner through the means of distributing the data values using frequencies.
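The range, class width, and frequency steps above can be sketched as follows. This is a minimal illustration rather than a definitive method: the function name `histogram_classes` and the sample values are hypothetical, and placing the largest value in the last class is an assumption made here to handle the case where the range divides evenly.

```python
import math


def histogram_classes(values, num_classes):
    """Return (range, class width, class intervals, frequencies)."""
    smallest, largest = min(values), max(values)
    data_range = largest - smallest
    # Round up to a whole number; use at least 1 if every value is identical.
    class_width = max(1, math.ceil(data_range / num_classes))
    frequencies = [0] * num_classes
    for v in values:
        # Assumption: the largest value is counted in the last class.
        index = min(int((v - smallest) // class_width), num_classes - 1)
        frequencies[index] += 1
    classes = [
        (smallest + i * class_width, smallest + (i + 1) * class_width)
        for i in range(num_classes)
    ]
    return data_range, class_width, classes, frequencies


# Hypothetical data set of ten values, grouped into five classes.
data = [2, 5, 7, 8, 12, 13, 15, 18, 21, 24]
data_range, class_width, classes, frequencies = histogram_classes(data, 5)

# Print a simple text histogram, one row per class.
for (low, high), count in zip(classes, frequencies):
    print(f"{low:>3} to {high:<3} {'#' * count}")
```

With this sample the range is 24 - 2 = 22, and 22 divided by 5 classes is 4.4, which rounds up to a class width of 5.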
In the final lecture, I looked at how to visualize data composition using a variety of different graphs, including Pie chart, Stacked column chart, 100% stacked column chart, and Tree map.
Here we learned that:
- Data composition is the method of presenting a part-to-whole relationship of a data set.
- The pie chart is perhaps the most common way of showing this.
- The type of data composition chart used depends on the data set that you need to display.
- Adding percentage values in a pie chart helps to quickly reinforce how each value contributes to the composition of the entire data set.
- Pie charts are best used when there are only a few data sets involved (3-10). Above this, the pie chart starts to lose its visual clarity.
- Stacked column charts are used when you need to present data composition across a time-series and can handle larger data sets than that of a pie chart.
- Stacked column charts are similar to column charts but the values are stacked on top of each other for each time series, giving a much better representation of the part-to-whole relationship.
- Because the values are stacked, it allows you to visually show a larger data set than you normally would with just a column chart with far more clarity.
- In a 100% stacked column chart the y-axis becomes a percentage value showing another way of the part-to-whole visualization across the time-series data.
- A tree map is most effective when you are trying to show and visualize hierarchical data using a series of nested rectangles.
- The entire tree map reflects 100% of the data.
- Different colors can be used to depict the parent group of the data set.
- These colors can then be divided into smaller rectangles based on the next level down in the hierarchy, and the size of these rectangles represents the part-to-whole relationship of the entire data set.
- By color alone, you can see very easily the part-to-many relationships of the data set.
- The largest values in each parent group are placed top left, with the smallest values bottom right.
- Tree Maps are a great way to visualize a data composition when you have a hierarchy of data sets to display.
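The part-to-whole calculations behind these charts can be sketched as below. The first function converts each time period into percentages, as a 100% stacked column chart does on its y-axis. The second computes each child's share of the whole hierarchy, ordered largest first, as a tree map would size and place its rectangles. The function names and sample figures are hypothetical, and the sketch only computes the numbers; it does not draw anything.

```python
def to_percentages(parts):
    """Convert raw values into percentages of their own total."""
    total = sum(parts.values())
    return {name: 100 * value / total for name, value in parts.items()}


def tree_map_shares(hierarchy):
    """Return (parent, child, percent of whole), largest groups first."""
    grand_total = sum(sum(children.values()) for children in hierarchy.values())
    parents = sorted(
        hierarchy.items(), key=lambda item: sum(item[1].values()), reverse=True
    )
    rows = []
    for parent, children in parents:
        ordered = sorted(children.items(), key=lambda item: item[1], reverse=True)
        for child, value in ordered:
            rows.append((parent, child, 100 * value / grand_total))
    return rows


# Hypothetical time-series data for a 100% stacked column chart.
sales_by_quarter = {
    "Q1": {"North": 20, "South": 30, "West": 50},
    "Q2": {"North": 30, "South": 30, "West": 60},
}
stacked_100 = {q: to_percentages(p) for q, p in sales_by_quarter.items()}

# Hypothetical two-level hierarchy for a tree map.
revenue = {
    "Software": {"Licences": 30, "Support": 10},
    "Hardware": {"Laptops": 40, "Phones": 20},
}
shares = tree_map_shares(revenue)

print(stacked_100)
for row in shares:
    print(row)
```

In both cases the shares for one whole add up to 100%, which is what lets the reader compare parts against each other regardless of the raw totals.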
That now brings me to the end of this lecture and to the end of this course, and so you should now have a greater understanding of why you would need to visualize your data, in addition to which method would be best depending on what data you are trying to present.
Feedback on our courses here at Cloud Academy is valuable to both us as trainers and any students looking to take the same course in the future. If you have any feedback, positive or negative, it would be greatly appreciated if you could contact email@example.com.
Thank you for your time and good luck with your continued learning of cloud computing. Thank you.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.