Big Data - Data Visualization
In this course, we learn how to determine the appropriate techniques for delivering the results/output of a query or analysis. We examine how to design and create a visualization using AWS services, and how to optimize visualization services to present results in an effective and accessible manner. We introduce and outline the core AWS analysis tools and then work through how to integrate and output data to enable business decisions using QuickSight.
Amazon QuickSight makes it easy to build visualizations, perform ad hoc analysis, and quickly get business insights from your data. It has a number of pre-configured reports that take the undifferentiated heavy lifting out of creating visual reports. A benefit of QuickSight is that it's integrated into our AWS dashboard and our AWS account, and this is where reports and graphs can be viewed by team members, staff, etc.
QuickSight also makes it easy for business teams to create and share interactive graphs and reports as stories, and if we add data sources in the future, they can simply be added as Amazon QuickSight data sources. When we create visuals using QuickSight, the style and format of graphs are automatically selected by the QuickSight engine, which saves time and improves the quality of reports and visuals.
- Recognize and explain how to determine the appropriate techniques for delivering the results/output of a query or analysis
- Recognize and explain how to design and create data visualizations
- Recognize and explain the operational characteristics to gain simple and timely results from Amazon QuickSight
- [Instructor] Let's start by introducing the visualization tools and services that we have available to us. AWS provides a suite of tools to help you present and manipulate big data results. Amazon Athena is an interactive query service that makes it easy to analyze data that's stored in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena uses Presto with ANSI SQL and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. With Athena there is no need for complex ETL jobs to prepare your data for analysis, which makes it easy for anyone with SQL skills to quickly analyze large-scale data sets. Another benefit is that Athena uses Amazon S3 as its underlying data store, so your data remains durable and highly available. Amazon Athena is integrated with AWS Glue, so you can use Glue's ETL capabilities to transform data, or use the Glue Data Catalog, which is a powerful unification tool. You could use it, for example, to create a unified metadata repository that spans a number of data services. Amazon QuickSight makes it easy to build visualizations, perform ad hoc analysis, and quickly get business insights from your data. It has a number of pre-configured reports that take the undifferentiated heavy lifting out of creating visual reports. Let's think through how QuickSight can be used. Data dashboarding is often a core requirement, and a common use case is business reporting on the data we have stored in a data store. Let's envisage that our data warehouse picks up new data on a nightly basis from a number of different sources. We need to ingest and transform that data quickly so that it is ready and consumable in the morning, when the CEO and other business users come in and need to generate reports.
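To make the Athena workflow above concrete, here is a minimal Python sketch of running a query programmatically. It assumes you pass in a boto3 Athena client (for example `boto3.client("athena")`) and relies on the `start_query_execution`, `get_query_execution`, and `get_query_results` calls; the database name and S3 results location are placeholders you would supply. Only the first page of results is read, so large result sets would need pagination.

```python
import time
from typing import Any, List


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Submit a query to Athena, wait for it to finish, and return rows as strings."""
    # start_query_execution returns immediately with an ID; the query runs asynchronously.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page of results only; larger result sets need NextToken pagination.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT region, SUM(amount) FROM sales GROUP BY region", "sales_db", "s3://example-athena-results/")`, where the table, database, and bucket names are hypothetical. For SELECT queries, Athena returns the column headers as the first row.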
So we might have a number of transformation jobs that put formatted, clean data into Amazon S3. We use Amazon S3 as our data store because it is highly durable, and Amazon Redshift can consume this data on multiple threads in parallel from each Redshift node, so data processing will be really fast. It is also easy for data on Amazon S3 to be consumed by other analytics tools or services if we add them. For visualizing analytics, we can use Amazon QuickSight or one of the many partner visualization platforms listed in the AWS Marketplace, using an ODBC/JDBC connection to Amazon Redshift. The benefit of QuickSight is that it's integrated into our AWS dashboard and our AWS account, and this is where reports and graphs can be viewed by the CEO and their staff. When we create visuals, the style and format of graphs is automatically selected by the QuickSight engine, which saves time and improves the quality of reports and visuals. QuickSight also makes it easy for business teams to create and share interactive graphs and reports as stories, and if we add data sources in the future, they can simply be added as Amazon QuickSight data sources. Once we are in QuickSight, we can create visuals and scenes that provide information relevant to different business units or reporting agendas. A story is a collection of interactive visuals that can be easily shared with other people. At the heart of the QuickSight service is the SPICE engine. SPICE stands for Super-fast, Parallel, In-memory Calculation Engine, and Amazon built it from the ground up to run natively in the AWS cloud. SPICE uses a combination of data compression, columnar storage, machine code generation, and in-memory technologies enabled through the latest hardware innovations.
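The nightly transformation step described above can be sketched in Python as follows. This is a hypothetical example rather than part of the course: it trims whitespace, drops rows with an empty key field, and splits the result into several CSV payloads. Splitting the input into multiple files (AWS suggests a multiple of the number of slices in the cluster) is what lets a Redshift `COPY` load them in parallel. The field names and part count are assumptions, and the upload to S3 is left out.

```python
import csv
import io
from typing import Dict, List


def clean_and_split(records: List[Dict[str, str]], fields: List[str],
                    num_parts: int) -> List[str]:
    """Clean raw records and split them into num_parts CSV payloads (no header rows)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    buffers = [io.StringIO() for _ in range(num_parts)]
    writers = [csv.writer(buf) for buf in buffers]
    kept = 0
    for record in records:
        # Trim whitespace and treat missing fields as empty strings.
        row = [str(record.get(f, "")).strip() for f in fields]
        # Drop rows whose first (key) field is empty.
        if not row[0]:
            continue
        # Round-robin assignment keeps the part files roughly equal in size.
        writers[kept % num_parts].writerow(row)
        kept += 1
    return [buf.getvalue() for buf in buffers]
```

Each payload could then be written to S3 under a common prefix (for example a hypothetical `s3://example-bucket/sales/2024-01-01/part-0000.csv`), and a single `COPY` command pointing at that prefix loads all of the parts.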
SPICE automatically replicates data for high availability, and it also enables Amazon QuickSight to support interactive analysis across a wide variety of AWS data sources. We don't need to be experts in how it all works, but what it does mean is that we can run interactive queries on massive data sets and get really fast results. SPICE capacity is allocated by region, so the information displayed is for the region you currently have selected. You can see how much SPICE capacity you are using, and how much there is overall, from the AWS console. Currently each Amazon QuickSight account receives 10 gigabytes of SPICE capacity per paid user, allocated when you log in to QuickSight for the first time. This limit will no doubt change over time, so do check the current account limits if space is a concern for your use cases. You get one free user per account, which comes with one gigabyte of SPICE capacity, and SPICE capacity is pooled across all users in your Amazon QuickSight account. So if you have four users, say one free and three paid, you'll have 31 gigabytes of SPICE capacity available (1 GB for the free user plus 3 × 10 GB), and that can be used by any of the users in the account. All of your default SPICE capacity is allocated to your home region, and the other regions have no SPICE capacity unless you choose to purchase some. As your usage of QuickSight increases, housekeeping becomes important: you can release purchased SPICE capacity that you aren't using, and you can free up capacity by deleting unused data sets that you have imported into SPICE. Keep in mind that purchasing or releasing SPICE capacity only affects the capacity for the currently selected region. You can purchase up to one terabyte of additional SPICE capacity per QuickSight account if you need it.
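The pooled-capacity arithmetic above can be written out as a small Python helper. The per-user figures and the one-terabyte purchase ceiling are the values quoted in this course, which will change over time, and the helper treats a terabyte as 1,024 GB; check current QuickSight limits before relying on them. It models only the home region, where the default capacity lives.

```python
GB_PER_PAID_USER = 10       # as quoted in the course; check current limits
GB_PER_FREE_USER = 1        # one free user per account
MAX_PURCHASED_GB = 1024     # "up to one terabyte", treated here as 1,024 GB


def pooled_spice_capacity_gb(paid_users: int, free_users: int = 1,
                             purchased_gb: int = 0) -> int:
    """Total SPICE capacity (GB) shared by all users in the home region."""
    if min(paid_users, free_users, purchased_gb) < 0:
        raise ValueError("counts and capacity must be non-negative")
    if free_users > 1:
        raise ValueError("an account gets at most one free user")
    if purchased_gb > MAX_PURCHASED_GB:
        raise ValueError("purchased capacity exceeds the per-account ceiling")
    return (paid_users * GB_PER_PAID_USER
            + free_users * GB_PER_FREE_USER
            + purchased_gb)
```

For the example in the text, `pooled_spice_capacity_gb(paid_users=3)` returns 31.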
If you do find yourself low on SPICE capacity, you can also choose the Buy SPICE alert that appears on the Your Data Sets and Create a Data Set pages in the console, and if you need more capacity than that, you can submit a limit increase request to AWS Support following the AWS service limits instructions. A neat benefit of QuickSight is that it's very easy to connect it to data sources. You can upload CSV or Excel files; ingest data from AWS data sources such as Amazon S3, Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon Athena, and Amazon EMR (Elastic MapReduce) via Presto and Apache Spark; connect to cloud or on-premises databases such as MySQL, SQL Server, and PostgreSQL; and connect to SaaS applications like Salesforce. You can prepare data in any data set to make it more suitable for analysis, for example by changing field names or adding a calculated field. You can also join database tables using SQL. However, you'll find it relatively limited if you need complex cascades, SELECT INTO statements, or more complex joins, because the join interface doesn't let you use any additional SQL statements to refine the data set. A few points to remember: for joins, the target of the join has to be a SPICE data set; both data sets have to be based on the same SQL database data source, so you cannot join across two independent data sources; and the fields used in the join cannot be calculated fields, so if you've added a date calculation or similar, it can't be part of your join statement. If you do need to run a lot of conditional logic, you might want to consider Amazon Athena, as it provides more functional support and flexibility in manipulating data. You can use calculated fields to apply common operators or functions to analyze or transform field data, and you can use multiple functions and operators in a single calculated field.
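The three join rules above can be expressed as a hypothetical pre-flight check. The `DataSet` class and its fields are invented for illustration and do not correspond to any QuickSight API; the function simply returns a list of rule violations, which is empty if the join looks allowed under the rules as stated in this course.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataSet:
    """Hypothetical description of a data set; not a QuickSight API type."""
    name: str
    data_source: str                 # identifier of the underlying SQL data source
    calculated_fields: Set[str] = field(default_factory=set)


def join_problems(left: DataSet, right: DataSet, left_key: str, right_key: str,
                  target_in_spice: bool) -> List[str]:
    """Return the join rules from the course that this join would break."""
    problems: List[str] = []
    # Rule 1: the join target has to be a SPICE data set.
    if not target_in_spice:
        problems.append("join target is not a SPICE data set")
    # Rule 2: both sides must come from the same SQL database data source.
    if left.data_source != right.data_source:
        problems.append(f"{left.name} and {right.name} use different data sources")
    # Rule 3: join keys cannot be calculated fields.
    for ds, key in ((left, left_key), (right, right_key)):
        if key in ds.calculated_fields:
            problems.append(f"{ds.name}.{key} is a calculated field")
    return problems
```

Returning a list rather than raising on the first failure lets every broken rule be reported at once.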
For example, you might use the formatDate function to extract the year from a date field and then the ifelse function to segment records based on that year. Just FYI, a calculated field has to be based on the QuickSight data source. A lot of common function types are supported, as you can see from this visual on the screen. Amazon QuickSight supports assorted visualizations that facilitate different analytical approaches. To create a visualization, you start by selecting the data fields you want to analyze, or you drag the fields directly onto the visual canvas; you can also do a combination of both. QuickSight will automatically select an appropriate visualization to display your data based on the data you've selected. It does this using a proprietary technology called AutoGraph, which selects the most appropriate visualization based on properties of the data such as cardinality and data type. Pretty clever. The visualization types are chosen to reveal the data and its relationships in the most effective way. QuickSight is super-intuitive: it selects the display type that best suits the field types in your result set, which is a real time saver if you just need to show results quickly. You can alter the visualization to include the fields or views you prefer by adding a field from the field selector. If you add a new view or field, it is automatically added to the field types menu, which is super-cool. If you add an additional sort field, the visual style automatically updates from a bar graph to a line graph to better represent your result set. Brilliant! You can also revert to the default visualization at any time by clicking the menu option. Okay, so let's walk through a demo. We choose a data set and then choose Create analysis. If we don't have any data sets yet, we create a new one by choosing New data set. At this point you'll notice there's an auto-save option in the menu bar.
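AutoGraph itself is proprietary, so the following is only a toy Python heuristic showing the general idea of choosing a visual from field roles and cardinality. The visual names, the `Field` type, and the cardinality threshold of 50 are assumptions for illustration, not AutoGraph's actual rules.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str
    role: str          # assumed roles: "date", "dimension", or "measure"
    cardinality: int   # number of distinct values (ignored for measures here)


MAX_BAR_CATEGORIES = 50  # arbitrary threshold chosen for this sketch


def suggest_visual(fields: List[Field]) -> str:
    """Pick a visual type from the roles and cardinality of the selected fields."""
    dates = [f for f in fields if f.role == "date"]
    dims = [f for f in fields if f.role == "dimension"]
    measures = [f for f in fields if f.role == "measure"]

    if not dates and not dims:
        # Measures only: one number is a KPI, two can be plotted against each other.
        if len(measures) == 1:
            return "kpi"
        if len(measures) == 2:
            return "scatter_plot"
    elif measures and dates and not dims:
        # Values over time read best as a line.
        return "line_chart"
    elif measures and len(dims) == 1 and not dates:
        # A single category axis: bars if there are few enough categories.
        if dims[0].cardinality <= MAX_BAR_CATEGORIES:
            return "bar_chart"
    # Anything else falls back to a table.
    return "table"
```

For instance, a low-cardinality `region` dimension plus a `sales` measure yields `"bar_chart"`, while swapping in a high-cardinality `customer` dimension falls back to `"table"`.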
Auto-save is on by default when you're working on an analysis. When it's on, your changes are automatically saved every minute or so; I'm not sure exactly what the timing is. When auto-save is off, your changes are not automatically saved. That's useful if you want to try out a different analysis or display style, or show a certain variation or view without changing your core analysis. The undo feature works whether auto-save is on or off, so you can undo or redo any change you make by using Undo or Redo on the application bar.
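The relationship between auto-save and undo/redo can be illustrated with a toy model in Python: undo and redo operate on in-memory history stacks, while auto-save only decides whether each change is also persisted. The `AnalysisEditor` class is hypothetical, not how QuickSight is implemented, and it saves immediately rather than roughly every minute to keep the example simple.

```python
import copy
from typing import Any, Dict, List


class AnalysisEditor:
    """Toy editor: undo/redo always work; auto-save only controls persistence."""

    def __init__(self, autosave: bool = True) -> None:
        self.autosave = autosave
        self.state: Dict[str, Any] = {}
        self.saved: Dict[str, Any] = {}
        self._undo: List[Dict[str, Any]] = []
        self._redo: List[Dict[str, Any]] = []

    def apply(self, key: str, value: Any) -> None:
        # Record the current state so the change can be undone.
        self._undo.append(copy.deepcopy(self.state))
        self._redo.clear()  # a new change invalidates the redo history
        self.state[key] = value
        self._maybe_save()

    def undo(self) -> bool:
        if not self._undo:
            return False
        self._redo.append(copy.deepcopy(self.state))
        self.state = self._undo.pop()
        self._maybe_save()
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        self._undo.append(copy.deepcopy(self.state))
        self.state = self._redo.pop()
        self._maybe_save()
        return True

    def save(self) -> None:
        # Explicit save, used when auto-save is off.
        self.saved = copy.deepcopy(self.state)

    def _maybe_save(self) -> None:
        if self.autosave:
            self.save()
```

With `autosave=False`, experimental changes stay out of `saved` until `save()` is called, yet undo and redo still work, mirroring the behavior described above.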
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession", as everything at AWS starts with the customer. Outside of work, his passions are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few startups.