Data Analytic Concepts
This course explores the different concepts behind data analytics. It will provide you with a clearer understanding of what data analytics actually is and how it allows you to collate, store, review, and analyze data to help drive business decisions through insights that have been identified.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
By the end of this course, you will have an understanding of:
- Different analytic concepts
- Data types, including structured, semi-structured, and unstructured data
- When you should use data analytics within your business
- The process behind running analytics against data
This course is ideal for those looking to become data scientists or solutions architects. Also, if you are studying for the AWS Data Analytics - Specialty certification, then this course would act as a great introduction to the topic itself.
As this is a beginner's course, all concepts will be explained throughout the course. Any knowledge of AWS data analytic services would be advantageous, but not essential.
First of all, when you have a problem, you usually frame it as a question, and that question starts your journey into the analytics field. With your question ready, you need the source data, which is effectively your starting point. This can be a data warehouse, relational database tables, a NoSQL store, a CSV file, books, or text files; in short, any readable format can be used as an input. Which input you select will depend on the answers you are trying to get to your problem or question.
For example, the problem might be to count the words in a book by Shakespeare, or, at the other end of the scale, to analyze DNA to find patterns. So the type of problem will dictate both the data and the processing algorithm.
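To make the word-count example concrete, here is a minimal sketch in Python. It assumes the book is already available as a plain string; the `count_words` helper, the sample text, and the simple tokenizing rule (lowercase letters and apostrophes) are illustrative assumptions, not part of the course.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# A short sample stands in for the full text of a book.
sample = "To be, or not to be, that is the question"
counts = count_words(sample)
print(counts.most_common(3))
```

The same question asked of DNA data would need very different input and a very different algorithm, which is the point the example above is making.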
With your input ready, you need to store it in a place that is accessible to the tools that will process it, analyze it, and return the results. Processing and analysis are treated as separate steps because some analytics solutions from AWS require the data to be cleaned or pre-processed first for better results and accuracy.
AWS has structured its portfolio around the collect, store, analyze, and visualize methodology, with integrated services to perform each step.
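The four steps can be pictured with a small in-memory sketch. This is only an illustration of the collect, store, analyze, and visualize flow using the Python standard library; it does not call any AWS service, and the sample data, function names, and the dictionary standing in for a data store are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input: a tiny CSV file of sales records.
RAW_CSV = """region,amount
north,120
south,80
north,50
"""


def collect(raw):
    """Ingest the raw CSV text and return a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def store(rows, data_store, key):
    """Keep the rows under a key in a dictionary that stands in for a data store."""
    data_store[key] = rows


def analyze(rows):
    """Total the amounts per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["amount"])
    return dict(totals)


def visualize(totals):
    """Render the totals as a simple text bar chart, one '#' per 10 units."""
    lines = []
    for region, total in sorted(totals.items()):
        lines.append(f"{region:<6} {'#' * (total // 10)} {total}")
    return "\n".join(lines)


data_store = {}
store(collect(RAW_CSV), data_store, "sales")
totals = analyze(data_store["sales"])
print(visualize(totals))
```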
The first step in your analytics process is to collect the data that you want to use as an input. Data collection is also called ingestion, which is the act of acquiring data and storing it for later use. In data collection, we have different types of ingested data. We can have transactional data, which is represented by reads and writes on traditional relational databases. We can also have file ingestion, reading data from file sources such as logs, texts, CSV files, book contents, and so on. And you can also have streamed data, represented by any kind of streamed content, such as clickstreams and events on a website, Internet of Things devices, and so on.
The toolset AWS currently offers can ingest data from many different sources. For example, with Kinesis Streams or Firehose, we can easily work with streamed data from any source, even if it is on-premises.
After the data is generated or acquired, we need to store it in a place that is accessible to AWS. This is usually called a data lake: the big pool where your services go to get the source data and to deliver back the results. Amazon S3 is one of the core storage services from AWS. It is a highly durable object store that integrates seamlessly with all other AWS analytic services for data loading and storage. You can also keep data on Amazon RDS if it has a structured format, or on Redshift if it is there for data warehousing and BI. If it has no fixed data model but does have a basic structure, we can use DynamoDB, the NoSQL solution from AWS, to store it. And if your data is accessed very infrequently, we can use Glacier, the archive service from AWS.
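The storage guidance above can be summarized as a small lookup. The sketch below is a deliberate simplification of that paragraph, not an official AWS decision rule; the `suggest_storage` function, its parameters, and the order in which the conditions are checked are assumptions made for illustration.

```python
def suggest_storage(structure, warehousing=False, infrequent_access=False):
    """Suggest a storage service based on the rules of thumb in the text.

    structure: "structured", "semi-structured", or "unstructured".
    """
    if infrequent_access:
        return "Glacier"        # archive service for rarely accessed data
    if warehousing:
        return "Redshift"       # data warehousing and BI
    if structure == "structured":
        return "Amazon RDS"     # structured, relational data
    if structure == "semi-structured":
        return "DynamoDB"       # basic structure, no fixed data model
    return "Amazon S3"          # general-purpose object store


print(suggest_storage("semi-structured"))
print(suggest_storage("structured", warehousing=True))
```

A real decision would also weigh cost, query patterns, and data volume, which this sketch ignores.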
Remember, the right service or tool depends on the type of problem you have and how quickly you need answers: whether you can wait a while, whether you need real-time answers to the problem, or whether you want to predict future behavior.
If your goal is to provide reports based on batch processing or historical data, or just to identify patterns in large data sets without the need for real-time answers, you can take advantage of batch analysis through EMR, the Amazon Elastic MapReduce service, which is based on the proven Hadoop framework.
If you need real-time answers to questions, or the results must be displayed on live dashboards, then you might take advantage of stream-based processing with Amazon Kinesis, AWS Lambda, or Amazon OpenSearch. Kinesis provides streams to load, store, and easily consume live stream data, and AWS Lambda can react to these stream events by executing functions you define.
Amazon OpenSearch can index the data and give you insights about it, allowing you to query it using its flexible query language or display it using Amazon QuickSight.
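As a rough illustration of a function reacting to stream events, the sketch below follows the shape of the event that AWS Lambda receives from a Kinesis stream, where each record carries its payload base64-encoded under `kinesis.data`. The clickstream payload with a `page` field, the `make_record` helper, and the sample event are hypothetical and exist only so the example can run locally without an AWS account.

```python
import base64
import json


def handler(event, context):
    """Count page views per page from the Kinesis records in one event."""
    views = {}
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        click = json.loads(payload)
        page = click["page"]  # hypothetical field in the clickstream data
        views[page] = views.get(page, 0) + 1
    return views


def make_record(click):
    """Build a minimal fake Kinesis record for local testing (assumption)."""
    data = base64.b64encode(json.dumps(click).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {
    "Records": [
        make_record({"page": "/home"}),
        make_record({"page": "/pricing"}),
        make_record({"page": "/home"}),
    ]
}
result = handler(sample_event, None)
print(result)
```

In a real deployment the result would typically be written to a store or dashboard rather than returned, and the records would come from the stream itself.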
For predictive analytics, where you need to forecast an event based on historical cases, you might take advantage of Amazon Machine Learning services to build highly available predictive applications. Don't forget Data Pipeline, which can be used to orchestrate all these services.
As a framework for data-driven workflows, Data Pipeline can be used to automate your database loads. And for the visualization aspect, to get a clear overview and dashboard of your results, you can use Amazon QuickSight, which allows you to create rich visualizations from your data.
As we said in the beginning, this is just an overview of some data analytics concepts, with a mention of some of the AWS services used for data analytics.
That now brings me to the end of this introductory course and you should now have a greater understanding of the concepts behind Data Analytics.
If you have any feedback, positive or negative, please do contact us at email@example.com. Your feedback is greatly appreciated. Thank you for your time and good luck with your continued learning of cloud computing.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.