This course explores data sources and formatting, and how to present data in a way that provides meaningful information. You'll look at data access patterns, and how different interfaces allow you to access the underlying information. This course also provides a practical, real-world example of how all this theory plays out in a business scenario. By the end of this course, you will have a good foundational understanding of how to wrangle and visualize data.
If you have any feedback relating to this course, feel free to reach out to us at firstname.lastname@example.org.
- Understand the difference between data and information
- Learn how to make data useful in order to gain insights from it
- Learn how to store data correctly
- Understand how these techniques can be applied in the business world
This course is ideal for anyone who is required to interpret or understand data for reporting purposes or for use in machine learning initiatives.
To get the most out of this course, you should be familiar with databases, both relational (SQL) and NoSQL, and with some common data formats such as CSV and JSON.
So let's talk about what actually happens during this processing step. How do we make data useful? One of the first things we need to do, especially when preparing it for a visual medium such as dashboards and reports, is to make it human friendly. Making data human friendly requires a good understanding of the subject matter and of where the data will be presented.
For example, in the United States, currency values are typically displayed with a dollar sign to the left of the amount and a decimal point before the cents, while in parts of Canada and the European Union, the currency symbol goes to the right. Another classic mix-up is whether the day or the month goes first when writing a date. So when making data human friendly and including these processing steps in your data pipeline, think about where it's going to be read and what the standards are there.
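As a rough illustration of the point above, here is a minimal Python sketch of a formatting step that depends on where the data will be read. The `CONVENTIONS` table, the region keys, and the function names `format_currency` and `format_date` are all hypothetical and hand-written for this example; a real pipeline would more likely rely on a maintained localization library than on a table like this.

```python
from datetime import date
from decimal import Decimal

# Hypothetical, hand-written display conventions for illustration only.
# Real locale rules are more detailed than this small table.
CONVENTIONS = {
    "en_US": {"symbol": "$", "symbol_first": True,
              "decimal": ".", "thousands": ",", "date": "%m/%d/%Y"},
    "fr_CA": {"symbol": "$", "symbol_first": False,
              "decimal": ",", "thousands": " ", "date": "%Y-%m-%d"},
    "de_DE": {"symbol": "€", "symbol_first": False,
              "decimal": ",", "thousands": ".", "date": "%d.%m.%Y"},
}


def format_currency(amount: Decimal, region: str) -> str:
    """Render an amount using the separators and symbol position of a region."""
    conv = CONVENTIONS[region]
    # Start from US-style grouping ("1,234.50"), then swap the separators.
    # A placeholder character avoids clobbering one separator with the other.
    text = f"{amount:,.2f}"
    text = (text.replace(",", "\x00")
                .replace(".", conv["decimal"])
                .replace("\x00", conv["thousands"]))
    if conv["symbol_first"]:
        return f"{conv['symbol']}{text}"
    return f"{text} {conv['symbol']}"


def format_date(value: date, region: str) -> str:
    """Render a date in the day/month order the region expects."""
    return value.strftime(CONVENTIONS[region]["date"])


if __name__ == "__main__":
    price = Decimal("1234.5")
    day = date(2024, 3, 4)
    for region in CONVENTIONS:
        print(region, format_currency(price, region), format_date(day, region))
```

The same stored value and the same stored date come out differently for each audience, which is why this step belongs near the end of the pipeline, close to where the data is displayed.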
A second major thing we can do when processing data is to aggregate or enrich it. Combining pieces of information from different data sources often produces a whole that is more meaningful than its parts. You might see the fancy word gestalt in a textbook. Nobody uses that in practice. They say combine, enrich, and aggregate. A common example: if you have a database with zip code values, you can combine it with another database that maps zip codes to county names or town names, depending on what region you're in. That combined result, where you now have human-readable county or city names, is more useful than just raw zip codes.
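The zip code example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `orders` records, the `zip_to_county` lookup, and the function names `enrich_with_county` and `total_by_county` are hypothetical, and in practice both data sets would come from separate databases rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical sample records standing in for a table with raw zip codes.
orders = [
    {"order_id": 1, "zip": "94105", "total": 40.0},
    {"order_id": 2, "zip": "10001", "total": 25.0},
    {"order_id": 3, "zip": "94105", "total": 15.0},
    {"order_id": 4, "zip": "00000", "total": 5.0},  # no match in the lookup
]

# Hypothetical lookup standing in for a second data source
# that maps zip codes to county names.
zip_to_county = {
    "94105": "San Francisco County",
    "10001": "New York County",
}


def enrich_with_county(records, lookup, default="Unknown"):
    """Return copies of the records with a human-readable 'county' field added."""
    return [{**rec, "county": lookup.get(rec["zip"], default)} for rec in records]


def total_by_county(enriched):
    """Aggregate order totals per county."""
    totals = defaultdict(float)
    for rec in enriched:
        totals[rec["county"]] += rec["total"]
    return dict(totals)


if __name__ == "__main__":
    enriched = enrich_with_county(orders, zip_to_county)
    for county, total in total_by_county(enriched).items():
        print(f"{county}: {total:.2f}")
```

Keeping an explicit default such as "Unknown" for zip codes missing from the lookup is one possible design choice; it keeps unmatched rows visible in the aggregate instead of silently dropping them.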
Organizing data is also a great way to help people understand what's being displayed. If you've attended the other data engineering courses, and I recommend that you do, we discuss how creating a good data model is key to good, sustainable, usable information generation. Good organization is itself a way of creating information, and if you organize your data well from the beginning, this processing step gets a lot easier.
And lastly, and most importantly, the best processing step is finding patterns and trends. This is often the ideal end state of any processing pipeline that turns data into information. This step pulls from all the other steps and helps you uncover meaningful insights into your business or data set. It's a little outside the scope of this class to discuss all the ways you can find patterns and trends. Keep your eyes open for more machine learning and trend analysis classes. Cloud Academy has some and we'll be making more, but just know that at the end of all this processing, you're hoping to reveal hidden facts about the data and provide good insights.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.