This course explores data sources and formatting, and how to present data in a way that provides meaningful information. You'll look at data access patterns, and how different interfaces allow you to access the underlying information. This course also provides a practical, real-world example of how all this theory plays out in a business scenario. By the end of this course, you will have a good foundational understanding of how to wrangle and visualize data.
If you have any feedback relating to this course, feel free to reach out to us at firstname.lastname@example.org.
- Understand the difference between data and information
- Learn how to make data useful in order to gain insights from it
- Learn how to store data correctly
- Understand how these techniques can be applied in the business world
This course is ideal for anyone who is required to interpret or understand data for reporting purposes or for use in machine learning initiatives.
To get the most out of this course, you should be familiar with databases, both relational (SQL) and NoSQL, and with some common data formats such as CSV and JSON.
Now, to dive right in: one of the most important distinctions, and the cornerstone of this entire class, is the difference between data and information. Data, as we use the term, is really raw facts: individual numbers and text without any context. It could be a string of numbers, or it could be free text. Very importantly, data exists without context and is purely what was recorded.
For data to mean anything, however, we need to process it. That means applying some type of understanding, transformation, or interpretation. Typical processing might be a subject matter expert looking at it, a machine learning model interpreting it, or simply a rules engine transforming it into something meaningful by adding context and surrounding metadata. Once this processing and combining happens, you're left with information.
Information is refined data that carries context. It provides insights and is far more valuable than raw data alone. To illustrate this rather abstract process, let's consider the following number: 32509. What does this piece of data represent? Without context or other information, it's impossible to tell. Perhaps it's an important date, March 25th, 2009, such as the date of your marriage or the date you started a job. Perhaps it is $32,509 and refers to an employee's starting salary. Or maybe, in a geographic context, it's the zip code of Pensacola, Florida. The important point here is that without processing and added context, raw data is pretty much meaningless.
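To make the 32509 example concrete, here is a minimal Python sketch that reads the same raw value three different ways. The function names and the assumed date layout (month, day, two-digit year, with the leading zero dropped) are illustrative assumptions, not something taken from the course materials.

```python
from datetime import date, datetime

RAW = "32509"  # the raw value from the example; it carries no context of its own


def as_date(raw: str) -> date:
    """Read the value as a date, ASSUMING an MMDDYY layout with the leading zero dropped."""
    return datetime.strptime(raw.zfill(6), "%m%d%y").date()


def as_salary(raw: str) -> str:
    """Read the value as a whole-dollar amount."""
    return f"${int(raw):,}"


def as_zip_code(raw: str) -> str:
    """Read the value as a five-digit US ZIP code."""
    return f"ZIP code {raw.zfill(5)}"


# Same data, three different pieces of information, depending on the context applied.
for interpret in (as_date, as_salary, as_zip_code):
    print(f"{interpret.__name__}: {interpret(RAW)}")
```

Nothing in the raw string says which reading is correct. That decision comes from the surrounding context, which is exactly what processing adds.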
Additionally, on a slightly more nuanced note, raw data is also not a great representation when you're trying to make judgments. Look at the numbers on screen. It's a series of decimals between 6.34 and 7.06. Are they increasing at a regular interval? It's really hard to tell just from the raw information. However, when we start to process it and put it in a more readable form, you can start to see how regularly it increases and where it speeds up and slows down. So simply adding context and processing the data not only lets you understand what it's trying to tell you, it also makes deriving key insights from that data easier.
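As a rough illustration of that kind of processing, the sketch below computes the step between consecutive readings and draws a simple text bar for each step. The values in `readings` are hypothetical stand-ins chosen to fall between 6.34 and 7.06. They are not the numbers shown on screen in the lecture.

```python
# Hypothetical sample values between 6.34 and 7.06 (NOT the on-screen data).
readings = [6.34, 6.41, 6.48, 6.55, 6.70, 6.91, 7.06]

# Difference between each reading and the one before it, rounded to 2 decimals.
deltas = [round(curr - prev, 2) for prev, curr in zip(readings, readings[1:])]

# A simple text chart: one '#' per 0.01 of increase.
for prev, curr, delta in zip(readings, readings[1:], deltas):
    bar = "#" * int(round(delta * 100))
    print(f"{prev:.2f} -> {curr:.2f}  +{delta:.2f}  {bar}")
```

In the raw list it is hard to see whether the spacing is even. The computed steps and bars show at a glance where the series climbs steadily and where it speeds up or slows down.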
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and in education on these complex technologies.