Working With Data Sets
This course is part of a learning path.
This course focuses on understanding common data formats and interfaces. It explores some common data formats that you'll encounter as a data engineer. The goal is to develop a deep understanding of the pros and cons of storing your data in different ways. The course then translates that high-level, abstract concept into a more concrete understanding and showcases how the same dataset can be accessed and viewed differently simply by storing it in a different fashion.
- Learn about different data sources and formats, and how to model your data
- Get acquainted with the common data formats (CSV, XML, and JSON) as well as specialized data formats
- Learn about databases and how to exchange data between applications
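The objectives above center on how the same data looks in different formats. As a minimal sketch (not taken from the course materials), the following Python uses only the standard library `csv`, `json`, and `xml.etree.ElementTree` modules to write three hypothetical online-store orders as CSV, XML, and JSON, then read each encoding back. The record fields (`order_id`, `product`, `quantity`) and the helper function names are illustrative assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records (illustrative only): three orders from an online store.
# Every value is a string because CSV and XML have no native types.
ORDERS = [
    {"order_id": "1001", "product": "Mug", "quantity": "2"},
    {"order_id": "1002", "product": "T-shirt", "quantity": "1"},
    {"order_id": "1003", "product": "Poster", "quantity": "3"},
]
FIELDS = ["order_id", "product", "quantity"]


def to_csv(records):
    """Flat rows: field names appear once in a header, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Nested elements: every value is wrapped in an opening and closing tag."""
    root = ET.Element("orders")
    for rec in records:
        order = ET.SubElement(root, "order")
        for field in FIELDS:
            ET.SubElement(order, field).text = rec[field]
    return ET.tostring(root, encoding="unicode")


def to_json(records):
    """Key/value objects: field names are repeated in every record."""
    return json.dumps(records, indent=2)


def from_csv(text):
    # DictReader uses the header row as the keys for each record.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_xml(text):
    # Each <order> child element becomes one key/value pair.
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in order} for order in root]


def from_json(text):
    return json.loads(text)


if __name__ == "__main__":
    for name, dump in [("CSV", to_csv), ("XML", to_xml), ("JSON", to_json)]:
        text = dump(ORDERS)
        print(f"--- {name} ({len(text)} characters) ---")
        print(text)
```

Running the script prints each encoding with its character count, which makes the cost of repeating field names visible: CSV states them once in a header, while XML and JSON repeat them for every record. JSON could store `quantity` as a number, but the sketch keeps every value a string so all three formats read back to identical records.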
This course is suited to anyone looking to gain a practical, hands-on understanding of data modeling, and to those who might want to change how they store their data.
To get the most out of this course, you should be familiar with what CSV and JSON files are, and with databases at a high level.
Hey, everyone, and welcome to Working With Datasets, a class where we really focus on understanding common data formats and interfaces. In this course, we're going to discuss some common data formats that you'll encounter as a data engineer, whether you're an experienced one looking to brush up or someone just getting started. Basically, the goal is to develop a deep understanding of the pros and cons of storing your data in different ways. We're then going to translate that high-level, abstract concept into a more concrete understanding and really showcase how the same dataset can be accessed and viewed differently simply by storing it in a different fashion.
Hopefully, by the end of this, everyone here will be extremely comfortable with the different data formats and will have seen us work with them in a solid, hands-on, practical fashion.
There are a few prerequisites, but nothing too intense. If you're going to sit through this class, we recommend you familiarize yourself with what CSV and JSON files are, along with databases at a high level. We also recommend watching the course in this learning path that discusses how to pick the right database and whether a spreadsheet might be a better fit for you. If you have a good grip on those base data formats and databases, you should be right at home in this class.
If you're still on the fence, this class is ideal for those looking to gain a practical, hands-on understanding of data modeling with real examples rather than abstract ideals, for those who might want to change how they're storing their data, and for anyone who wants a brief introduction to data transformation. And honestly, most importantly, if you have a dataset that doesn't really make sense, where all the information is in there but you're not sure where to go with it, this is a great class for starting to see other ways the same data can be presented and accessed.
And just to briefly introduce myself before we get started, my name is Chris Gambino. I'm one of the architects and co-founders at Calculated Systems. Although I'm the one presenting today, putting this together was really a team effort by everybody over at Calculated Systems. Personally, I'm an expert data engineer at Google, having worked on projects such as IoT cars and even big data projects such as streaming hundreds of millions of social media messages a day and doing complex analysis on them.
- Data Sources and Formats
- Modeling Your Data
- CSV
- XML
- JSON
- Specialized Data Formats
- Databases
- Exchanging Data Across Applications
- Applying What We Have Learnt
- Sales Data for an Online Store
Calculated Systems was founded by experts in Hadoop, Google Cloud, and AWS. Calculated Systems enables code-free capture, mapping, and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionizing workloads, it cuts development cycles to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in big data transformation and in education on these complex technologies.