This course explores various data file formats that are used for data analytics, big data, and machine learning. So this course is ideal for you if you're looking to understand which file type you should use for your big data or analytic pipelines and make a decision on which file type is right for your workload.
Learning Objectives
- Understand the pros and cons of the Apache ORC, Apache Parquet, Apache Avro, CSV, and JSON file types
- Learn which data file format best suits your needs
Intended Audience
This course is for anyone who wants to learn about data formats and file types, and which ones are right for their workloads.
Prerequisites
To get the most out of this course, you should have some background knowledge of databases, data information systems, and data files.
As you have seen, there are a number of considerations that should be taken into account when determining which file format you should use for your big data, data analytics, machine learning, or generic data collection hobby of choice.
There is no one grand right answer as to which file format you should use. It all comes down to what you need and what your solution requires.
Here is a simplified table with all of our previous thoughts on each file type and each domain you should consider. I'll color-code the cells to show how good each format is in that domain, but leave the text as a technical yes or no.
In general, I would advise you to steer towards using ORC, Parquet, or Avro. CSV and JSON can be used, but they offer very few quality-of-life features. And as you move to more specialized engines, they might not even be an option for you to pick at all.
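To make "few quality-of-life features" a bit more concrete, here is a minimal sketch using only Python's standard library. The sample record and the helper names (`csv_round_trip`, `json_round_trip`) are made up for this illustration and are not part of the course material. It does not read or write ORC, Parquet, or Avro, since those need third-party libraries; it only shows one thing CSV gives up, which is type information. ORC, Parquet, and Avro files carry a schema alongside the data, so a reader does not have to guess column types.

```python
import csv
import io
import json

# Hypothetical sample record, invented for this illustration.
rows = [{"id": 1, "price": 9.99, "in_stock": True}]


def csv_round_trip(records):
    """Write records to an in-memory CSV, then read them back."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return [dict(row) for row in csv.DictReader(buffer)]


def json_round_trip(records):
    """Serialize records to a JSON string, then parse them back."""
    return json.loads(json.dumps(records))


csv_rows = csv_round_trip(rows)
json_rows = json_round_trip(rows)

# CSV has no type information: every value comes back as a string.
print(csv_rows[0])
# JSON keeps basic types (number, boolean, string) but still has no
# declared schema, so nothing stops a later record from changing shape.
print(json_rows[0])
```

With CSV, the consumer has to re-declare or infer every column type on each read. JSON does better on basic types, but there is still no schema to validate against, which is part of why I lean towards the schema-carrying formats.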
Well, that brings us to the end of this course. My name is Will Meadows and I'd like to thank you for spending your time here learning about data file formats. If you have any feedback, positive or negative, please contact us at support@cloudacademy.com. Your feedback is greatly appreciated. Thank you!
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.