Understanding Data File Formats


This course explores various data file formats that are used for data analytics, big data, and machine learning. It is ideal if you want to understand which file type to use for your big data or analytics pipelines and decide which one is right for your workload.

Learning Objectives

  • Understand the pros and cons of the Apache ORC, Apache Parquet, Apache Avro, CSV, and JSON file types
  • Learn which data file format best suits your needs

Intended Audience

This course is for anyone who wants to learn about data formats and file types, and which ones are right for their workloads.


Prerequisites

To get the most out of this course, you should have some background knowledge of databases, data information systems, and data files.


Hello, my name is Will Meadows and today we will be talking about the various data file formats that you will find out in the wilds of data analytics, big data, and machine learning.

If you have any questions about anything I cover in this series please let me know at

Alternatively, you can always get in touch with us here at Cloud Academy by sending an email to and one of our cloud experts will reply to your question, concern, or comment. 

I would recommend this course for anyone who is trying to understand which file type they should use for their big data or analytics pipelines, or for someone who just wants to know what these file types are all about.

After completing this course you will know how ORC, Parquet, Avro, CSV, and JSON perform in the data analytics / big data world.

You will hopefully be able to make a decision on which file type is right for your workload.

Some general background on databases, data information systems, and data files would be good.

I also recommend: a strange desire to look into the mysterious world of data file formats.

Feedback on our courses here at Cloud Academy is valuable to both us as trainers and any students looking to take the same course in the future. If you have any feedback, positive or negative, it would be greatly appreciated if you could send an email to

Please note that, at the time of writing this content, all course information was accurate. AWS implements hundreds of updates every month as part of its ongoing drive to innovate and enhance its services.

As a result, minor discrepancies may appear in the course content over time. Here at Cloud Academy, we strive to keep our content up to date in order to provide the best training available.

So, if you notice any information that is outdated, please contact  This will allow us to update the course during its next release cycle.

About the Author

William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.