Practical Example: Size
Moving Beyond Spreadsheets
This course discusses some of the fundamental concepts of data management and looks at the differences between spreadsheets and databases for managing data. We'll look at some specific examples to understand when spreadsheets make sense and when it makes sense to switch over to a database, which is sometimes a much better option for more complex datasets.
Specifically, this course aims to give students a practical hands-on introduction to database concepts. In addition, we'll gain an understanding of how to select the right database and we'll go through the basics of setting up an RDS instance on Amazon. This course includes a practical example of a company that is looking to choose a database, to give you an understanding of how databases work in the real world.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand the difference between spreadsheets and databases and when to use one or the other
- Learn about the different types of database available and the various features and characteristics to consider
- Learn how to choose the right database
- Learn how to deploy an Amazon Aurora instance
This course is designed for anyone who wants to improve their knowledge of databases and understand when it makes sense to use them as opposed to a spreadsheet.
To get the most out of this course, you should already have a basic understanding of simple data structures such as comma-separated values, as well as an understanding of cloud concepts in general.
Next up is size. At the most basic level, this is literally how much data you're trying to cram into this database. Is it hundreds, thousands, or millions of rows of data? Remember, the width is also important. A database might be able to handle a billion rows of three columns, but if you have a hundred columns, that, of course, is gonna take up a lot more space, potentially.
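To make the rows-versus-width point concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes every value takes a flat 8 bytes and ignores indexes, compression, and storage overhead, so treat the numbers as rough orders of magnitude rather than what any particular database would actually use. The function names are made up for this illustration.

```python
def estimate_table_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Rough raw size of a table, assuming each cell takes bytes_per_value bytes.

    The 8-byte default is an assumption for illustration only; real column
    types vary widely, and real databases add their own overhead.
    """
    return rows * columns * bytes_per_value


def to_gigabytes(size_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return size_bytes / 1_000_000_000


# Same billion rows, two different widths.
narrow = estimate_table_bytes(rows=1_000_000_000, columns=3)
wide = estimate_table_bytes(rows=1_000_000_000, columns=100)

print(f"3 columns:   {to_gigabytes(narrow):,.0f} GB")
print(f"100 columns: {to_gigabytes(wide):,.0f} GB")
```

Under these assumptions, the three-column table comes out around 24 GB and the hundred-column table around 800 GB, which is why width matters as much as row count.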
Now, for those of you watching who are already familiar with databases, you'll know there are a lot of advanced strategies, such as indices, for coping with large sizes. That goes a little beyond this class, but for those of you going through this exercise with us, just think in terms of size.
Now, for our coffee bean subscription service, we have about 45,000 active subscribers. We sell a little over a dozen varieties of coffee beans, and on average, we're getting about a thousand issues a month reported across all the channels. So we have one large dataset, the customers, but in terms of dynamic data, it's only about a thousand records a month. So we're gonna note that down, and that'll affect our decision down the line.
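As a rough sketch of what "noting that down" might look like, the Python snippet below records the three datasets and projects how the issue data grows over time. The figures come from the example above, except the variety count, which is assumed to be 13 to stand in for "a little over a dozen". The names are hypothetical and only for illustration.

```python
SUBSCRIBERS = 45_000        # active subscribers, from the example
VARIETIES = 13              # assumption: "a little over a dozen"
ISSUES_PER_MONTH = 1_000    # average reported issues per month, all channels


def issue_rows_after(months: int) -> int:
    """Issue records accumulated after the given number of months,
    assuming the monthly average stays flat."""
    return ISSUES_PER_MONTH * months


def size_profile(months: int) -> dict[str, int]:
    """Approximate row counts per dataset after the given number of months."""
    return {
        "subscribers": SUBSCRIBERS,
        "varieties": VARIETIES,
        "issues": issue_rows_after(months),
    }


for months in (1, 12, 36):
    print(months, "months:", size_profile(months))
```

If the issue rate holds steady, it would take about 45 months for the issue records to match the subscriber count, so on these numbers the whole dataset stays in the tens of thousands of rows.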
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open-source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.