When is it Time to Think Beyond Spreadsheets? An Example
Moving Beyond Spreadsheets
The course is part of this learning path
This course discusses some of the fundamental concepts of data management and looks at the differences between spreadsheets and databases for managing data. We'll look at some specific examples to understand when spreadsheets makes sense and when it makes sense to switch over to a database, which is sometimes a much better option for more complex datasets.
Specifically, this course aims to give students a practical hands-on introduction to database concepts. In addition, we'll gain an understanding of how to select the right database and we'll go through the basics of setting up an RDS instance on Amazon. This course includes a practical example of a company that is looking to choose a database, to give you an understanding of how databases work in the real world.
If you have any feedback relating to this course, please contact us at firstname.lastname@example.org.
- Understand the difference between spreadsheets and databases and when to use one or the other
- Learn about the different types of database available and the various features and characteristics to consider
- Learn how to choose the right database
- Learn how to deploy an Amazon Aurora instance
This course is designed for anyone who wants to improve their knowledge of databases and understand when it makes sense to use them as opposed to a spreadsheet.
To get the most out of this course, you should already have a basic understanding of simple data structures such as comma-separated values, as well as an understanding of cloud concepts in general.
Let's quickly run through the type of analysis we did before for the small bakery with a much larger bakery. This case, it has multiple locations, thousands of queues, lots of data. But by going through it sequentially, this'll help you see the process that you and your team can do to select whether or not a spreadsheet or a database is right for you.
So remember, the first step you should be doing is asking yourself, "Does this data fit into a flat structure?" It means that is easily represented by rows and columns without nested data structures, such as CSVs XMLs or JSONs. It also means that the data is relatively consistent. Now, in the case of this larger business, there is of course the bakery making the baked goods, but then there's also storefronts and maybe its franchise storefronts. And then to really add some complexity, there's maybe a truck delivering them to the storefronts.
So in the case of this bakery, you can't really say it is a flat structure that fits into a simple table because there's simply multiple types of data that multiple businesses are bringing in, and it most likely does not fit into a simplified two degrees of freedom, which means rows and columns are right. And secondly, when you start to consider who is making updates to the data, the warehouse, the distribution, the bakery are all reporting different types of information, and probably shouldn't be able to edit each other's information.
The bakery should be able to update what's been made, the shipping company should be able to update what's been delivered, and then the retail front should be able to update what's been sold. So you have here is a clear delineation of who is required to enter what data, and you probably need some strong governance to control that the bakery can't report that more items are sold when they don't do any sales operations themselves.
So in this case, it's a clear indicator that a company with multiple locations, with multiple roles is not a good fit for spreadsheets due to the need for delineation of permissions and multiple people editing it at the same time. And lastly, let's look at the reporting requirements again. In this case, we have a need for live reporting from multiple locations.
People need to know what's in transport. People need to know what's been sold. People need to know from a bakery's perspective, what's in process. So here we need to be able to have multiple types of reports from multiple different datasets. And this indicates it's probably not a good fit for a spreadsheet because this type of dynamic live update with multiple reports is probably more right for a database to handle.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.