This course discusses some of the fundamental concepts of data management and looks at the differences between spreadsheets and databases for managing data. We'll look at some specific examples to understand when spreadsheets makes sense and when it makes sense to switch over to a database, which is sometimes a much better option for more complex datasets.
Specifically, this course aims to give students a practical hands-on introduction to database concepts. In addition, we'll gain an understanding of how to select the right database and we'll go through the basics of setting up an RDS instance on Amazon. This course includes a practical example of a company that is looking to choose a database, to give you an understanding of how databases work in the real world.
If you have any feedback relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Understand the difference between spreadsheets and databases and when to use one or the other
- Learn about the different types of database available and the various features and characteristics to consider
- Learn how to choose the right database
- Learn how to deploy an Amazon Aurora instance
Intended Audience
This course is designed for anyone who wants to improve their knowledge of databases and understand when it makes sense to use them as opposed to a spreadsheet.
Prerequisites
To get the most out of this course, you should already have a basic understanding of simple data structures such as comma-separated values, as well as an understanding of cloud concepts in general.
Let's pull all of this down to earth out of the ether and start to talk about a specific example. Imagine there's a small bakery with a handful of employees. This bakery is a single location with just a owner, a couple cooks and a dishwasher. This bakery maybe wants to report on the type of cookies or the types of pies they sold by week, and it's gonna be edited up by the boss and shown to the bank to justify getting a loan.
So there's not really any complexity here. There's not multiple stakeholders. There's one person editing it with a simple sheet of pies and cookies sold. Compare that now to a much larger bakery with 12 locations, 200 employees. And now you need to do complex slices, such as the type of cookies sold by location on a date, and then you have multiple locations reporting in their sales. You could see how this starts to become too difficult for just a spreadsheet to matter.
But let's slow down and go into an analysis over how can we kind of make this judgment point and put it in roles that you, the person watching this video and hopefully enjoying this content, can start to understand, should you be using a spreadsheet or should you be switching to a database? So to go through the exercise of what we've been discussing so far, think about who is going to be editing the database or the spreadsheet.
In this case, we're talking about a cafeteria slash bakery, and there's only a couple of employees. So make a list of who will want me to enter data or access data and consider what that is. In this case, just to bring you along with the story, we have five employees of which one person needs to be doing edits. There's about 150 items of stock, and there's a physical inventory count every few weeks. This right here is a limited set of data. Inventory data such as this is flat. It's no more than 150 rows, and there's not too many columns just to keep track of counts.
A concept we haven't really begun to discuss in-depth yet, but it will be coming in the series, is the idea of a data model. This is a discussion that is a little more in-depth than we'll cover here, but just know what the concept, a data model is simply an outline of what information you're looking to store and how you're looking to portray it.
Here, you can see a simple list, flat data model. There are six columns and each row is one SKU. So when we talk in terms of data models, and again, we're not gonna go too in-depth here, just know that the left-hand column is the unique, where each SKU is unique, and the rest of the information is data that relates to it. This will come up more in courses to follow, but just try to create a basic data model for your data and fit it into something like this.
If it goes in nicely like you're seeing here, a simple flat list, maybe it's a good candidate for a spreadsheet. And remember, the second criteria is who is going to be updating your data set or your spreadsheet? Think about who is accessing it. But in this case, just to tie it all back to our small baker example, just the boss is updating the data in Excel on her personal computer.
For those of you doing this exercise yourself, think about how you need to report the data. What type of information are you looking to portray? Is it dynamically updated? Is it live updated? Do you need to have data audited by a third party with an audit trail?
In the case of this small bakery with one boss and five total employees, there's no need for that. So in this case, we're able to say that a spreadsheet is actually a really good option for this bakery. For those of you watching, perhaps you're in a bigger company, don't think that just because you're in a 10,000-person company, you need to go to a database. If you're working in a 10-person team, perhaps a spreadsheet is great for you, but let's move beyond that. Let's assume that you need to start working with a database. Let's discuss what your considerations are and how you can start to pick out the right one.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.