This course discusses some of the fundamental concepts of data management and looks at the differences between spreadsheets and databases for managing data. We'll look at some specific examples to understand when a spreadsheet makes sense and when it makes sense to switch over to a database, which is often a much better option for more complex datasets.
Specifically, this course aims to give students a practical hands-on introduction to database concepts. In addition, we'll gain an understanding of how to select the right database and we'll go through the basics of setting up an RDS instance on Amazon. This course includes a practical example of a company that is looking to choose a database, to give you an understanding of how databases work in the real world.
If you have any feedback relating to this course, please contact us at support@cloudacademy.com.
Learning Objectives
- Understand the difference between spreadsheets and databases and when to use one or the other
- Learn about the different types of databases available and the various features and characteristics to consider
- Learn how to choose the right database
- Learn how to deploy an Amazon Aurora instance
Intended Audience
This course is designed for anyone who wants to improve their knowledge of databases and understand when it makes sense to use them as opposed to a spreadsheet.
Prerequisites
To get the most out of this course, you should already have a basic understanding of simple data structures such as comma-separated values, as well as an understanding of cloud concepts in general.
Now, unfortunately, one of the less fun things to think about in terms of databases is redundancy. What happens if it goes wrong? Now I know everybody watching this class will never make a mistake and hardware will never fail, but sometimes even the cloud solutions have downtime or an outage, or they lose data. So what happens if something goes wrong? How does your application respond? How much downtime are you able to tolerate?
Typically when people talk about this, they use the acronyms RTO and RPO, which stand for recovery time objective and recovery point objective. Put in plain English, RTO is how long we can be down for, and RPO is how much data can be lost when we recover. So, for example, that might mean we can be down for five minutes, but the last three hours of data can be lost. Thinking through exactly what happens in a failure is very important.
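To make the two targets concrete, here is a minimal sketch of checking one outage against them. The `RecoveryObjectives` class, the `meets_objectives` function, and the timestamps are hypothetical names and values chosen for illustration; only the five-minute and three-hour figures come from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # recovery time objective: maximum tolerable downtime
    rpo: timedelta  # recovery point objective: maximum tolerable window of lost data


def meets_objectives(objectives, last_good_copy, failure_at, restored_at):
    """Return (rto_met, rpo_met) for a single outage."""
    downtime = restored_at - failure_at
    data_loss_window = failure_at - last_good_copy
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


# Targets from the example above: down for five minutes, lose up to three hours of data.
objectives = RecoveryObjectives(rto=timedelta(minutes=5), rpo=timedelta(hours=3))

# Hypothetical outage: last good copy two hours before the failure, restored 20 minutes after.
failure = datetime(2024, 1, 1, 12, 0)
result = meets_objectives(
    objectives,
    last_good_copy=failure - timedelta(hours=2),
    failure_at=failure,
    restored_at=failure + timedelta(minutes=20),
)
print(result)  # (False, True)
```

In this made-up outage the data loss stays inside the RPO, but the 20-minute recovery misses the five-minute RTO, which shows that the two targets are measured independently.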
Now, the easiest way to handle redundancy, and one that's built into every cloud platform, is hourly or daily snapshots. But it's important to note that if you're building a more redundant or highly available application, you need to start addressing these issues directly. To put this in terms of our practical coffee shop example, we are going to be acquiring data that exists nowhere else. These customers are submitting web forms or leaving phone call logs, and if we were to lose our database, we might not be able to get these complaints back. So we wanna make sure at the very least we have snapshots.
Now, if our complaint portal goes down for a few hours while we recover it, I'm sure that's acceptable. But very importantly, we wanna make sure we don't lose data.
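As a rough illustration of why the snapshot schedule matters for the coffee shop, the sketch below counts how many complaints would be lost when restoring from the most recent snapshot. The function names, the complaint timestamps, and the failure time are all invented for the example, and the snapshot list is assumed to be sorted in ascending order.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_snapshot_before(snapshots, failure_at):
    """Return the most recent snapshot time at or before the failure, or None."""
    i = bisect_right(snapshots, failure_at)
    return snapshots[i - 1] if i else None


def lost_records(record_times, snapshots, failure_at):
    """Records written after the restore point (and before the failure) are lost."""
    restore_point = latest_snapshot_before(snapshots, failure_at)
    return [
        t for t in record_times
        if t <= failure_at and (restore_point is None or t > restore_point)
    ]


day = datetime(2024, 1, 1)
hourly = [day + timedelta(hours=h) for h in range(24)]  # snapshot on every hour
daily = [day]                                           # one snapshot at midnight

# Hypothetical complaint submissions and a failure at 14:30.
complaints = [
    day + timedelta(hours=9, minutes=15),
    day + timedelta(hours=13, minutes=40),
    day + timedelta(hours=14, minutes=20),
]
failure = day + timedelta(hours=14, minutes=30)

print(len(lost_records(complaints, hourly, failure)))  # 1
print(len(lost_records(complaints, daily, failure)))   # 3
```

With hourly snapshots only the complaint submitted after 14:00 is gone; with a single daily snapshot all three are lost, even though the outage itself is identical.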
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.