
Speed and Reliability

Developed with Calculated Systems
Difficulty: Intermediate
Duration: 42m

Description

This course discusses some of the fundamental concepts of data management and looks at the differences between spreadsheets and databases for managing data. We'll look at some specific examples to understand when spreadsheets make sense and when it makes sense to switch over to a database, which is sometimes a much better option for more complex datasets.

Specifically, this course aims to give students a practical hands-on introduction to database concepts. In addition, we'll gain an understanding of how to select the right database and we'll go through the basics of setting up an RDS instance on Amazon. This course includes a practical example of a company that is looking to choose a database, to give you an understanding of how databases work in the real world.

If you have any feedback relating to this course, please contact us at support@cloudacademy.com.

Learning Objectives

  • Understand the difference between spreadsheets and databases and when to use one or the other
  • Learn about the different types of database available and the various features and characteristics to consider
  • Learn how to choose the right database
  • Learn how to deploy an Amazon Aurora instance

Intended Audience

This course is designed for anyone who wants to improve their knowledge of databases and understand when it makes sense to use them as opposed to a spreadsheet.

Prerequisites

To get the most out of this course, you should already have a basic understanding of simple data structures such as comma-separated values, as well as an understanding of cloud concepts in general.

Transcript

Thirdly, we have to consider the speed and scalability of the solution. I know I said two separate items there, but they're very closely related. How fast does your application need to be? How quickly does data come into the database, and how high is the throughput? Are you expecting spikes in the number of users, such as a rush minute or a rush hour? Or will it be a steady state, where people are writing continuously all evening and all night, or an automated process writing at a slow, around-the-clock rate?
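To make the "spike versus steady state" question concrete, here is a minimal sketch that summarizes a workload from per-minute write counts. The function name and both traffic profiles are hypothetical and the numbers are invented for illustration; the idea is simply that a peak-to-average ratio near 1 points to a steady state, while a large ratio points to a rush period you need to size for.

```python
from statistics import mean


def throughput_profile(writes_per_minute: list[int]) -> dict[str, float]:
    """Summarise a workload from per-minute write counts.

    Raises statistics.StatisticsError if the list is empty.
    """
    avg = mean(writes_per_minute)
    peak = max(writes_per_minute)
    return {
        "avg_per_sec": avg / 60,
        "peak_per_sec": peak / 60,
        # Near 1.0 suggests a steady state; a large value suggests a rush period.
        "peak_to_avg": peak / avg if avg else float("inf"),
    }


# Hypothetical hour of traffic, one count per minute (invented numbers).
steady = [30] * 60              # automated, around-the-clock writer
spiky = [10] * 50 + [300] * 10  # quiet hour with a ten-minute rush

print(throughput_profile(steady)["peak_to_avg"])           # 1.0
print(round(throughput_profile(spiky)["peak_to_avg"], 2))  # 5.14
```

Both profiles could look similar if you only averaged over a day; the peak figure is what tells you how much headroom the database needs during the rush.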

NoSQL databases tend to excel in terms of performance and scalability, which is worth considering when processing large volumes of data, especially unstructured data. SQL solutions, on the other hand, excel when there are a lot of joins and those need to be the primary focus. One final note on all of this: for smaller data sets, particularly when you're only dealing with a couple of thousand rows, speed is not as big a concern as some people fear.
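To put the "couple of thousand rows" remark in perspective, the minimal sketch below uses Python's built-in `sqlite3` module as a stand-in relational database (an assumption for illustration; the course itself works with Amazon RDS and Aurora). The `customers` and `complaints` tables are hypothetical. It joins 2,000 complaints to their customers and times the query; exact timings depend on hardware, but at this size a join is typically well under a second.

```python
import sqlite3
import time


def join_small_dataset(n_rows: int = 2000) -> tuple[int, float]:
    """Join a small, hypothetical complaints table to customers and time the query."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in database
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE complaints (id INTEGER PRIMARY KEY, customer_id INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(n_rows)],
    )
    # One complaint per customer, so the join yields n_rows groups.
    conn.executemany(
        "INSERT INTO complaints (customer_id, body) VALUES (?, ?)",
        [(i, "beans arrived late") for i in range(n_rows)],
    )
    start = time.perf_counter()
    result = conn.execute(
        """
        SELECT c.name, COUNT(*) AS n_complaints
        FROM complaints AS cp
        JOIN customers AS c ON c.id = cp.customer_id
        GROUP BY c.id
        """
    ).fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return len(result), elapsed


groups, seconds = join_small_dataset()
print(f"{groups} rows joined in {seconds * 1000:.2f} ms")
```

This is the kind of relational join the transcript credits SQL solutions with; at a few thousand rows, the query cost is rarely what limits the application.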

Speed and scalability only really start to matter as your data set gets bigger and bigger, or as you get more and more users. Don't be afraid of speed and scalability concerns if your data is relatively small or you're a relatively small team without extreme requirements. Just be aware that as your data gets bigger and more complex, and as your team grows and prospers, this is where you need to focus more.

So, to tie it back to our fantastic coffee bean delivery service that's building a new customer support dashboard: we're expecting people to be submitting requests, and we anticipate no more than six users at a time. These are either people entering new complaints, people responding to complaints, or people viewing the dashboard. And since this isn't business-critical, we don't need to know within a millisecond of a complaint coming in, so a one- to two-second response time is acceptable for our application.
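One way to sanity-check these requirements is a small load simulation. The sketch below assumes SQLite as a local stand-in for the real database (the course deploys Amazon Aurora), a hypothetical `complaints` table, and a made-up request that submits a complaint and then reloads the dashboard count. It runs six simulated users concurrently and compares each request's latency against the two-second upper bound; it is a rough check under those assumptions, not a benchmark of a production setup.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial

CONCURRENT_USERS = 6     # "no more than six users at a time"
RESPONSE_BUDGET_S = 2.0  # upper end of the one- to two-second target


def one_request(db_path: str, user_id: int) -> float:
    """Hypothetical request: submit a complaint, then reload the dashboard count."""
    start = time.perf_counter()
    # Each thread opens its own connection; timeout waits out SQLite write locks.
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:  # commits the INSERT on success, rolls back on error
            conn.execute(
                "INSERT INTO complaints (user_id, body) VALUES (?, ?)",
                (user_id, "order arrived late"),
            )
        conn.execute("SELECT COUNT(*) FROM complaints").fetchone()
    finally:
        conn.close()
    return time.perf_counter() - start


def simulate_dashboard_load() -> tuple[list[float], int]:
    """Run CONCURRENT_USERS requests at once; return latencies and final row count."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "support.db")
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE complaints (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
        )
        conn.commit()
        with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
            latencies = list(pool.map(partial(one_request, db_path), range(CONCURRENT_USERS)))
        (rows,) = conn.execute("SELECT COUNT(*) FROM complaints").fetchone()
        conn.close()
    return latencies, rows


latencies, rows = simulate_dashboard_load()
print(f"{rows} complaints; slowest request {max(latencies):.3f}s "
      f"(budget {RESPONSE_BUDGET_S}s)")
```

With only six users and a relaxed budget, even a single-file database handles the concurrent writes by briefly queuing them, which matches the point that speed is not a major concern at this scale.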

About the Author

Calculated Systems was founded by experts in Hadoop, Google Cloud, and AWS. Calculated Systems enables code-free capture, mapping, and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. Using cloud automation tools, deep industry expertise, and experience productionalizing workloads, Calculated Systems cuts development cycles to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.
