Graph
Difficulty: Beginner
Duration: 50m
Description

In this course, we will introduce you to some common types of databases. Different data problems can be solved in a wide variety of ways, which is why so many different types of databases exist. After learning about a few different types of databases, you’ll be ready to explore specific implementations further.

Learning Objectives

During this course, I’ll introduce you to the following:

  • SQL databases
  • Key-value databases
  • Document databases
  • Graph databases

Intended Audience

This course is for novice:

  • Software engineers
  • Data engineers
  • DevOps engineers
  • Site reliability engineers

Prerequisites 

  • You should have at least a conceptual understanding of programming and be comfortable with data structures, data types, etc.

Transcript

Working with data can be tricky because there are many ways to solve any given data problem. Ideally, you want to represent data using a data structure that best models the problem. Different data structures provide different patterns for data storage and access. When a data structure closely models a specific data problem, the data becomes more intuitive to interact with.

For example, imagine a bookkeeping application that stores a record for each check issued. This data is well suited to a relational database because the database can constrain the check_number column to hold unique values, mirroring an actual checkbook, in which every check has a unique number. When selecting a database for a given data problem, it’s important to use the option that best models your use case. I’m mentioning this because, after learning about graph databases, suddenly all data problems look like they should be solved with a graph database.
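To make the check_number constraint from the bookkeeping example concrete, here’s a minimal sketch using Python’s built-in sqlite3 module. The table name, the other columns, and the issue_check helper are assumptions made up for illustration; only the unique check_number column comes from the example above.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: check_number must be unique, like a paper checkbook.
conn.execute(
    """
    CREATE TABLE checks (
        check_number INTEGER NOT NULL UNIQUE,
        payee        TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    )
    """
)


def issue_check(check_number: int, payee: str, amount_cents: int) -> bool:
    """Insert a check record; return False if the number was already used."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO checks (check_number, payee, amount_cents) "
                "VALUES (?, ?, ?)",
                (check_number, payee, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a repeated check number.
        return False


first = issue_check(1001, "Electric Co.", 8450)
duplicate = issue_check(1001, "Water Co.", 3120)
print(first, duplicate)
```

The database itself rejects the second check, so the application doesn’t need its own duplicate-detection logic.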

In this lesson, I’ll introduce you to the rather interesting graph database. It’s worth noting that the term graph is a bit overloaded. In the context of graph databases, the term refers to graph theory, where exploring the relationships between objects is important. There are certain types of data problems where understanding the relationships between units of data is at least as important as the data itself.

A common example is a social network. Each person is connected to other people in some way, either directly or indirectly. Examining the different ways in which objects are connected can provide insights that might otherwise be missed in the data itself. Each type of database has its own atomic unit of data: rows for SQL databases and documents for document-oriented databases.
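Here’s a rough sketch of what “directly or indirectly connected” can mean in a social network. It uses a plain breadth-first search over a hypothetical friendship map; the names and the degrees_of_separation function are invented for illustration and aren’t tied to any particular graph database.

```python
from collections import deque
from typing import Optional

# Hypothetical friendships, stored as an undirected adjacency map.
friends = {
    "Ana": {"Ben"},
    "Ben": {"Ana", "Cho"},
    "Cho": {"Ben", "Dev"},
    "Dev": {"Cho"},
    "Eli": set(),  # Eli has no connections at all
}


def degrees_of_separation(start: str, target: str) -> Optional[int]:
    """Return the number of hops between two people, or None if unconnected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for friend in friends[person]:
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


print(degrees_of_separation("Ana", "Ben"))  # direct connection
print(degrees_of_separation("Ana", "Dev"))  # indirect connection
```

A result of 1 is a direct connection, anything larger is indirect, and None means the two people aren’t connected.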

Graph databases use two units of data called nodes and edges. Nodes define properties as an associative array, and they can be categorized using labels, which allows nodes to be queried based on their properties and labels. Edges define relationships between nodes and often allow properties to be attached, providing further context about the relationship.
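The node and edge structure described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how any specific graph database stores data; the Node and Edge classes, the find_nodes helper, and the sample data are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)       # categories, e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # associative array


@dataclass
class Edge:
    source: int    # node_id the relationship starts from
    target: int    # node_id the relationship points to
    rel_type: str  # what kind of relationship this is
    properties: dict = field(default_factory=dict)  # relationship context


# Hypothetical sample data.
nodes = [
    Node(1, {"Person"}, {"name": "Ana", "city": "Lisbon"}),
    Node(2, {"Person"}, {"name": "Ben", "city": "Oslo"}),
    Node(3, {"Company"}, {"name": "Acme"}),
]
edges = [
    Edge(1, 3, "WORKS_AT", {"since": 2019}),
    Edge(2, 3, "WORKS_AT", {"since": 2021}),
]


def find_nodes(label: str, **props) -> list:
    """Return nodes that carry the label and match every given property."""
    return [
        n for n in nodes
        if label in n.labels
        and all(n.properties.get(k) == v for k, v in props.items())
    ]


print([n.properties["name"] for n in find_nodes("Person", city="Oslo")])
```

Notice that the "since" value lives on the edge, because it describes the relationship and not either node.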

The relationships defined in edges can be directed or undirected. The direction of an edge can be important for some use cases. For example, a cat might be friendly with a dog, but the dog may feel differently. In this case, knowing the direction of a relationship conveys meaning.

Undirected edges treat a relationship as singular in nature, where no additional meaning needs to be conveyed beyond the connection itself. For example, a graph representing a computer network might show all devices on a given network. In this case, direction doesn’t matter. Graph databases are designed to store nodes and edges and to provide a query mechanism for exploring the relationships between nodes. Conceptually, graph databases are document databases that understand the relationships between documents.
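The difference between directed and undirected edges can be sketched like this. The cat and dog come from the example above; the device names and both helper functions are hypothetical. One simple way to model direction is an ordered pair, while an unordered frozenset models a connection with no direction.

```python
# Directed edges: an ordered (source, target) pair, so order carries meaning.
friendly_toward = {("cat", "dog")}


def is_friendly_toward(source: str, target: str) -> bool:
    """True only if an edge points from source to target."""
    return (source, target) in friendly_toward


# Undirected edges: a frozenset has no order, so either direction matches.
network_links = {
    frozenset({"router", "laptop"}),
    frozenset({"router", "printer"}),
}


def is_linked(a: str, b: str) -> bool:
    """True if the two devices share a link, regardless of argument order."""
    return frozenset({a, b}) in network_links


print(is_friendly_toward("cat", "dog"), is_friendly_toward("dog", "cat"))
print(is_linked("laptop", "router"), is_linked("router", "laptop"))
```

Swapping the arguments changes the answer for the directed relationship but not for the undirected one.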

Similar to document-oriented databases, there isn’t currently a standard query language for graph databases. The functionality of a graph database is implementation specific, and there are several options out there. Let’s explore a couple of commonly used graph databases. The first option I want to introduce is Neo4j.

Neo4j describes itself as: “a native graph data store built from the ground up, to leverage not only data but also data relationships. Unlike other types of databases, neo4j connects data as it’s stored, enabling queries never before imagined, at speeds never thought possible.”

First released in 2007, Neo4j has grown and evolved into one of the most common graph databases. One of the likely reasons for its popularity is its query language, Cypher. Cypher uses ASCII representations of relationships to make it easier for developers to query connected nodes. Cypher’s minimalist syntax was inspired by SQL and SPARQL.

Here’s an example of the syntax from Neo4j’s documentation. It shares similarities with SQL, though it’s specifically designed to query graph structures. Notice the values inside the parentheses. These represent nodes, which are connected by an edge, represented by the value inside the square brackets. The arrows convey the direction of an edge.
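To show how the parentheses, square brackets, and arrow fit together, here’s a small sketch. The Cypher query in the comment is my own illustrative query, not the sample from Neo4j’s documentation, and the Python below it only imitates the pattern match over hypothetical in-memory data; it doesn’t talk to Neo4j.

```python
# Illustrative Cypher (an assumption, not the sample from the documentation):
#   MATCH (a:Person)-[:KNOWS]->(b:Person)
#   RETURN a.name, b.name
#
# (a) and (b) are nodes, [:KNOWS] is the edge, and -> is its direction.

# Hypothetical data: node id -> name, and (source, type, target) edges.
people = {1: "Ana", 2: "Ben", 3: "Cho"}
relationships = [
    (1, "KNOWS", 2),
    (2, "KNOWS", 3),
    (1, "BLOCKED", 3),
]


def match(rel_type: str) -> list:
    """Return (source name, target name) for every edge of the given type."""
    return [
        (people[source], people[target])
        for source, edge_type, target in relationships
        if edge_type == rel_type
    ]


knows_pairs = match("KNOWS")
print(knows_pairs)
```

Each tuple in the result lists the source node first and the target node second, following the direction of the arrow in the pattern.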

The next implementation option I want to introduce is Dgraph. Here’s how Dgraph describes itself: “Designed from the ground up to be run in production, Dgraph is the native GraphQL database with a graph backend. It is open-source, scalable, distributed, highly available, and lightning fast.”

First released sometime around 2016, Dgraph has continued to grow. It includes two query languages: GraphQL and DQL. GraphQL is a query language that lets developers request specific data from APIs, giving frontend developers more control over the returned data. GraphQL wasn’t created to query data from a graph database, so while it works for many use cases, Dgraph created DQL, which builds on GraphQL to provide a more domain-specific solution.
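The idea of a client asking for specific fields can be imitated in plain Python. This is only a sketch of the concept, not GraphQL itself or any GraphQL library; the select_fields function, the selection format, and the sample record are all made up for illustration.

```python
def select_fields(record: dict, selection: dict) -> dict:
    """Return only the fields named in selection.

    A value of None selects the field as-is; a nested dict selects
    sub-fields of a nested record or of each record in a list.
    """
    result = {}
    for name, sub_selection in selection.items():
        value = record[name]
        if sub_selection is None:
            result[name] = value
        elif isinstance(value, list):
            result[name] = [select_fields(item, sub_selection) for item in value]
        else:
            result[name] = select_fields(value, sub_selection)
    return result


# Hypothetical record an API might hold for one user.
user = {
    "id": 7,
    "name": "Ana",
    "email": "ana@example.com",
    "posts": [{"id": 1, "title": "Hello", "body": "First post text"}],
}

# The client asks only for the name and each post's title.
selection = {"name": None, "posts": {"title": None}}
trimmed = select_fields(user, selection)
print(trimmed)
```

The caller, not the server, decides the shape of the response, which is the control over returned data described above.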

Here’s a sample of the syntax from Dgraph’s documentation. It queries all films directed by Ridley Scott prior to the year 2000.

Being able to model data and relationships has a wide range of use cases, including fraud detection, recommendation engines, social networks, network topologies, and the list keeps going. Different implementations provide different features and query mechanisms, which means engineers have to be mindful of their requirements when selecting a database.
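The film query mentioned above follows a director’s relationships to films and then filters on a property. Here’s a rough Python imitation of that logic over a small hand-written dataset. It isn’t DQL and isn’t the sample from Dgraph’s documentation; the data structure and the films_before function are assumptions for illustration.

```python
# Hypothetical in-memory data: director -> films they directed.
directed = {
    "Ridley Scott": [
        {"title": "Alien", "year": 1979},
        {"title": "Blade Runner", "year": 1982},
        {"title": "Gladiator", "year": 2000},
        {"title": "Black Hawk Down", "year": 2001},
    ],
    "James Cameron": [
        {"title": "Aliens", "year": 1986},
    ],
}


def films_before(director: str, year: int) -> list:
    """Return sorted titles of the director's films released before year."""
    return sorted(
        film["title"]
        for film in directed.get(director, [])
        if film["year"] < year
    )


print(films_before("Ridley Scott", 2000))
```

The lookup by director plays the role of following edges from a node, and the year comparison plays the role of the filter on the connected film nodes.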

Okay, this seems like a natural stopping point. Here are your key takeaways for this lesson:

  • Graph databases are used to store and explore data and its relationships.

  • Graph databases include two units of data: nodes and edges.

    • Nodes are used to model concepts using properties and labels.

    • Edges are used to model relationships between nodes.

      • Edge properties can be used to provide relationship context.

  • Edges may or may not include a direction.

    • This depends on the database implementation and use case.

  • Graph databases don’t have a standard query language.

    • We only saw a couple; however, there are many, including:

      • Cypher

      • GraphQL

      • AQL

      • Gremlin

      • Etc…

Okay, that’s going to be all for this lesson. Thanks so much for watching. And I’ll see you in another lesson!

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.
