Neo4j is a graph database management system that lets users create and manipulate graph data. It's a powerful DBMS for creating databases where the data has a large number of relationships with other nodes, and where users want to perform lots of analytic searches. For this reason, it's used a lot with social networks and marketing activities.
In this Tech Talk, you will follow along as one of our cloud experts, Stefano Cascavilla, walks you through the features of Neo4j graph databases and how to get the most out of them. You will also learn about Cypher, the query language used in Neo4j.
If you have any feedback related to this course, feel free to get in touch with us at support@cloudacademy.com.
Learning Objectives
- Learn about Neo4j, its features, benefits, and use cases
- Understand how to manage data using the service
- Learn how to use the Cypher query language
Intended Audience
- Data engineers
- Anyone interested in learning more about Neo4j
Prerequisites
To get the most out of this course, you should have a basic knowledge of databases.
Hello, everybody, my name is Stefano Cascavilla, and I'm one of the content guys. I'm here with Luke Orellana and Andrew Burchill, who are from the Content Team too. Today I'm going to give this Tech Talk about Neo4j. So let's get started.
So the first thing you want to know is: what is Neo4j? Neo4j is a graph DBMS, a database management system that lets users create and manipulate graph data. So imagine a database where you can insert any kind of graph data, so nodes and edges, and you can represent all your data inside a graph. It is a NoSQL database.
So you don't have to follow all the conventions that relational databases such as PostgreSQL, MySQL, Oracle, and others follow, and there is no relational model. The schema is not mandatory. Most users don't create a schema or use the schema definition, but with the Neo4j DBMS you can create one. It is not the same thing as the schema for relational databases: it is defined by using indexes and constraints, and we will talk about the schema and these two parts in the next slides.
So Neo4j is also a powerful DBMS for creating a database where each piece of data needs to have loads of relationships with other nodes, and where users want to perform loads of analytic searches and analytic operations. One of the most famous use cases where a graph database is a very good choice is the social network. Suppose you want to represent users, posts, images, and also the reactions that users give to each image or post.
You can see that you can represent it in a very efficient way by using a graph database, so Neo4j in this case, instead of using a classic relational database. The main concepts that you need to focus on when dealing with graph databases, and especially Neo4j, are nodes and labels, relationships and types, properties, and also the schema, which is composed of indexes and constraints.
So, nodes and labels. A node represents an entity that you want to insert into your database. For example, you want to create a database where you record persons or users. All the entities that you want to handle in the world you want to represent in your database need to be translated into nodes. And you can have multiple entities of the same type.
So you group them by using labels: the label that you associate with each node represents the type of that node. You can think of a label as a class that contains all the nodes that share common properties.
So, in this case, we can see two Person nodes with the same label, and these are properties, which we will talk about in the next slides. Then we have a Technology node, and then we also have a Company node. So these two share the same label, this one has the Technology label, and this one has the Company label.
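The idea of nodes grouped by label can be sketched in a few lines of plain Python. This is only an in-memory illustration, not Neo4j's API: the `Node` class and the `nodes_with_label` helper are hypothetical names, and each node carries a single label here to mirror the slide. The node names come from the example used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity in the graph: a label (its type) plus key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


# The four nodes described on the slide.
nodes = [
    Node("Person", {"name": "Jennifer"}),
    Node("Person", {"name": "Michael"}),
    Node("Technology", {"type": "Graphs"}),
    Node("Company"),  # the talk gives no property values for this node
]


def nodes_with_label(all_nodes, label):
    """Return every node that carries the given label."""
    return [node for node in all_nodes if node.label == label]


people = nodes_with_label(nodes, "Person")
print([person.properties["name"] for person in people])
```

Filtering by label is what makes the label behave like a class: both Person nodes come back together, while the Technology and Company nodes are left out.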
So, relationships and types. A relationship, drawn as an edge in this case, represents a connection between two nodes. Of course, you can also create a relationship from a node to itself, so from node A to node A. This is a rare situation, but you can do it in Neo4j. The most common case is from a node A, for example this Person node, to another node, in this case Technology.
You have to keep in mind that each relationship has to have a direction. As you can see, there is an arrow. So this relationship goes from a Person whose name is Jennifer to a Technology whose type is Graphs, then there is another relationship between Person and Person, and another one from Person to Company.
There are situations, for example "is friends with", where you want a bidirectional relationship. We assume that if Jennifer is friends with Michael, then Michael is also friends with Jennifer.
If you want to represent this particular situation, you need to explicitly set another relationship of the same type from Michael to Jennifer. When you create a relationship, you need to specify its type, which identifies the kind of relation. In this case the types are LIKES, IS_FRIENDS_WITH, and WORKS_FOR. And as you can see from this image, when you define a type for a relationship, the convention is to write it in uppercase.
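A small Python sketch can make the point about direction concrete. It follows the approach described in the talk, where a two-way friendship is stored as two directed relationships. The `Relationship` class and `has_relationship` helper are hypothetical names for illustration and are not part of Neo4j.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A directed, typed edge from one node to another."""
    start: str
    type: str
    end: str


def has_relationship(relationships, start, rel_type, end):
    """Check for an edge in exactly this direction."""
    return Relationship(start, rel_type, end) in relationships


relationships = {
    Relationship("Jennifer", "LIKES", "Graphs"),
    Relationship("Jennifer", "IS_FRIENDS_WITH", "Michael"),
}

# Only the Jennifer -> Michael direction exists so far.
before = has_relationship(relationships, "Michael", "IS_FRIENDS_WITH", "Jennifer")

# Adding the reverse edge explicitly makes the friendship two-way.
relationships.add(Relationship("Michael", "IS_FRIENDS_WITH", "Jennifer"))
after = has_relationship(relationships, "Michael", "IS_FRIENDS_WITH", "Jennifer")

print(before, after)
```

The relationship types are written in uppercase to match the convention mentioned above.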
Okay, so the next topic is properties. Properties are used to better describe the components in the graph, and the main components, as we just saw, are nodes and relationships. So properties can be associated with both nodes and edges, that is, relationships. And what is a property? A property is a key-value field, where the key is the name of the property and the value is the value of that property.
So, in this graph, we have a few different properties. For this Person node, we have the name property, whose value is Jennifer. For this one, we have the type property, whose value is Graphs. And we also have this property, which is associated with the "is friends with" relationship, whose name is since and whose value is 2018.
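As a rough illustration, properties can be pictured as a key-value map attached to each node and each relationship. The sketch below uses plain Python dictionaries rather than Neo4j itself. The `set_property` helper and the `city` property are hypothetical and only show that more properties can be attached freely.

```python
# Nodes and relationships both carry a key-value map of properties.
jennifer = {"label": "Person", "properties": {"name": "Jennifer"}}
graphs = {"label": "Technology", "properties": {"type": "Graphs"}}
friendship = {
    "type": "IS_FRIENDS_WITH",
    "start": "Jennifer",
    "end": "Michael",
    "properties": {"since": 2018},
}


def set_property(element, key, value):
    """Attach (or overwrite) one property on a node or a relationship."""
    element["properties"][key] = value
    return element


# "city" is a made-up property, added only to show a second key on a node.
set_property(jennifer, "city", "Rome")

print(jennifer["properties"])
print(friendship["properties"]["since"])
```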
You can attach multiple properties to each node and to each relationship. Basically, there is no limit on how many properties you can attach to each node and each relationship inside your graph.
So the schema is the last main concept of Neo4j. As I said before, it is composed of indexes and constraints. It is not required in Neo4j, and most users don't use it, but if you want one you can of course create it.
So what are indexes and what are constraints? Indexes are structures that you associate with properties on nodes. Indexes are currently supported only for properties of nodes, and not for relationships. Not yet at least, because we are now at Neo4j version 4.1. And the indexes are also supported for and those properties. When you define an index for a property on a node, it is used to improve the performance of operations, especially queries that you perform on specific nodes with specific properties.
So suppose you have a graph database where you want to represent users or persons, and on each node labeled User you set the name property. Let's suppose you need to perform loads of queries on these nodes by using the name property. Well, this is a very good situation in which to define an index on the name property of the User nodes, not on the label, because when you perform a match, so a query in your database, on the name property, it will be faster than performing the same query without any index. And let's move on to constraints.
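Why an index on the name property speeds up lookups can be sketched in plain Python. This is a conceptual illustration only and says nothing about how Neo4j implements its indexes internally. The function names are hypothetical, and the sample data is made up for the example.

```python
from collections import defaultdict

users = [
    {"label": "User", "name": "Jennifer"},
    {"label": "User", "name": "Michael"},
    {"label": "User", "name": "Jennifer"},
]


def find_by_scan(nodes, name):
    """Without an index: inspect every node on every query."""
    return [n for n in nodes if n["label"] == "User" and n["name"] == name]


def build_name_index(nodes):
    """Build the index once: map each name to the User nodes that have it."""
    index = defaultdict(list)
    for node in nodes:
        if node["label"] == "User":
            index[node["name"]].append(node)
    return index


def find_by_index(index, name):
    """With an index: a single dictionary lookup, however many nodes exist."""
    return index.get(name, [])


name_index = build_name_index(users)
print(len(find_by_scan(users, "Jennifer")), len(find_by_index(name_index, "Jennifer")))
```

Both lookups return the same nodes. The difference is that the scan touches every node each time, while the index pays that cost once when it is built.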
So constraints can be defined on both nodes and relationships; they are supported by both resources. They are used to validate graph data while it is being entered or changed, which means that you can define limits on the data you want to insert and keep inside your graph database.
So let's suppose the same situation as before: a database where you want to store nodes that represent users or persons, with a property whose name is age, for example, and you want all the nodes whose label is User to have an age value greater than 16, for example.
Okay, you can of course handle it on the software side, in your application, but using constraints is much more powerful, because in this case you can define a constraint on this value, so that the value of the age property needs to be greater than or maybe equal to 16.
So for example, if you try to create a new node whose label is User and whose age is set to 18, it is okay. Neo4j won't complain, and it will allow this operation. But if you try to create another node labeled User whose age property is set to 13 or 14, for example, it will raise an error, because you just broke a constraint. So it is very useful when you need to set limits on the values of properties.
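The validate-on-write behaviour described here can be sketched in plain Python. This is a conceptual model only and does not use Neo4j or its constraint syntax: the `Graph` class, `ConstraintViolation`, and `add_constraint` are hypothetical names. Neo4j's documented constraint types at the time of the talk centre on uniqueness and property existence, so a range rule like the age example may in practice need to be enforced in application code along these lines.

```python
class ConstraintViolation(Exception):
    """Raised when a write would break a registered constraint."""


class Graph:
    def __init__(self):
        self.nodes = []
        self.constraints = []  # (label, check function, error message)

    def add_constraint(self, label, check, message):
        self.constraints.append((label, check, message))

    def create_node(self, label, **properties):
        # Validate before storing, so invalid data never enters the graph.
        for constrained_label, check, message in self.constraints:
            if constrained_label == label and not check(properties):
                raise ConstraintViolation(message)
        node = {"label": label, "properties": properties}
        self.nodes.append(node)
        return node


graph = Graph()
graph.add_constraint(
    "User",
    lambda props: props.get("age", 0) >= 16,
    "User.age must be at least 16",
)

graph.create_node("User", age=18)  # accepted

try:
    graph.create_node("User", age=13)  # rejected
    error_message = None
except ConstraintViolation as exc:
    error_message = str(exc)

print(len(graph.nodes), error_message)
```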
So, the last thing you need to know about Neo4j is: what is Cypher? Cypher is the language used in Neo4j, and compared with SQL it is much simpler to use and to understand. It acts both as a DDL, a data definition language, so you can create indexes and constraints, the parts of the schema, and as a DML, a data manipulation language, which means that you can manipulate data, so creating and also deleting nodes and relationships. And Cypher is based on graph patterns.
So, what is a graph pattern? A graph pattern represents a portion of a graph, so a set of nodes and relationships, with properties if you want, that you want to work with. For example, if you want to create a portion of your graph, say two nodes, node A and node B, and the relationship between these two nodes, you first need to define the graph pattern, so these two nodes and the relationship, and then you translate it into Cypher and create it by using the correct clauses.
Another example is that you want to perform a query, so you want to match a portion of your graph. You first need to define the graph pattern, and then you translate it into Cypher and perform the query. For this reason, because you can visualize the result of the creation, for example, or the deletion, or also the query, Cypher can be simpler than other DB languages. And of course, if you want to create or query something, just visualize it, so the graph pattern, and write it using Cypher.
So, we talked about creating data, performing queries, and deleting data, but let's see the most useful and main parts of the Cypher language. Cypher represents nodes with round brackets. If you want to create or match nodes, you write round brackets, and inside them you set the label you want to create or match. You can also associate an alias. An alias is just a short string that you associate with nodes or relationships, and it is used, for example, when you need to create nodes and then return them, or when you need to match nodes or relationships and then want to return them.
Cypher represents a relationship with square brackets, and properties are described by using braces. Braces never stand alone: they are always inside either round brackets, because you are specifying properties of a node, or square brackets, because you are referring to a relationship's properties.
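The bracket rules can be seen by assembling a pattern piece by piece. The Python helpers below are hypothetical and only build text: they are a sketch for reading the syntax, they handle just string and numeric values, and they do no escaping, so they are not a safe way to build real queries from user input.

```python
def format_properties(properties):
    """Render a dict as a Cypher-style property map in braces."""
    if not properties:
        return ""
    parts = []
    for key, value in properties.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"{key}: {rendered}")
    return " {" + ", ".join(parts) + "}"


def node_pattern(alias, label, properties=None):
    """Nodes go in round brackets: (alias:Label {properties})."""
    return f"({alias}:{label}{format_properties(properties)})"


def relationship_pattern(alias, rel_type, properties=None):
    """Relationships go in square brackets, with an arrow for direction."""
    return f"-[{alias}:{rel_type}{format_properties(properties)}]->"


pattern = (
    node_pattern("a", "Person", {"name": "Jennifer"})
    + relationship_pattern("r", "IS_FRIENDS_WITH", {"since": 2018})
    + node_pattern("b", "Person", {"name": "Michael"})
)
print(pattern)
```

In the resulting pattern, the braces appear only inside round or square brackets, which matches the rule described above.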
So the most commonly used and useful clauses in Cypher are these. CREATE is the main clause used to create a node or a relationship. Then you have the MATCH clause, which is used to perform a query. Why match? Because you have thought about your graph pattern, you then convert it into Cypher, and you want to match this portion of graph against the whole graph database that you have in Neo4j.
One of the most useful clauses is also RETURN, especially with a match, because when you match something, Neo4j performs your query and gets the result, but if you want to visualize the result, you need to return it by using the RETURN clause, and you need to use the aliases in the RETURN clause.
So for example, you have set an alias A for one node, an alias B for the relationship, and an alias C for the other node, and you want to return the node, the relationship, and the other node that you matched. You have to use the RETURN clause, so RETURN A, B, C. In this case, you will visualize the result of the query. You can also use the RETURN clause when you create something or when you delete something, because maybe you want to be sure you have performed the correct operation.
So, for example, you create multiple nodes, or a node with a relationship to another node, and you want to be sure that the resources you just created are correct. Of course, you can set an alias for each resource and then return them. Or, if you have to delete resources, for example a node or a relationship between two nodes, because you realize it is wrong, you can be sure you have deleted the right resource by using the RETURN clause to visualize the result, that is, what the Cypher command you performed has done.
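To make the three clauses concrete, here is a sketch of what such statements might look like, held as Python strings so they can be inspected without a running database. The aliases a, b, and c follow the example in the talk, and the data reuses the Jennifer and Michael example. The `returned_aliases` helper is hypothetical and assumes the statement contains a RETURN clause. Nothing here is sent to Neo4j.

```python
# Create two nodes and the relationship between them, then return all three.
CREATE_QUERY = (
    "CREATE (a:Person {name: 'Jennifer'})"
    "-[b:IS_FRIENDS_WITH {since: 2018}]->"
    "(c:Person {name: 'Michael'}) "
    "RETURN a, b, c"
)

# Match the same graph pattern and return what was found.
MATCH_QUERY = "MATCH (a:Person)-[b:IS_FRIENDS_WITH]->(c:Person) RETURN a, b, c"

# Delete only the relationship, then return the two nodes that remain.
DELETE_QUERY = (
    "MATCH (a:Person {name: 'Jennifer'})"
    "-[b:IS_FRIENDS_WITH]->"
    "(c:Person {name: 'Michael'}) "
    "DELETE b "
    "RETURN a, c"
)


def returned_aliases(query):
    """List the aliases named after the final RETURN keyword."""
    _, _, tail = query.rpartition("RETURN ")
    return [alias.strip() for alias in tail.split(",")]


for query in (CREATE_QUERY, MATCH_QUERY, DELETE_QUERY):
    print(returned_aliases(query), "<-", query)
```

Each statement starts from the same graph pattern, two Person nodes joined by one relationship, and only the clause around it changes.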
So, that's all for this Tech Talk. I hope you enjoyed it, guys, and thank you very much for your attention. If you have any questions, feel free to ask me.
- [Participant] I have a question.
- [Instructor] Yeah.
- [Participant] Besides social networks or social media, what other situations would you use Neo4j for?
- [Instructor] Okay, the most common situations are for marketing purposes. So, for example, social networks, social media, or any other kind of application where marketing is essential is a very good fit for Neo4j. Another example is a database with loads of data related to single entities that have loads of relationships. For example, at a hospital, you have a patient who is associated with a medical situation, you also have a folder for this user, and maybe you want to perform loads of analytical operations on all the data that you have inside your database. So this is also a good situation where you can use a graph database. And the main situations where Neo4j or graph solutions are used are also in marketing, yeah, because they can perform faster and more powerful analytical operations than a relational database, a classic database, which of course can cover the situation but would be less powerful than a graph database.
- [Participant] Okay, wow. That's really cool.
- [Instructor] Thank you.
Stefano studies Computer Science and is passionate about technology. He loves working with Cloud services and learning all the best practices for them. Google Cloud Platform and Amazon Web Services are the cloud providers he prefers. He is a Google Cloud Certified Associate Cloud Engineer. Node.js is the programming language he always uses to code. When he's not involved in studying or working, Stefano loves riding his motorbike and exploring new places.