Components of a Relational Database
Introduction to SQL
The course is part of these learning paths
In this course, we will provide a brief overview and history of the SQL standard. We will then learn how to read and write data using structures created with the language SQL. By the end of this course, you should be able to recognize and explain the structured query language and be able to read and write SQL commands. This course is suited to anyone familiar with computing who is interested in learning how to read and manipulate data stored in a relational database.
It will help if you have a basic understanding of computing services to get the most from this practical course.
We welcome all comments, questions, and feedback, so if you need any help with this course please reach out to us at firstname.lastname@example.org
This course is built from content created and delivered by Ian Wallington of QA.
- [Instructor] So, what is SQL? SQL is a structure for data processing, where the data is stored in a database management system.
SQL is based on set theory. You can use it to read data from one or more sets, which are commonly called tables. And you can manipulate the data in those tables using structured queries. Tables are often related together in a database. A database is then often part of a relational database management system. The reason why tables are called relational is because you have tables, and the tables are related to each other, or linked to each other.
Relational is a mathematical term. Relation is a set of sets, and, in a sense, a table is a set of sets. Tables have columns and your data is stored in rows. And that's the usual terminology for tables. There are different terms and you can have a set of columns and then you can have a set of rows. Each row is a set of values for those columns. So, it's a set of a set.
There are other forms of databases, such as data warehouses, and noSQL databases. Now, a noSQL database means not only SQL. And so, uses more than the Structured Query Language to organise and read and write data.
If you have an application, it's most likely using a back-end relational database management system. The reason for the existence of relational database management systems is partly down to a gentleman called Edgar Codd. Edgar Codd worked for International Business Machines Incorporated, or IBM, in the 1970s. And, in fact, in 1970, he wrote a paper on how sets of sets could be used to store data.
The first commercial relational database management system was produced by a company called Relational Software, now known as Oracle Corporation. Their version two in 1977, they launched Oracle. That was followed by IBM's own offerings. Db2 is still available and out in the market and still being used. And then there were many others, like Informix, Ingres. Ingres is still around in the shape of Postgres.
The bulk of functionality is common on most database management systems that support SQL. If you were to take a query, there's a good change that it would run on MySQL or Oracle without any modification whatsoever. There are areas where some of the platforms inevitably are different.
Now, this course is run on SQL Server, and this is the SQL Server Management Studio. It's not an overly difficult tool to use, and you can choose any tool to run SQL commands against the database. MySQL Workbench, which is a very similarly fully-featured tools, is another sort of instance that you can use.
Now, originally, the American National Standard was used for the Structured Query Language standard, and then the International Standards Organisation took it over, and released many versions of the SQL standard. That stopped at 2011. The 92 standard is still sometimes called the new SQL standard, and that version introduced a better way of joining tables together than we had prior to the 1992 standard. Currently, the SQL standard has this new joined feature, and this course is written around that standard.
Let's briefly cover stored procedures and triggers. A stored procedure is a program stored in the database. It's a procedure, a procedural step of steps. Do this, do this, do this. The language that you use to write that procedure will depend on the database management system. Oracle uses PL/SQL, SQL Server uses Transact-SQL, often abbreviated to T-SQL. There's not a lot of differences between them but each database management system does have its own syntax for transactions and procedures. The language used in MySQL is indeed based on the SQL standard. SQL3 defines the standard for persistent stored modules, or PSM, so in MySQL, you write your code in SQL/PSM.
In this course, we're covering the core of the Structured Query Language. So, triggers and stored procedures are kind of outside of the scope for this course, but, I just want you to be aware of them and how they work.
With triggers, the idea or goal is that you can write a sequence of events to happen to say that when a row is added into a table, or it's deleted from a table, or it's changed, then run some code - either before or after the event. Typically, a trigger will invoke a stored procedure. Again, it would therefore involve writing code in the appropriate language for the database management system, which means there's no one approach that will work across them all.
So, we're not gonna cover that in this course. We're going to talk about views, typically to say they are stored in databases. We'll touch on indexes. You can put indexes on a column or columns in your database to improve your performance of queries. So outside of system tables, stored procedures and modules, for MySQL, this term is very applicable. I don't think it's quite so appropriate in Oracle or Microsoft SQL Server. The tables that you work with open by a user.
Now, in Oracle, each user has his own set of tables, which are called a schema. So, users can create their own tables, and it forms a database for that user. The same applies for Microsoft SQL server, except the user is by default DBO, for database owner. But if you are using MySQL, then MySQL server will have a number of databases. But they aren't owned in any sense by a user. You have users, a user will have access to certain databases, but not because they own them.
And here, we have a table. A table, or what is described there is, unnormalised data. It's just one table containing information about departments and employees, quite common for data in a data warehouse to be denormalised, i.e. to look a bit like this, for speed of querying. This would probably therefore be defined as a data warehouse, which would be used by an application.
Now, this dataset would, can present a number of problems, as we have a department name and we don't really have a primary key at this stage, but we'll get into that a bit later. The biggest issue here is if we allowed Sue Brown to leave, and we therefore removed this person from that table, we'd also lose the sales department.
We also have duplicated data. The duplicate data we have is that the department name is duplicated as marketing, so both Fred Smith and Bob James are part of the marketing team, so therefore we have a duplication. This means that if we wanted to change that name in any way, we'd need to make sure we change it everywhere it's occurred, and secondly, if we didn't do that, we'd end up with inconsistent data. Now, again, if we're using a data warehouse, then simplistically, this is historical data, so it shouldn't change. That's the theory with a data warehouse. It's what's happened, not what's happening. Therefore duplicate data is fine - It means that your table is simpler for the purposes of queries.
Now, to the last point, that's a problem. If you want to add a new department, department three for example, then these are the rules. You'd have to put something in these three columns here, because every row in a table has to have something in every column. The other thing about tables is a record called null, which we can use to put something in a database row that actually is a null, but that is still a little bit messy. If your database is supporting an application, typically, rather than having any table that stores information about two different things, or if you've used, I think, the correct terminology, entity types, department is an entity type. Employee is an entity type. What would happen typically is the database would be designed working out what entity types your application is going to need to know about, and typically, each entity type becomes a table. Pretty simply, we end up with a department table and an employee table. It is useful in this theory to be able to access an individual row from a table by some kind of identifier. Perhaps, I'm talking here about a subject that you already know about, but you may not have heard it described in these terms. It is a column or columns, which can be used to identify any row in a table. You need a value within that table, and although it's not stated here, that's going to be a primary key. An employee number, emp_no, is going to be the primary key. The theory is when you're designing your entity module, you may have more than one potential identifier in your database or your table. You might even have other forms of unique identifier in there, each of which will be described as a candidate key, and then you either pick one or all of them, or you come up with one of your own, and that becomes the main identifier or primary key.
In this case, I would suggest we come up with what is called an arbitrary primary key, because that's going to be a primary key that we can use. We also have a column there called dept_no. Well, what's that for? It's both changes in department one, yes, and department one is indeed marketing. What we have done here is create a join. You've joined the employee table to the department table, and you did that because this column here is not a primary key. This is now what we call a foreign key. A foreign key is a reference to a primary key of another table, and that's how tables are related to each other, or linked to each other.
So, you have to have a link between the employees and the departments because in the previous slide, there was, they were all in one table. So it's obvious that we needed to have a link. Now, because we've normalised it, normalisation by the way, is a heavily mathematical theory process, that I'm not convinced that many people actually go through. I think a lot of people just use a bit of common sense when designing databases, but actually just because we normalise the data, and the reason for normalising the data, the main motivation is to remove duplicates.
Now, marketing only exists once within our database in the department table. So, with a data warehouse, you don't have quite so many constraints on having duplicates. Are you personally using more disk space, another issue with duplicates, but it does mean queries are a lot quicker, potentially, with fewer joins. We can allow Sue Brown to go on and have another role, and remove that row. We don't lose the sales department in the process. We can change the marketing department's name in just one place. We can add departments without having to come up with any sort of fictitious information value, to ensure that all the record, rows are filled.
There are different terms for these things. Some people call these things records, and I think that's quite an old fashioned terminology. Some people will call them fields. In pure SQL terms, I would say, stick to field or column, because within database management systems like Oracle, there are things called variables, and they are not these, so it may be potentially confusing. Mathematically minded people might refer to these as tuples.
In this slide, I'm going to introduce you to the primary key and the foreign keys, and we'll go down to a little bit more detail on the nature of the relationship between them.
In a relational database, you can model one to many relationships. So here, we have a department potentially having many employees. Each employee is only in one department. If you did have a strange situation where somebody could be in more than one department, kind of a matrix management for example, then what you would do is have a table of departments, and a table of employees, but you'd also have a department employee table. In that table you'd say, this employee is in this department, and this employee is also in that department, and that department also has this employee in it.
So, it's quite simple to create these type of relationships. You have a table with two columns, each of which is a foreign key into another table, and that will be known as a junction table, or at least that's a commonly used term. You can have one-to-one relationships. You can also have one-to-many relationships, as we now have between department and employees.
It also talks here about referential integrity. So, referential integrity refers to the accuracy and consistency of data within a data relationship. So in the relationship, data is linked between two or more tables, and that is achieved by having the foreign key reference a primary key, in the primary or parent table, rather than having Sue Brown removed from the table, and added to another department. It would not be a good idea to get rid of that row. The other thing is it wouldn't be particularly right to go and put a new employee into that table where the department number is 16. The database management system will prevent you from doing those things.
So, in our data model here, employees depend on departments, rather than departments depending on employees. So, effectively you could empty the employee table. You could do that with a DELETE statement, where you could target individual rows using a WHERE clause.
Okay, so let's start learning a little more about the statements and operators available to us in Structured Query Language. So, essentially we have the data manipulation language, which is generally the four verbs, SELECT, INSERT, UPDATE, and DELETE, and most of SQL statements are based around those four DML verbs.
Now, the next level is data definition language, which is CREATE, ALTER, and DROP statements, and that's generally used for tables, views, columns, indexes, and databases themselves, and it's largely the domain of the database administrator.
Then, we also have another layer which is called data control language, and this is where we're able to set security for database users. So, we can use the GRANT and EVOKE statements. Using GRANT and REVOKE statements, a database administrator or security manager could control what access you had to which databases, and which tables in that database you could see, and what operations you could carry out on it - even down to the column level.
Now, there's generally a common programming interface. That is the domain of PROCEDURES, TRIGGERS, CURSORS, TRANSACTIONS, and LOCKING. As they are quite specific to database management services, we'll leave them outside of the scope of this course.
Okay, so here we have a database entity relationship diagram. Sometimes, that would be called a schema. It's a schema diagram, 'cause a schema is a collection of tables. This is quite a simple database, and doesn't necessarily reflect one that we might design in real life. Generally, we'll be working with 50 plus tables or more. Here we have five, which is designed to be simple so that we can work with it and learn from it. It's a business-to-business sale system, but at the center of it, we have sales. Our contact is there, we make sales to contacts, contacts represent companies that we sell to, and we have sales people who make those sales, and those sales people organise the departments. Department number, department name, manager and sales target, those are columns in the table. But we singled out department NO. That's the primary key of the table. This table here, the contact table, has two columns in that box.
Now, first of all, that doesn't mean there are two primary keys. I would suggest the name primary key implies that you can only have one. So, if you can't have two primary keys, what does it mean? It means that both of these columns are used to be the primary key. Now again, this doesn't stand in-depth analysis, but we have an example of what's known as a composite key here.
The theory is that the contact is somebody's initials, and that therefore isn't necessarily unique. Now the way our model falls down is that you can only have one set of initials in the company. The idea is that a company number and contact code, combined, identifies the contact. That has implications when we come to join the context table and the sales table, which we're able to do because the sales table contains a company number and contact code columns, as indeed the salesperson table, department_number column. That's the primary key. This is the characteristic of a composite key, but one part of it is likely to be a foreign key into another table. This means that you can tell what company a sale is made to, whereas you can't tell what department was responsible for making the sale without going to the salesperson table. But that's our database or database schema.
Okay, that brings to an end our introduction to the Structured Query Language, and an introduction to the relational database management service.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.