Introduction to SQL
The course is part of these learning paths
In this course, we will provide a brief overview and history of the SQL standard. We will then learn how to read and write data using structures created with the language SQL. By the end of this course, you should be able to recognize and explain the structured query language and be able to read and write SQL commands. This course is suited to anyone familiar with computing who is interested in learning how to read and manipulate data stored in a relational database.
It will help if you have a basic understanding of computing services to get the most from this practical course.
We welcome all comments, questions, and feedback, so if you need any help with this course please reach out to us at firstname.lastname@example.org
This course is built from content created and delivered by Ian Wallington of QA.
- [Instructor] Hello and welcome to this lecture where we will explore the Data Definition Language, and how we can use it to create database objects such as tables.
Now, first, we will demonstrate the creation of a very simple table. I'm going to create a table called books. This books table is going to have a number of columns in it. I'm going to create one with a primary key column. I'm going to call it my primary key ID to start with.
Okay, so with it done, now we can look at some of these data types. Now, these are the standard data types. Always refer to your database management systems documentation, because the data types can be different per platform. Things like floating-points, numeric and decimal are synonymous. You can also explicitly specify that a column may contain NUlls. If you don't specify that it's not NULL, it implicitly can be set to NULL.
Okay, so let's get back to creating our table. So we've got an ID column, an ID column for the primary key is going to have a title, because books have a title. It's going to have a price. I would really have more, you'd have loads of information for books, but I'm going to stick to ID, title, a price and also for audit purposes, I'm going to put a column in there called dated added. And I'm going to get that to be populated automatically with the current system datetime.
So, pretty easy to make a table. The CREATE TABLE statement requires a list of column definitions. And at a minimum for each column, I need to specify the name of the column and the data type. I need the column name, which for my first column is ID. I'm gonna say the data type is as an int, what we call I-N-T, which stands for integer. Now, according to the SQL standard an int column should be a 32-bit binary-signed integer. So values plus or minus 2 billion odd basically, that should be enough for my books. Now in this particular case, being as I'm also going to be making this the primary key, it's going to be implicitly and not NULL column. Just to be sure, I'll actually set this, because after the name of the column and the datatype, you can optionally provide attributes such as whether the number is required or not. So I'm saying that this is a mandatory column. It is also possible to get the database management system to generate numbers for that column, but the mechanism is different for each of the various database management services. So we'll just leave that for now.
If you want to use an auto number, check your database management server documentation. At the moment, all I want to say about that column is that it's mandatory that it's an integer, and it's gonna be our primary key. The title is not going to be a number, it is going to be a string of characters. Now, I could use a database character called CHAR or character to give you its full name. Then I would say it was a character column. And we could say no more, it's a one character column, which wouldn't be good for many books. So I probably want something like 100 characters, so that I've got enough room for any title. Now, having said that, there's always a large variation in the length of book titles. So I'm gonna store it as a variable length character column, of VARCHAR as it's called, it takes up more room than a specified character column. So it's only really a concern when you're having a large database and you want to try and limit the number of rows and speed up your queries to make them as efficient as possible.
So, meanwhile, a VARCHAR character is going to be a little bit more flexible. And now with this, I can specify the maximum number of characters in parenthesis, and I'm going to go up to 100. So the actual maximum you can put in there will vary from each of those. 255 characters is generally a standard. Now I'm also going to specify a price. Now what I could do, I could define prices being a floating-point column. However, floating-point columns are not great for financial information such as this. If you have a floating-point column, floating-point data type, then you can store really large values like numbers of grains of sand on planet Earth, for example. You might need to do that for some type of columns. But really big numbers or really small numbers, any number if you're not that bothered about the exact position. The thing with floating-point numbers is that they're stored in binary. As all numbers are, but they're stored in two parts, mantissa and exponent using some complex mechanism, which is a little beyond the scope of our course today. But what it means is that numbers in decimal, the correct term is rational, i.e 0.99 is the exact number, that can be stored in a binary floating-point column. It is, so to speak, an approximation. And we're not going to use a floating point, but I'm going to use a fixed point number. And with this, I have to specify how many digits that can hold. And how many of those digits are after the decimal place. I'm going to say that can be NULL, because we don't have to have a price. So it's okay to have a NULL value in that column.
Okay, so that's our price column, it's a numeric with a five, two ratio. So we're going to allow five numbers and two pass the decimal point. So two points will be decimals. So we can go 1999 or 799, etc. The next column we need to add is a date column. And we're gonna call this date_added. It's gonna use a DATETIME function.So, let's talk about that.
So, although it says date added, it's actually the date and time that's added. Well, now you've specified that the column is date, it will store the time as well, it's a DATETIME function. I'm going to say that the default value is GETDATE. GETDATE is a function. It determines the current system datetime. Every time I save, it gives me a different time. And I'm using that here, so that it will keep the exact time that the record is updated. We can add this here as a default value. So when I insert a row, I won't be providing a value for data added. I will get that from the GETDATE function in the database management system. So we can test this. We can run INSERT, let's provide values for the ID, title and price.
Okay, so we have a unique value for columns other than primary keys, and that specifies that every value must be distinct. We also have a check function, which allows us to specify a range for the column. So it's similar to a WHERE clause. So you can say I want the number to be between one and 99, and must be resolvable by examining just the row that's been modified. So that check will throw an error if for some reason it can't calculate whether it's inside that range that you've set. So you can also specify a check constraint. The role of that primary key is that each row in the table must have a unique value for that primary key. So that becomes a constraint. If you have a relationship between two tables, you have a foreign key. On the table at the mini ends of the relationship, you specify the foreign key, and in the child table, that then becomes the foreign key constraint. So it's a generic term for a rule. You can specify check constraints, unique constraints.
Here we have a table without any constraints. This is a version of the contact table. Notice, we have no primary key. Now, that's perfectly okay, not regarded a good practice, but it's perfectly acceptable syntactically. And then we have the DROP TABLE statement. Now, this is with the attributes, so we're setting this to a default of 50,000. And we're checking that it's between 20 and 99,000.
This means we are also making sure that the department name is unique. That seems like a reasonable thing to do. We don't want two HR departments or whatever. The ALTER TABLE statement, and this is the third of the DDL statements, So create a table, make them drop, gets rid of them. So ALTER changes the definition. With the ALTER TABLE statement you can do things like add extra columns, change the length of a VARCHAR column. If you create a table with a VARCHAR column and find later on that it's not big enough, you won't be the first person in history ever to have done that. It's very important to understand the documentation around the ALTER TABLE statement. There are statements that will run in Microsoft SQL Server, but Oracle and MySQL implementations of these are different. So ALTER TABLE, the name of the table, and then add or change the definition of a column. I would say that any of these in an operation that needs to be done with some forethought and care, it's not something that you want to do all the time. In MySQL, you can quite easily drop columns that are in fact involved in keys like a primary key. So again, it's something that you need to do very carefully. The ALTRE TABLE phrase is something that needs to be taken with lots of care.
Okay, so now let's have a talk about referential integrity. Now, here we have a very simple model. If we wish to get rid of a row, say number two, the company number two row, Corporate Trading Limited, what should happen, given that we have made sales to that company, so has sales records? Should the DELETE statement be disallowed? We would want to keep sales records of the same records, and so we have a number of options here. One is that you could just delete the company data, but that will impact other records. Another option is you can set up a cascade. If you delete a row from the company table, a cascade also wipes out all of the sales. Now, that wouldn't be so viable because we're altering record history. And if we do that it makes reporting hard for someone like the Finance team. As part of the GDPR compliance requirements, any entity or individual can ask and demand that their details be removed from any system. Therefore, we wouldn't possibly need to be able to remove data, but we need to be able to keep the sales records. So they ideally would need to be anonymised.
All right another way we could deal with this, is we could allow deletion of the row, and then get the database engine to set these values to either NULL or their default value if we set one. So we might use zero to indicate all of those. You might use to set NULL. You can optionally provide a constraint name, with the key word CONSTRAINT square bracket there to indicate that this is in fact optional. So here we're saying that this is the contact table with a primary key. Now remember, it's a composite. And then we have a foreign key. We're giving it a name, foreign key company link, the company number column, here is a foreign key, referencing the company table. Now it's not specifying which column in the company table, it's going to assume you mean the primary key. If we don't specify which table column it's going to be bound to, it will assume it's the primary key, for Microsoft SQL Server and for Oracle. In MySQL, you'd have to put company number in parenthesis as well. So this is how you formed a link between the contact table. What it's saying here is that if a company is deleted, then all the context will also be deleted. Now this is using DRI, which stands for Declarative Referential Integrity. Most modern databases support Declarative Referential Integrity.
Now, in this CREATE TABLE statement, this statement creates a table so that we can store information about performance reviews for our sales people. As we are right now, we do not have a link between that employee_number column, and the salesperson employee_number column. So, essentially, it's not a connected table. If we try and run this, can we create two tables in the same database with the same name? No, that will not be allowed. So using the ALTER TABLE statement, we're adding a fourth column. Here's an INSERT statement. Now this statement is inserting a row for a fictitious salesperson with an ID of 23. Note that we haven't specified a primary key either. So if you look here, we can run the statement as many times as we like, it doesn't have a unique identifier. We end up with three rows where two of them have the same values. That's the duplicate of that one. So as a general rule, having duplicate rows in a table is not a good thing. The problem is, it's not easy to get rid of just one. How do you now tell them apart. So let's get rid of it. This is the DROP TABLE statement again. There's no our usual prompt here, it just does it. Which is why a DROP TABLE is very dangerous command. Okay, so now let's rewrite our CREATE TABLE statement. So we are specifying that the primary key is review_date. So this is a composite key, and this is the foreign key. We're saying that the employee_number column here refers to the salesperson reference. Now if I run that I've created a much better table. I can run that no problem. We have a salesperson with an employee number of 40. We can put that in but what happens if I try and run that again?
Now, I have the primary key, I can't insert a row for a non-existent employee. So that's a good constraint. All of these will also fail because there is a relationship in place now, between the salesperson and review tables. So we can't change the review of those that I have now created. So, if we go to a non-existent employee it will fail. We can't delete that salesperson because of that foreign key constraint. We can't change the value of the primary key for that salesperson. The dept_no column is the salesperson table reference to the department table. So when you have a salesperson, you have to put them in a department. So when will our salesperson be put into a department? Departments one, two, and three have sales people. So I could quite happily delete department four, department five and of these can be deleted, but this one can't. One, two, or three can't be deleted. The point is, all I'm asking to do here is to delete all departments. Quite simply, I cannot do that. So this request will fail. That incidentally, is a system generated constraint name, which is why you might like to give them your own names.
Okay, so that's where referential integrity can help us protect data integrity using DRI in our database. And it's important to work and design your tables to best use those constraints to best effect.
All right, so that's the end of this lecture close this one I'll see you the next one.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.