SELECT statements - Aliases, Sorting and Removing Duplicates
Introduction to SQL
The course is part of these learning paths
In this course, we will provide a brief overview and history of the SQL standard. We will then learn how to read and write data using structures created with the language SQL. By the end of this course, you should be able to recognize and explain the structured query language and be able to read and write SQL commands. This course is suited to anyone familiar with computing who is interested in learning how to read and manipulate data stored in a relational database.
It will help if you have a basic understanding of computing services to get the most from this practical course.
We welcome all comments, questions, and feedback, so if you need any help with this course please reach out to us at firstname.lastname@example.org
This course is built from content created and delivered by Ian Wallington of QA.
- [Instructor] Welcome to data manipulation. In this lecture we will cover select statements, how we limit what columns, rows, and values we select. We will then look at calculations using functions. And then we will cover aliases and sorting and removing duplicates.
So, let's start with the select statement. In our demo here, I have a basic select query selecting everyone from the salesperson table. The result's based on all rows in the table. There is a way of getting the select statement to bring back a particular row because that's one of the reasons why we store data in a database.
So, we could ask questions about the data. There's an additional clause where after the from clause you can place a where clause. In the where clause, you can write a condition, or a predicate is the term that's often used.
So, let's start with a simple one. Let's find salespeople with particular targets. So, let's pick when the target is equal to seven. In this particular case, each of the salespeople has a unique target. So, whichever target you pick you will get at most one row. So, those with sales targets equal to seven. For completeness, there are variances on this based on the platform that you're using. Microsoft SQL Server, Oracle and MySQL, you can use an exclamation mark as a form of not, that is not equal to. The SQL standard says using a less than and greater than symbol stuck together is not equal to. So, great than and less than. Greater than or equal to have to be that way around.
So, how do these two values relate? The seven here is literally a seven. I've used numbers, literal numbers. There's no need to quote these. You do need to quote character values. If I want to look for salespeople who are in Surrey, then you have to put the character literal in single quotes. Character literals you have to put in quotes. All the databases use single quotes to delaminate a character string.
So, what about capitalisation? Surrey starts with a capital S. If I look and state for Surrey with a lowercase S, and you see there are two possibilities here. Either it says it's a capital S, so it's a match, or it says it's a lowercase S. It's actually a different character from the uppercase S. The answer will depend basically. It depends on what database management system you are using, and potentially how it has been set up. Oracle is case sensitive when it comes to data. MySQL is case insensitive out of the box. It uses case insensitive collation but you can set the server or database or table or a particular column to use a case sensitive collation. With Oracle you can make that case insensitive or leave anything that is case sensitive to a default. So, you use lower function to get a lowercase string of characters, and then you just need to make sure that your value in lowercase isn't lowercase and that will work. So, those are the standard relational operators. The where clause gives you the ability to ask the database questions. Put conditions in there and what you get back are the rows for which the condition or conditions are true. The term predicate is often used to mean condition in SQL.
We have, in addition to the symbolic operators, we have in SQL a number of words that we can use. Here is the between operator. I think it's probably better to go back to my numbers, looking for sales targets between 11 and 13. And this is an inclusive range checking operator. What then is between? You could do range checks. You could do this on character data. You could use it to find all codes from A to E and F or whatever. It's more likely to be used with numeric or date values. You can use it with date ranges. You do have to be a little careful when you have a date time, because you have a time element. You end up looking for the end of a date time value. The N, for example, being 12th of September, 2019, at 23:59.999. Anyway, so that's the between operator. Here we have brackets in the operator, not between the operator. And this is one area I think people get confused.
SQL is all about set theory. In set theory one might talk about something being in a set. So, I suspect that's where some of the confusion lies. We don't say that's a set of crockery in English. What we have here in this list is Tom, Dick, Harry. This is a set of possible values. That's the in operator.
Now, in some systems it would be subject to case sensitivities. A database will either compare character values using a case sensitive or case insensitive capacity. We could choose to add uppercase T Tom, uppercase D Dick, and uppercase H Harry. Also all in uppercase or in lowercase, which would cover most of the variations that there might be. But ultimately, if you want to do a case insensitive comparison on Oracle, for example, then I would suggest that you use lowercase or certainly the upper function. If you compare, it's easier than coding a series of conditions to comply with all.
Okay, so we also have the 'not in' function. Use the word 'not in' between variables. So, you can use values that will apply outside of the range.
Now, unfortunately, given the nature of our data in this demo, if you are not in London or Surrey, clearly you have to be in Hampshire. The next operator we have is the like operator. To describe the like operator, I would be using the word pattern. I'd likely also be using the wildcard to describe the role of the percent symbol here. The percent symbol is a wildcard representing absolutely anything at all, including nothing. Now, in practical terms, what this means is it's going to look for a value where the first character is an E. And again, because this is Microsoft SQL Server, it can be an upper or lowercase E. If I switch this to an uppercase I, that's those who contain an uppercase I. There's also an underscore wildcard character and that matches exactly one character. It could be any character but it's exactly one. With people's names, looking for those people who have I as their third letter in their name. You can also start a name with literal characters. So, that's anything that begins with D and ends with an R. So, you need to use the like operator for that range.
You can't change it into an equality operator. What I'm now asking SQL to do is retrieve those people in the salesperson table whose last name is literally there. You can also do 'not like'. It's useful where you have some kind of description column and you're looking for keywords within that description for something like, we'd look for the word urgent in a table or something like that. Here you have a search that starts with an A, ends with an A, and contains an A. In this one, T is the second letter and after the T, that must be followed by an N, and that N must be followed by an R. And the R must be the penultimate character.
Okay, so that's the like operator. Let's now look at 'and' and 'or'. The idea here, I want one result set. I want anyone who is in department two. I also want anyone who is in Surrey. Now, my two conditions are the county, where it should be equal to Surrey, and department should be equal to two. However, I want to combine them in a specific way. Now, if I were to describe what I want in English I would say people in Surrey and people in department two.
Now, in terms of the SQL condition if I make the condition county equal Surrey and the department number equals two, all I'm saying is that both these conditions must be true. So, rather than getting both the people in Surrey and the people in department two, I'm asking SQL to find Surrey based salespeople in department two, of which there are none.
So, the rule of thumb is if you combine conditions, you will end up with a more specific result. You're narrowing your search criteria and you will likely get to your rows which actually in this case is by using the or. It's like the opposite as it were. Either condition can be true but they can't both be true. What 'or' does is allow us to be more broad in our criteria. And so I'm likely to get more results back. I want to add an additional little test with the sales target to be at least double digit as part of my search parameters. I'm expecting that we have county and sales target equals nine. I can see that the target is nine, although literally being nine. The issue here is that the row is in the target county Surrey and this has been interpreted as either that is true or that is true. The reason for that is that there is a precedence rule. 'And' has a higher precedence than 'or'. So it's as if I had parenthesis around it like this. So, either this is true or those two things are true. You put the parenthesis like 'and' and 'or'.
Here, we are looking for anyone in Surrey or whose name is Smith, and then this is talking about multiple conditions. It says which employees are in Surrey. Well, it's going to be any employee in Surrey because it's counting from Surrey, so which Smiths live in Harlow? You can use parenthesis. If you put parenthesis then that's going to bring back anyone called Smith who is either based in Harlow or lives somewhere in Surrey. The important point is 'and' has a higher precedence than 'or'. So, you may need to put parenthesis around conditions if they are combined with 'or'.
Okay, so that brings us to the end of basic data manipulation. Please close this lecture and I'll see you in the next one.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.