The course is part of these learning paths
In this course, we go beyond the basics to empower you with the skills, how-to knowledge, and hands-on expertise to work through and solve complex business issues with SQL. We go deep and hands-on with transactions, subqueries, inline views, summarized queries, the exists predicate and set operations.
This course is built from content created and delivered by Ian Wallington of QA.
- Hello and welcome back. In this lecture, we will learn how to apply and work with subqueries. Subqueries are a way of nesting queries within other queries. With a subquery of this type, it will happen in two steps. The query inside the parenthesis, executes first. The results will be cached and then used in the second stage. This is a feature of a very simple subquery, that could in fact be achieved with a join. Because this is a form of join, cause we're equating department numbers, the subquery mechanism predates joins in the history of the structured query language. You can't have an aggregate function in a WHERE clause, so the best way to achieve this requirement, is using a subquery. Let's envisage our objective as to find the department, or departments with the highest sales targets. Now, the sales target exists in the department table, but if I look for it using a MAX row function, it just returns us the value, when we actually want the department name as well. Now, if we try to add department name to that query and run it, it fails. So, we are not able to see the sales target and the department name without some kind of join or relationship between tables, and ideally we want to get everything from the department table with the sales target as equal to 30. So, every time we need to know that piece of information, we want to run that query and get the department or departments with the largest sales target. Departments, of course, will come and go and the department name may change or even be deleted.
Most relevant to our requirements is that the department sales targets are most likely to be changed or updated on a regular basis. What I need to be able to do is, rather than having a hard-coated value of 30, which could quickly become out of date, I need to be able to run a query built from the latest sales target data. I could do this by writing two select statements, one selecting the sales target, the other selecting departments from department that had that sales target. Now, this isn't good code as it's depending on a variable within a select query. One of the issues here is that you can't have two select clauses in one query. What we need to do is encapsulate that second select statement within the first select statement. We can achieve this by placing the first select statement inside parenthesis. Now that, makes it a subquery, a query within a query. The main select statement is the main query. What if I want to find the department or departments with the smallest sales target? On this occasion there are two departments with a sales target of 10. So, this is a typical use case for a subquery. We want to use an aggregate function in a WHERE clause. We have to have a separate query to bring back the value that we want and then we can use it in the WHERE clause, but let me show you another used case.
I may want to find departments that don't have salespeople. I could do this by using a join within is null statement, but let me show you how we can do this with a subquery. So, I want to get complete rows from the department table, but not all the departments. I want to find specific departments and I want to find those departments that do not have salespeople in them. We could link departments with salespeople with an inner-join and you can use the IN operator. The issue is that we would need to have a hard-coated list of some type. Again, the answer comes back to subquery. For those in departments with salespeople, using the IN operator, I actually want the opposite, yes, that is departments without salespeople, so those are two forms of subquery. This form of query only works if you have a single column primary key, so this will be fine for departments without salespeople, for salespeople who haven't sold or for companies without contacts, but if I want to find contacts we haven't sold too, well, we have an issue with that, in that it needs two columns to identify a contact. For this relatively straightforward subquery, there is a limit in it's ability to only deal with one column. Incidentally, in the other example I have here, not only is it returning one column, but it's also guaranteed to return one row, because it's one aggregate function and that's why I was able to do things like use the equals operator. The equals operator is going to expect precisely one value on the right hand side. The IN operator, on the other hand, is expecting a set of values. You might be concerned seeing department numbers two and three turn up more than once, one option could be to put a distinct keyword in here, you'll end up with a larger dataset, but may give you distinct rows, on the other hand, the server will have to carry out an extra operation. Now, this will all be done via indexes anyway, because there'll be an index on the foreign key, dept_no and the index on the primary key, dept_no on the department table. So, it should be relatively efficient as a query. Okay, that brings us to the end of our subqueries lecture.
Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.