SQL Master Class
The course is part of this learning path
In this course, we go beyond the basics to empower you with the skills, how-to knowledge, and hands-on expertise to work through and solve complex business issues with SQL. We go deep and hands-on with transactions, subqueries, inline views, summarized queries, the exists predicate and set operations.
This course is built from content created and delivered by Ian Wallington of QA.
- [Instructor] Hello and welcome back! Let's discover the wonderful world of summarized queries. What are they and how you might use them. So we are going to cover the sum, max, min, avg, or average, and count functions. So what we have here is a dataset. The sum and average functions will only work with numbers. Min and max work very well with dates. Min will give you the oldest date, max will give you the latest date. The min and max work with character data. So you might want to see which of your customers will come last in an alphabetical list, for example. So those are the aggregate functions. Now, this is the information that nulls will ignore. So every row or every instance of a column in a table has a null in it, or if it didn't have any real data, and you summed or averaged or counted or minimumed or maxed on those, you will get a null. But if any of those aggregate functions is given a piece of none, i.e. not null, which is known information, it will work with that. The next function we have is count. Now, count gives you four with this dataset and count star gives us five. This is the total value of all of our orders and the largest order placed from the sales table. Now, at the moment those figures are for the entire table. I could put a where clause in here to say, give me all those where the employee number is equal to this. At the moment that's all based on those being single figures for each of those rows. So imagine that actually what you want is a figure for each company. So what I want to do is I want to look for company number 1000 and say, okay, we have two rows here. The sum is therefore 12 and the maximum is seven. Then I want to look at something, say, company 2000, and here the sum is 15 and the maximum is 12. And then company 3000, six, 33, 36. It's the third group and 27 is the largest. Then our last one is company 4000, which is easy, because there is only one row. Therefore the sum and the maximum will both be two. So what if I want to do this group by on company number.
Group by company number and the numbers returned are 12, seven, 15, 12, 36, 27, two, and two. Now, each of these rows is for a specific company because I'm grouping by company number. There is therefore one single value for the company number that I can get, including results here. So what I'm going to do is once I'm grouping by this, I can include this in the select this statement. What you can't do with this sort of query is say, get me the largest order value and the name of the company that the order was bought from. What you get is, in this case, an error message. So in this we are grouping. You can't include ordinary column expressions in your select this if you have aggregate functions. What it's attempting to do here is aggregate all of the orders to find the largest. And then you're asking, could I have the company number as well please? Well, that can't be done. There isn't a company number necessarily. It could be many companies that place the 27 order total. You can find out however. You can find the name of the company or the numbers of the companies with the largest order and to do this, we'll need to use a sub-query. The error message is saying we have something invalid in the select list because it is not contained in either an aggregate function or in the group by clause. Now, you will also get that message if you try doing something like including the company name in that select. We still want the company name in here as well, however. So what I'm going to have to do is a join on company.
Even though there is a company number column on the sale table, I've gone for the company number column from the company table, which means that I specifically have to group by the same column. The tables then need to be appropriately aliased. And then I'm trying to focus on the aggregate function grouping. Then I can include company name. Or can I? Now it's saying contained in status because it's not contained in either an aggregate function or a group by clause. Now, if you want to get lateral about it you could actually pass it to a minimum-max function call because there's only one value per row. Therefore, the largest or smallest value will be right. The simplest thing is to put it in a group by clause. There's only one name for each company. So this might seem like a little unnecessary but it is because Microsoft SQL Server and Oracle are a lot more rigorous and therefore I would suggest a lot safer in how you do these collections. I could also group by another column. I'm now going to get it to group on combinations of company and salesperson. So I will get more rows. In the majority of cases a given salesperson has only made one sale to a given company. Our count here is eight. So that's aggregate functions and the group by calls. Here's a grouping query, grouping sales purely by salesperson. And again, here we have a set of data, and the figures it will result in. There's only one person who made more than one sale to a company. Ideally what I'd like to be able to do is to use that column there as a select criteria. I want to be able to use this column here, the number of sales as a selection criterion as a condition. I'm going to put it in a predicate. That's where you put conditions.
The issue with the where clause is that it gets used to determine what values get past two aggregate functions. So it's applied before the aggregate function result is known. You cannot therefore test an aggregate function result in a where clause. You need a second level of filtering. And the keyword that's been chosen for this is having. The where clause is having to work out what the result will be. So it uses the where clause to work out what dataset to pass to the aggregate function. The aggregate function then gives you a result, which you can then test in the having source. At some point I had select count from salesperson and that gave me seven. Then I put where county equals Surrey, and then that changed the value of the aggregate function as well. The other point about the having clause is it's probably best used with group by clauses because without a group by clause, a function with an aggregate function is only ever going to return one row anyway. Here I'm going back to simply grouping by company and company name, and then I have chosen to say, let's now find those where the total is 10 or more, or above nine. And I get the maximum under 20. It's the second level of filtering. I could do from sale join company where co.name like %i%. And therefore this filters on one of those. That is the having clause. If you want to do filtering on the results of aggregate functions, the having clause is a really good operator to use. Aliases might be very tempting but in Microsoft SQL Server and Oracle you have to use the aggregate functions yourself. If you are using MySQL, you can use those column aliases in the having clause. This is a summary of the clauses in the select statement. I would say that if the select clause isn't mandatory in the select statement, then what would be? The from clause. The SQL standard says that it's mandatory. But you can get away without it in Microsoft SQL Server or in MySQL, if you don't refer to the table, to columns in tables or views. But for any meaningful query against databases, yes, you have to have it. Join. Yes, if you are querying for more than one table, technically it isn't mandatory, but it's a good idea not to use a comma-separated list of tables and conditions in the where clause. The where clause is always an option and one that you will probably frequently use because generally you are asking questions about your data, which of these has these characteristics, attributes, et cetera. The order by clause is in Microsoft SQL Server and has been in MySQL for a long time. The order by clause at the bottom here. MySQL also has a limit clause, which goes after the order by clause, and you can tell it to return only 10 rows or whatever number you want. Okay, so that brings us to the end of our Summarized Queries section. Close this lecture and I'll see you in the next one.
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.