Working with Nulls
Start course

In this Course, we first learn how to use SQL to alter and delete tables and how to use primary and foreign keys. We examine common joins between tables in a relational database. We then delve into inner, outer, left and right joins before learning to manipulate data using the INSERT, UPDATE and DELETE functions.

This Course is built from content created and delivered by Ian Wallington of QA.


The next part we'll do is the subject of the NULL value or non-value. That is by the way a keyword and it represents a case where a particular row, or a particular column or even a particular field, has a value that is either completely inappropriate, not applicable, or simply isn't yet known. Because every row in a table has to have something for every column, that's how a table is defined. 

You may have a case where a particular row has a value that you don't know yet, or it isn't appropriate, or there is just simply no value for that row. A NULL is not meant to be representative of a zero or an empty string. It is sometimes or often called a string with no characters in it, or a blank, or can be called anything like that. It's a special keyword. 

How you get a NULL into a table in the first place is the first question we need to ask. Let's try inserting a NULL into our department or dept table. Well, dept is the name of the table, and here I have a list of column names. Those are columns that exist within the dept table and here are the values that I wish to insert these are the values clause. Which is part of the INSERT statement. And there we have a comma-separated list of values. Now, department number is going to be set to 11. The department name is set to HR, etc. 

Now, if I were to go back to the diagram, our data diagram, the department table does have these three columns. But, it also have a target sales column. Now, here I'm not specifying a sales target in my INSERT statement. 

It is possible to set a default value for a column or field in a table, but in our data diagram or data design we haven't set any defaults. Therefore, when I run this INSERT statement, and a row is added, one row is affected, and the value chosen for sales target for the HR department is indeed NULL. Now, that's one way of ending up with a NULL value in a column. By not providing a value for a column that doesn't have a default when inserting a row, another one here like their postcode column. Some records do not have a postcode. And, I would suggest in this case, we simply don't know what their postcode is. And, that's a not known NULL. It's the same as saying NULL. So, that's how we can insert NULLs. 

Next thing is there's interesting behavior that happens when we include them in expressions. The technical term for what I'm about to demonstrate, is the propagation of NULLs. Now, NULLs propagate through expressions. However, on a more basic level if you have an expression that includes a term that isn't known, some unknown value multiplied by 1.2 for example, it's going to come up with a value, it's going to be six or it's going to be 18, and then it's gonna become 30. Now, this is going to end up still NULL. I can multiply by 1.2 and I can add 10 to it. I can keep carrying out operations, and it will always come back as NULL. So, that's NULL propagating. 

The next thing NULLs behave somewhat weirdly is when they come up in WHERE clauses. If I just highlight these first two clauses here, then that will bring back all the rows from the salesperson table. If I run this with the WHERE clause, it's going to bring back those with a postcode starting with the letter R. I have six rows in total. I have two where the postcode starts with an R. Now, if I change this to a NOT LIKE just to test it basically. If there are six in total and two of them are brought back when I look for those particular postcodes, those starting with R, then surely it is going to return the other four, is it not? Well, what happens is, it's going to look at that and say, "This isn't R, this isn't R, oh, that's not an R." It gets to this one it goes, "Oh, I don't know." So, it pretends it doesn't exist. So, in the end, this only returns the two rows where the condition is known to be true. 

In SQL, it said that you have a three-way logic, which is not actually normal. In programming languages, things are either true or false. In SQL, there can be true, false or null. And it's only if the condition is true, that the row gets returned and the results. 

Now, there is a test for NULLs. For example, where postcode is equal to NULL, postcode is equal, you'll get nothing back from that. Where it's null, it will return those values that are NULL. There's also IS NOT NULL. In terms of my original condition, where I was looking for postcodes that didn't start with the letter R, I might choose to add this to the condition. I can choose to say, "Look, if it's NULL then treat it as one of those, "i.e. not beginning with R." 

So, we will have this NULL for not applicable or not knowing. Now let's put a SELECT statement. On INSERT, you must provide values for mandatory columns. You can say when you find a table that you must provide a value for a column, it's mandatory or not NULL. But anything that could have a result or doesn't have a default, as in the sales target, you'll end up with a NULL in the table. It propagates through these expressions, and the only one that returns an expression that you can't say what the value of expression is overall. With the clause you have is true, false or null as options.

When you get back those rows in those queries, there is still WHERE IS NULL operator at the bottom here. Okay, let's talk about COALESCE. The COALESCE function is one that often causes many people trouble. I think it starts with the fact that the name, the very word "coalesce" is not something we use every day. That word means to bring together as one. The other issue is that it's spelt C-O-A-L-E-S-C-E, and it is quite a complex function as well. 

So, its job is quite simple. It is to return a known value. You can give it an argument list, and here I'm going to give it two arguments. The first one I'm giving it postcodes as the first argument and for each row, where the postcode is known, COALESCE will return that, and it will therefore appear in the results list. However, for these fourth and sixth rows, the postcode is NULL. 

So, what I'll do is I'll give the second argument which is a string or character literal, and effectively what's going to happen here, is that for each NULL postcode, COALESCE is going to return that string of characters. So, in this case, it's being used as a simple EITHER OR. Now, this will be a calculation so the column name disappears. I think I'll use an alias as post_code. If this is known, return it, otherwise return that. Let's look at the HR table. I could also use it in here, where I have this calculation. I could say that the sales target, if NULL, should be replaced with a zero, or coalesced with zero is another term for it. The HR department has a NULL sales target. So, that was coming back as NULL. But, now I'm saying, if the sales target is known and then use it, that function multiplied by 1.2, and add 10 to it. But if that sales target is NULL, then this string here replaces the NULL with a zero. So, that becomes a zero, and that therefore becomes 10. So, you can use it to replace NULLs with a known value. 

Now, the other thing you can do with COALESCE is pass it more than two arguments. Now, here I am using a character string value. And that's effectively the default. Here, I'm using a numeric value because the sales target is a numeric value. The arguments that you pass to COALESCE must be all of the same basic data type. 

The other thing you can do with COALESCE, is pass it more than two arguments. Here, I have some people in the staff table and some phone numbers. What I want is a simple name, and the best number to call them as a type of query. So, during business hours, let's say, because Fred is office based, we probably want to try his office phone number. Whereas if somebody doesn't have an office number, then let's try their mobile number as a first priority. 

So, basically, this is the order of priority as it were, because COALESCE will go through the arguments, with which it has been provided and say, "Is that known," because for Fred, we do have a number so it will simply return that. But for George, the business number is NULL. So, therefore, check the mobile number. If that is known then return that. For Harry, the business number is NULL, the mobile number is NULL, so it checks th,e next. If that's known it returns that value. Eustace, this person here, will bounce through all three, they're all NULL. So, it returns this character string as a default value. The next shows you the data and the results. 

All right, so that's COALESCE. That will work with the majority of database management systems. 

Okay, that brings us to the end of our lecture on NULLS and COALESCE. Close this one, I'll see you in the next one.

About the Author
Learning Paths

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built  70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+  years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.

Covered Topics