The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SQL null vs Default values

Title says it all really, are there any good reasons not use default values and use null instead?
We can think of a few, but they're really special cases where null implies use the application default.
One we've thought of is a credit limit, where null means to use the system wide value, rather than having had a specific value set. Other than that we're having a hard time to think of a good case for them.

I'm sure I've seen a good article on this somewhere I just can't remember where.
Peter Ibbotson
Wednesday, November 24, 2004
 
 
Null should be used when "lack of a value" is a valid option.  In your example (use the system value) you should probably use a default value in the database.

Imagine the following two scenarios:
A customer fills out an info sheet: Address 1, Address 2, City, State, Zip.  They fail to fill out Address 1 or Address 2.  Both should be NULL.  In this case Address 2 is allowed to be NULL as there may not be an Address 2. Address 1 is NULL and that identifies a potential data problem.

Now you need to set up a credit card for this customer.  The paperwork is filled out but no Credit Limit is identified.  Company policy is to have a default value of 1,000 USD as the credit limit.  That's a perfect time to use a default.

My database is full of NULLs, as it is a perfectly acceptable value for many of my results and I am very judicious with my default values (they can cause trouble when used incorrectly or if usage changes over time).
Lou Send private email
Wednesday, November 24, 2004
 
 
I remember reading in a book that NULL values were best avoided.
Not because of inherent problems with them, but because few people really know the results of arithmetic and aggregation operations on NULL values.
I don't really follow this piece of advice though.
Of course, I don't remember the title of the book.
Pakter
Wednesday, November 24, 2004
 
 
I wouldn't call it a Default, to me a default is a sensible initial value (e.g. $0.00, or blank Middle Name). Replacing NULL with some kind of special dummy value means you need to pick a fake value that you know is a special case (like -$999999.99)

I've used both and really don't see much difference. You need to handle the special case of [NULL|Dummy Value] either way. Once you get used to NULL it's not a problem.
Tom H
Wednesday, November 24, 2004
 
 
Theoretically, you shouldn't have NULLs at all:  If a lack of a value is an option, then that's signified by a lack of a corresponding record in a related table, which is a true lack of a value.

In practice, the only thing you need to worry about is avoiding default values that aren't truly default.  In the credit example above, if the default credit limit is raised from $1,000 to $2,000, does that mean that all the existing limits in the table that are $1,000 should be raised?  You don't know, just by looking at the data, so using NULL to signify "system-wide default" is appropriate.
Justin Johnson Send private email
Wednesday, November 24, 2004
 
 
Yup, it can be more of a philosophical discussion.  A 'Hair_Color' field can be blond, brown, black -- but what if the person is bald?  What if you 'don't know yet'?

Using a 'default value' to indicate 'don't know yet' is risky -- because it can hide that state.  Also, lots of my DB administrator *LOVE* to put 'non-nullable' on *ALL* my fields.  This means I cannot 'insert' a record with only known values -- I must set *ALL* values before the insert will succeed.  I hate that.  It is really hard to maintain code that requires that.

So, if you can safely assign a value which means 'don't know yet', then you don't have to use NULL for that purpose.  Or, if you have the  situation where you ALWAYS know ALL the values, then you don't have to use NULL.

If you are database interfacing in VC++ or VB, I believe the NULL values are 'promoted' to the zero value or empty strings in a recordset -- but you can still use NULL in a query.
AllanL5
Wednesday, November 24, 2004
 
 
Interesting, so the answer is a genuine it depends :)
Null certainly has some nasty side effects if you're not careful and we were thinking of using it some cases, but for most of the stuff we do default values are probably better.
I just seem to have a memory of someone doing an article explaining that default values were a bad idea and null was better.
Peter Ibbotson Send private email
Wednesday, November 24, 2004
 
 
By the way, NULL definition is not very well, difference DBMS have difference interpretation and difference behaviour of query about that. Thus if you develop a software aim to support multiple DBMS you probably like to avoid NULL as much as possible
Carfield Yim Send private email
Wednesday, November 24, 2004
 
 
If you're working with these  values in Perl be careful.  The way you compare things equal to 0 or null or other things depends on what you're comparing and how you're comparing it (compared to C or C++ here it's more straightfoward).  Perl people know what I mean (hopefully).
Alternative Truth
Wednesday, November 24, 2004
 
 
I would say that nulls and defaults are orthogonal.

Both can cause problems when used inappropriately, although both are useful sometimes.

It is true that some examples allow you to select defaults rather than use nulls, but this is not the case always.

I despair for anyone working with DBAs who insist that the use of nulls is forbidden, setting all fields mandatory. It is equally valid to make all fields optional. That is, neither option is correct, in most cases.
Nemesis [µISV] Send private email
Thursday, November 25, 2004
 
 
The article that you might be thinking of is:
How to Handle Missing Information without Using Nulls (http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf)
Which part of the fine website http://www.thethirdmanifesto.com/
ITECS
Thursday, November 25, 2004
 
 
Those guys are like a walking billboard for 'worse is better', as an example of theoretical correctness gone overboard.  Almost everything they preach can be achieved in current RDBMS with careful design, and almost all of it is unnecessary and performance impairing.
Justin Johnson Send private email
Thursday, November 25, 2004
 
 
I think the point is that in a well-designed DBMS it wouldn't be performance-impairing, and would be supported by the language in a way that would make it less awkward than it is in SQL.
Matt Send private email
Friday, November 26, 2004
 
 
"Imagine the following two scenarios:
A customer fills out an info sheet: Address 1, Address 2, City, State, Zip.  They fail to fill out Address 1 or Address 2.  Both should be NULL.  In this case Address 2 is allowed to be NULL as there may not be an Address 2. Address 1 is NULL and that identifies a potential data problem."

I don't know why people keep using address relations as examples when discussing database problems. They are the worst example of a 'real world' information need that is nearly always denormalised (I do too).

"My database is full of NULLs, as it is a perfectly acceptable value"

Heeelllooo?
Database pedant
Saturday, November 27, 2004
 
 
"Those guys are like a walking billboard for 'worse is better', as an example of theoretical correctness gone overboard.  Almost everything they preach can be achieved in current RDBMS with careful design, and almost all of it is unnecessary and performance impairing."

Really? Whenever I hear somebody talking about "theoretical correctness going overboard" or a similar phrase I imagine a really sloppy developer who can't be bothered to apply intellectual rigour to their high level design.

Yes, good RDBMS can get rid of a lot of the problems of nulls. I'd be interested to know how you do that.
Database pedant
Saturday, November 27, 2004
 
 
SQL NULL is probably best understood as two different things combined: the null value, and the third case in three value logic (i.e., TRUE, FALSE, UNKNOWN).

The major problem with SQL NULL is in this second aspect as part of the three value logic. Most simply: three-value logic is weird, and SQL doesn't make it easy to figure out when you've made terribly wrong assumptions in your three-value logic statements.

The null value is a literal value that you interpret to mean "null" or "unknown" WITHIN two-value logic. If NULL weren't also the "unknown" of the three-value logic, you could safely use it as a null value in SQL. But, you can't.

So, as an example, imagine a table with a GENDER column, which can contain the values 'male', 'female', 'null' (null value, not NULL). These can be evaluated using statements like:

gender != 'male'
gender != 'female'
gender != 'null' --NOTE: null value, not NULL

all of these statements are evaluated under two-value logic (they are either true or false).

If the column instead could include the SQL NULL, you'd need to evaluate them using three-value logic:

gender != 'male'
gender != 'female'
gender is NOT NULL 

A question one might ask then: with either of the first two statements, are the rows set to NULL returned? In the null value example, you can clearly answer "yes", because:

'null' != 'male' and 'null' != 'female' are both TRUE.

But, in the SQL NULL example, it's less intuitive what will happen, as it's the three-value logical evaluation of:

'male' != NULL and 'female' != NULL, are both UNKNOWN.

This means that SQL databases follow some non-intuitive rules as to when they'll return rows that evaluate to UNKNOWN. There is a real logic there, and you can look it up and find out for sure--and most databases will implement the rules correctly most of the time.

But, in my experience, in the long run (e.g., as things get more complex), it's easier to get queries right if NULLs are considered harmful.
Jay Fienberg Send private email
Saturday, November 27, 2004
 
 
"Really? Whenever I hear somebody talking about 'theoretical correctness going overboard' or a similar phrase I imagine a really sloppy developer who can't be bothered to apply intellectual rigour to their high level design."

I'm not someone who scorns academic rigor or the input of theoreticians.  But in the paper linked above, a three field record was normalized into *five* tables; single fields were split into two tables just to handle different possible unknown values.  In any real-word environment, that's just pointless and confusing.

"Yes, good RDBMS can get rid of a lot of the problems of nulls. I'd be interested to know how you do that."

I use NULLs judiciously along with normalization to find the right balance of clarity and performance (both in the RDBMS and in coding).
Justin Johnson Send private email
Sunday, November 28, 2004
 
 
"I'm not someone who scorns academic rigor or the input of theoreticians.  But in the paper linked above, a three field record was normalized into *five* tables; single fields were split into two tables just to handle different possible unknown values.  In any real-word environment, that's just pointless and confusing."

That's just a question of perspective. I occasionally explain to clients the basics of ER modelling and Relational Design. If I think it helps the development process. Universally the response is something like, 'Oh boy, that's complicated, you must be really clever to understand all that'. Things which are actually very simple and obvious to us, like intersection relations for many to many joins between entities, seem complex to them. But we do it (almost!) without thinking, yes? So just because Darwen has come up with a structure that fits the relational model AND eliminates the problem of NULL meaning 2 things, but that structure appears complex, doesn't mean that that structure isn't perfectly useful. Just that we aren't used to it. Yet!

I have doubts, but I think it's a very good attempt. Mainly I have doubts because it involves mandatory participation at the 'base table' end. And that's difficult to enforce in most RDBMS's with declarative referential integrity.

"I use NULLs judiciously along with normalization to find the right balance of clarity and performance (both in the RDBMS and in coding)."

Good for you. You mean that you do it 'the way I think is best'. Which I suppose is all that we can do.

NULLs are a problem. The more I look at it the bigger the problem. They force you into all sorts of horrible outer joins. Worst of all they aren't scalar. It's a non value that actually has 2 values. I think any attempt to eliminate their use is to be supported and seriously considered.
Database Pendant
Sunday, November 28, 2004
 
 
I agree in principle that NULLs are generally a bad thing, but I think they're a justifiable reaction to real-world problems that theoreticians like Date just don't quite grasp.  The fundamental flaw in the linked example is the attempt to hold different meanings of the null value and articulate them--in other words, to force meaning into a lack of information by putting a different constraint on the field than simply marking it as required.

Theoretically, there's an infinite number of possible meanings for NULL in a salary column: the linked example showed unsalaried and unemployed as two, and created tables for each.  But what about someone who takes stock for salary?  Or goods?  Do we create a third table for non-monetary compensation?  All that's been done is to push the constraint on the field up a level, to the table design rather than the field itself.  And in practice, that much information won't be available anyway: in the linked example, what's put when the person entering the data simply doesn't know?  A fourth table?

One of the most obvious things that struck me when I was actually designing tables for the manufacturer where I worked was how little information I could actually depend on being there when it was supposed to be there.  That's a business thing, a human thing that can't be avoided with all the cleverness and theoretical correctness in the world.  How do you deal with that?  You allow NULLs where you have to, and you code in light of that fact.
Justin Johnson Send private email
Sunday, November 28, 2004
 
 
"Theoretically, there's an infinite number of possible meanings for NULL in a salary column: the linked example showed unsalaried and unemployed as two, and created tables for each.  But what about someone who takes stock for salary?  Or goods?  Do we create a third table for non-monetary compensation?"

That looks to me like stretching this example to prove your point. First of all a salary is just that. It isn't the same as payment. It's a data modelling problem, and not germaine to the NULL problem. Off top of my head there would be some other table like Renumeration Method, with values like 'Salary', 'Stock', 'Ice Cream out of the freezer' and such like.

I don't think NULL is capabable of 'having' infinite 'meanings'. The example you give says that the column is mis-named, and that the analysis is incorrect. If an attribute value is allowed to be null, seems to be there are 3 options:

1. value known and it's this (we know their salary and it's $30,000).

2. NULL in the column. Which means we don't know what their salary is.

3. NULL in the column. Which means they don't have a salary.

4. NULL in the colunn. Which means we don't know if they have a salary or not, or what it might be if they do have one. That's the point.

"One of the most obvious things that struck me when I was actually designing tables for the manufacturer where I worked was how little information I could actually depend on being there when it was supposed to be there.  That's a business thing, a human thing that can't be avoided with all the cleverness and theoretical correctness in the world."

Then the business changes it's method of working. Sorry. I have to say that all the time. To businesses. "We've been doing it this way for blah blah blah." Sure you have. Now there's this new technology called computer based information management and it's based on this stuff called relational database theory which might not LOOK like technology but it is and it's (relatively) new and if you want to take advantage of it then you'll have to change how you handle information in your business. And if you don't change your approach to information in your business then you can't use this shiny database stuff so stick with how you were doing it before.

I agree. For all that I've said I get NULL all over the place. And yes, you frequently don't have every single datum to enter into a record right at the begining. BUT very often it's nothing more than the business just can't be arsed to collect it, because they've got a sloppy attitude to information and that's why they're in a mess expecting you, Mr Database Developer, to pull them out of it.
Database pedant
Sunday, November 28, 2004
 
 
I know your pain intimately, because I fought the exact same battles:  "Look, if we don't make the salespersons enter the number when they submit the form, it's never going to get in there as a reliable figure for the projections."  But fundamentally, there's a limit to how much a business can/will change to take advantage of what IT can do for them.  The other side of the coin is that, to the extent that you force a business to use a system correctly, the more opportunities you create for people to game the system, another constant battle I fought.

It's perhaps for that reason that I've lost patience with the far end of theoretical correctness.  It doesn't matter how perfectly it's designed if it doesn't get used.  A canny IT professional can do a lot under the hood to make things both correct and workable, but perfection itself is just too expensive a target.
Justin Johnson Send private email
Sunday, November 28, 2004
 
 
Date of Birth.

What would your default be?

Philo
Philo Send private email
Sunday, November 28, 2004
 
 
Philo, depends if you were modelling stones, right?
i like i
Monday, November 29, 2004
 
 
?
I don't get it.

Philo
Philo Send private email
Monday, November 29, 2004
 
 
Date Of Birth versus 'Stones' --

If you had a database, where you were storing data about rocks (Stones) -- then DateOfBirth would not be something you know immediately.  It might be 'too large' or it might be a non-sensical question, or a 'don't-care' kind of question.
AllanL5
Monday, November 29, 2004
 
 
Even for a human's DOB, it might not be known or available at the time the record is initially created.  If you make it a required field, I guarantee that you'll have people entering nonsense DOBs to bypass the field.
Justin Johnson Send private email
Monday, November 29, 2004
 
 
So here's a new question.  In a varchar/text/etc. field, do you consider there to be any difference between a NULL and an empty string?  If so, what?
Kyralessa Send private email
Monday, November 29, 2004
 
 
Ideally, when using default null values for 'unknown', 'n/a', or 'not represented', each domain (i.e., datatype) would have built-in values that have no meaning within the domain.

So, if you had an integer column, you would have default values like 'A' or '*' that couldn't possibly be thought of or processed as being integers. (SQL NULL would work this way, except other weird stuff happens because of SQL's three-value logic--see my previous comment.)

With SQL databases, almost any value you use that is within the range of the datatype could get you in trouble (i.e., someone or some program could treat it like a "real" non-null value).

I used to use the Sybase ASA database that supported dates down to 0001-01-01, and used to use 0001-01-01 to mean 'unknown'. Although this could cause a problem because it can be processed as a valid date (and it shouldn't be), it at least isn't an actual date (there never was an actual day, January 1, 0001, because the Western calendar wasn't created that early).

Anyway, if your database DATE datatype supports 0001-01-01, it's a date that could be used as a default null value in the sense that no "real date" in your database could be that date (unless your using a make-believe calendar or something...).

P.S.: I've used as many as three different default null values for each type: one each for: unknown, n/a, and not represented. It's overkill usually, but it can be done and it can be useful in terms of being able to mark the difference between unknown and n/a (which Codd specified in the Relational Model, though he also specified a five-value logic which I wouldn't recommend even if it were easily do-able).
Jay Fienberg Send private email
Tuesday, November 30, 2004
 
 
I think the discussion on this thread really emphasises the truth of the academic arguments: NULLS are a problem, and we would be better of without them.

The problem with NULLS is that they get overloaded.  It is very rare to have just one 'NO DATA HERE' situation.  Usually there at least two, as Justin points out:

  1) The data is not entered yet
  2) The data is not available, and is not required.

Justins example of Address1 and Address2 sum this up nicely.

The problem is that the discussion around NULLs always seems to fall into a 'read world' vs 'academic' rather then consider real solutions to real problems.

I think one practical compromise is to have a ADDRESS_REQUIRED field.  When a record is created and it's address is set to NULL the address is linked by the ADDRESS_REQUIRED field.

Then whenever you encounter a NULL you check to see if the record appears in ADDRESS_REQUIRED.  If it is, you need to insist that it is entered.

In the inerface you can provide an option for selecting 'NO ADDRESS' or something like that.  Selecting this option removes the entry in the ADDRESS_REQUIRED table.

Talking about NULLs in general, one advantage of using a NULL value rather than a default value like 0 is that any operation on a null returns a null.

For example, if you are summing some values that should contain only proper values, a default value of 0 will just mean that that item is not summed.

For example:

  Parts:  50
  Labour:  0 (Meaning Not entered)
  ----------
  Total:  50

The customer quickly pays up and runs off before you notice.

If you use nulls you get this:

  Parts:    50
  Labour: NULL
  ------------
  Total:  NULL

If you are using a certain value, like -9999, as a default value then you can get some really strange results.

  Parts:    50
  Labour: -9999
  -------------
  Total:  -9949

The customer accepts his refund and scarpers quickly :)

When nulls are expected, then you have to filter them out in your sum operation otherwise you will get NULL results.  The benefit is that if you fail to take a NULL into account then it will alway obviously fail by default.  In contrast to this, imagine you're summing thousands of transactions and somebody fails to filter out the -9999s.  It could go unnoticed for some time.

So nulls do server a purpose, albeit imperfectly.  I look forward to the day our DMBSs offer a better solution.
Ged Byrne Send private email
Tuesday, November 30, 2004
 
 
Database Pedant,

I remember once I was waiting for the bus next to Radio Shack (we call it Tanys over here, or at least used to) and my Walkman needed batteries.

I had a few minutes so I popped in to buy a couple of double A's.  I saw the bus coming, so I thrust my money at the shop keeper and told him to keep the change.

He refused to take it.  Instead he asked for my address.  He told me that in order to complete the transaction, he needed to know my address.

So I left the batteries and caught my bus.

I see other examples of this all the time.  Like having to go through the same old questions every time I ring a call centere to get help or buy something.  It makes me avoid calling again.

Companies have something called customers who are far more important than even the shiniest of new technologies. 

Thankfully there are those who are pragmattic enough to hammer the technology into something that resembles what the customer needs.
Ged Byrne Send private email
Tuesday, November 30, 2004
 
 
In my previous post, I mis-typed:

. . . Codd specified in the Relational Model . . . a five-value logic . . .

This should have been: FOUR-value logic. Codd's four logical states / values are:

true
false
unknown
not applicable
Jay Fienberg Send private email
Tuesday, November 30, 2004
 
 
If its a payroll application and date of birth isn't available when you create their payroll record (you wouldn't not pay someone because you hadn't got their date of birth would you?), you can't use NULL if you're going to calculate their pay (not in the UK tax system anyhow).

So, you have a default date of birth if they don't enter anything, this is generally around 1955 unless you know they're under 18.  This means that in creating the record you don't put a default date in, but that you use a default date for processing when the date is empty.  Empty does not imply NULL. 

I reserve NULL for joins where the resultant data is unknown, ie a Salesman that has no sales of red cars in a join of all Salesman and all sales.
Simon Lucy Send private email
Tuesday, November 30, 2004
 
 
"Companies have something called customers who are far more important than even the shiniest of new technologies."

Companies have something called profits. Which are far more important than customers. If those profits depend upon effective data management then that takes precedence.

A bodge is a bodge. Plain and simple. Your charming story about Tandy's is all very well. Perhaps you could see it from the point of view of the company auditors, who wait to get prosecuted for false accounting if they can't defend their accounting practices, including not allowing sales assistants to pocket change. It also makes my general point very clearly. Tandys is still in business. (Very successful business, in my other life as a musician I've played at Prof Rolf's house, which is a stately pile in northern England).

The fact that you see this sort of thing frequently, and like to present it as some sort of ugly insistence on correctness, merely reinforces the simple truth that many businesses who care about accurate information survive. Perhaps the thousands of companies who didn't care didn't survive.

Yes, your story about having to give your details over again. Frustrating isn't it? My bank has superb telephone systems which don't do any of that. HSBC, maybe you should change.
Database pedant
Wednesday, December 01, 2004
 
 
"If its a payroll application and date of birth isn't available when you create their payroll record (you wouldn't not pay someone because you hadn't got their date of birth would you?), "

Yes. If employees can't be bothered to give me the personal details I require then sure as hell I'm not going to pay them.

I really don't see why so many of you have such a problem with all this. In many information systems, whether implemented as a relational database or not, the information is incomplete to the point that it doesn't make sense or is useless to the organisation, if certain key data are not present. If the organisation has decided that they _must_ have datum x then that's it.

The problem with pursuing the line so many here are is that eventually you end up with useless databases. Which have to be, guess what, re-written from scratch, properly. It confirms my opinion of programmers. That they are usually more interested in code that data.
Database pedant
Wednesday, December 01, 2004
 
 
The fact that companies can continue to prosper even though they sacrifice customer service for the convenience of their internal procedures shows that there is something very wrong, and not just software.

Thankfully Tandy's dropped the rediculous data capture scheme quite quickly.  I'll bet it took a dip in profits to make them see sense.

These same battle lines are always being drawn.  The sloppy bodger against the compentent professional.  The practical pragmatist against the impractical pedantist.  These are not our choices.

A decent database would be able to handle these real world conditions.  With some additional effort skilled professionals are able to overcome the failings in our current tools.
Ged Byrne Send private email
Wednesday, December 01, 2004
 
 
Ged,

The Tandy story really has nothing to do with a proper RDBMS or nulls in particular. Although it illustrates exactly the wrong type of attitude a retailer should have toward customers there’s no condemnation, or validation, of nulls there.
Dr. Mario Send private email
Wednesday, December 01, 2004
 
 
Dr Mario,

My first post was regarding NULLs.

My second post, containing the tandy story was a response to Database Pedants comment about businesses just having to adapt their methods to suit the database.

My point was that this attitude leads to dreadful customer service.
Ged Byrne Send private email
Wednesday, December 01, 2004
 
 
Pedant,

Your advise to switch to HSBC, and the success of First Direct, also illustrates that satisfying customer service needs is important.

Hopefully they've also improved their databases, I left First Direct several years ago becauase the balance and statements were only updated overnight.

Good customer service or technology alone may be enough to survive, but getting both right gives a competitive advantage.
Ged Byrne Send private email
Wednesday, December 01, 2004
 
 
It's not that uncommon to not have complete information on a starting employee, other than their name and when they're starting.  The chances are you'll have things like their National Insurance number, their P45 before the payroll gets run but there are rules there for when you don't and its part of the art of writing payroll software to implement them.
Simon Lucy Send private email
Wednesday, December 01, 2004
 
 
Ged,

My apologies, then.
Dr. Mario Send private email
Wednesday, December 01, 2004
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz