The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Null versus 'Empty' values...

So here's a new question.  In a varchar/text/etc. field, do you consider there to be any difference between a NULL and an empty string?  If so, what?

Kyralessa tried to append this to the NULL discussion.  I thought it deserved its own thread.
AllanL5
Monday, November 29, 2004
 
 
Personally, I find C's implementation of a string as a "char *" problematic. For example:

I 'declare' MyStr to be a char *.  Initially, it is a pointer to Null.  I have declared that it is *going to be* a pointer for a string, but MyStr doesn't really 'exist' yet.  If I try to strcpy into it, I'll overwrite code memory.

Okay, so I 'declare' MyStr to be char MyStr[80].  Now I've allocated 80 bytes of memory for it.  Do I need 80 bytes of memory?  More?  Less?  Where is that 80 recorded?  (My head alone, probably.  Maybe in a .H file somewhere, if I #define MyStrLen 80).

So, C has an 'undefined' string, as well as an 'empty' string (one with one location, holding a '\0' character).

So, when SQL returns a recordset, that has a 'text' field, that contains just '\0' -- what does that mean?  (By the way, I believe this is the behavior of VB recordsets, and Perl recordsets)

Typically, I take that to mean there is *no* value there yet.  And I tend to use that as a default string value, since that is the "safest" empty string.  I am aware I am probably confusing a "don't know yet" status with a "there is no answer" status when I do this, though.
AllanL5
Monday, November 29, 2004
 
 
In Oracle (correct me if I'm wrong), you can't store empty string text fields.  They must have a value (I've seen, scary enough, a single space used " "). 

I consider there to be a difference between a NULL and an empty string.  Most of my applications are filled with empty strings but very few NULLs.  Strings are also more commonly empty than other value types because they aren't often calculated on -- they are just entered and displayed.  Also storing an empty string for these cases makes perfect sense -- it's the same representation it gets in the program and in the UI.

I also use zero for empty (autonumber) foreign key columns rather than NULL.  Since autonumber key columns cannot be zero it makes the perfect magic number.
Almost Anonymous Send private email
Monday, November 29, 2004
 
 
"I also use zero for empty (autonumber) foreign key columns rather than NULL.  Since autonumber key columns cannot be zero it makes the perfect magic number."

That sounds like an Jet thing. How can an autonumber be a foreign key?
Database pedant
Monday, November 29, 2004
 
 
You're thinking too low level guys. How the null is implemented in terms of what is stored in a memory location is irrelevant. And personally I don't think a zero length string is much different.

Good, excellent, programmers miss the point. Which is that the current mainstream database technique, relational theory, is about semantics. The meaning of data. You can't normalise a set of relations unless you understand the meaning and function of the data. And this has NOTHING WHATSOEVER to do with how the database is implemented. The problem of null is that it doesn't have unambiguous meaning and so it's not an atomic value.
Database pedant
Monday, November 29, 2004
 
 
"That sounds like an Jet thing. How can an autonumber be a foreign key?"

I meant, for example:

User table
----------
UserID  (autonumber)
FirstName
LastName


Article table
-------------
Title
AuthorID    (UserID -- if no author, then zero)
Body


There is no way for a UserID to be zero because it's an autonumber field.  So AuthorID can safely use zero as a magic value.

Sorry if I didn't explain that well -- it's hard to find the terminology to describe this stuff.
Almost Anonymous Send private email
Monday, November 29, 2004
 
 
Er, what RDBMS is this? Has UserID been posted to the other table as AuthorID as a foreign key?

In which case you've lost your referential integrity, if you can get zeros in there. A better idea might be to post the DDL of those 2 tables with their PK and FK constraints.
Database pedant
Monday, November 29, 2004
 
 
Sorry I didn't mean to imply there were any Foreign key constraints in place here.  I meant foreign key to simply mean foreign key:

"A foreign key in a database table is a key from another table that refers to (or targets) a specific key, usually the  primary key, in the table being used. A primary key can be targeted by multiple foreign keys from other tables. But a primary key does not necessarily have to be the target of any foreign keys."
Almost Anonymous Send private email
Monday, November 29, 2004
 
 
"A foreign key in a database table is a key from another table that refers to (or targets) a specific key, usually the  primary key, in the table being used. A primary key can be targeted by multiple foreign keys from other tables. But a primary key does not necessarily have to be the target of any foreign keys."

That's a very nice quote.

1. No, the author is correct, a primary key doesn't have to be the 'target' of any foreign keys.

2. What about your foreign keys? The author of this doesn't say whether it HAS to refer to a specific key. But she doesn't say that it doesn't HAVE to either. What do you think?

Yet again, I say this has got bugger all to do with whether you 'allow' nulls in that foreign key field or use your own 'magic number' of 0. What is the meaning of 0 in authorID? That it hasn't got an author? Or you don't know who the author is? Which is it? You can't tell can you? Let's think about this direct example you've posited. No author? An article with no author? Just think about it? Could happen I suppose. Random text generated by a computer program might be described as not having an author. Or perhaps multiple authors, so not one single author so can't relate it to the other table? Ah. I see. It's a design problem.

(And by the way, which record in the 'main' table has a Primary Key of 0?)

Of course there is a beautiful way round this. It's called relation for relationship. You know how a many to many join (at the ER level) can't be implemented in the relational model, so you use an intersection entity? Right, do that for EVERY relationship. You NEVER post the primary key of one table into another table to act as a foreign key. You ALWAYS use a table between them, even for one to many. And if there isn't a record with the two primary keys of each table then there isn't a relationship between those two records.
Database pedant
Monday, November 29, 2004
 
 
"And if there isn't a record with the two primary keys of each table then there isn't a relationship between those two records."

Using my example again.  If there isn't a record in that relationship table then what does that mean?  That it hasn't got an author? Or you don't know who the author is?  You've solved nothing.

That table allows for that many-to-many relationship -- but I don't want that.  I want only a single author for the article or none at all.  You've added unnecessary complexity to the problem for no gain in clarity.
Almost Anonymous Send private email
Monday, November 29, 2004
 
 
""And if there isn't a record with the two primary keys of each table then there isn't a relationship between those two records."

Using my example again. If there isn't a record in that relationship table then what does that mean?  That it hasn't got an author? Or you don't know who the author is?  You've solved nothing."

Well, the topic was about Null versus Empty values. I've solved that. Relation for relationship gets rid of the 'problem' of nulls in foreign keys. I agree, the problem of absent versus unknown isn't solved yet. That lot over at the ThirdManifesto are having what looks like good go.

"That table allows for that many-to-many relationship -- but I don't want that."

No it doesn't. If you want a one to many then you put a unique index on the attribute from the 'one' side. You have a one to one join on that side. It's standard.

"I want only a single author for the article or none at all."

I'd be interested to know what it is you're modelling. Looks like a web based thing perhaps some sort of bloggy/newsy thing? Tell me, what is an article without an author? Where did it come from?

"You've added unnecessary complexity to the problem for no gain in clarity."

Clarity and complexity are in the eye of the beholder. 'Programmers' who are quite prepared to hack away at thousands of lines of code and will spend days shaving a few clock cycles off a loop or reducing memory requirements by 2 bytes often seem unwilling to put much effort into getting the data model right.
Database pedant
Monday, November 29, 2004
 
 
"If you want a one to many then you put a unique index on the attribute from the 'one' side. You have a one to one join on that side. It's standard."

Sorry.  That's a good point.  I do use this sort of relationship concept to avoid nulls and I do use a unique index to lock it down.  I would never use this for a single relationship (where just an AuthorID field would do) so I didn't think of that particular usage.

"Looks like a web based thing perhaps some sort of bloggy/newsy thing?"

www.coffeegeek.com

"Tell me, what is an article without an author? Where did it come from?"

An article has two fields, AuthorID and AuthorName.  The AuthorID can be any user in our system (either an author or a site member).  If that's set, the author is shown as a hyperlink to that user profile and the user can edit their own articles.  However, we also have guest articles (or reprints) -- these have authors but sometimes they are not users in the system.  AuthorName can be used to give an article an author in that case (which is not hyperlinked to anything, etc).  One last requirement is that you can specify both an AuthorID and an AuthorName -- in which case, AuthorName is an override of their name from their profile.  And, although I'm not sure it's ever come up, you can leave both fields "blank" and then the article has no author at all. 

"Clarity and complexity are in the eye of the beholder."

True.  But I consider that use of an extra table to be added complexity for no gain in clarity (in this case).  I model all zero-or-one relationships simple as a field.  What's the benefit of your way?

"requirements by 2 bytes often seem unwilling to put much effort into getting the data model right."

I actually think the best way to work is start from the data model.  Once you have a good data model, the code basically just flows from it.
Almost Anonymous Send private email
Monday, November 29, 2004
 
 
I still don't understand why empty string values exist alongside NULL.

I think the DBMS should enforce empty string values being set to NULL.

It sounds like Oracle may do that by not allowing empty strings.

Using both NULLS and empty strings in a varchar column just seems ambiguous and unduly confusing.
I am Jack's null neglect
Tuesday, November 30, 2004
 
 
Jack, I entiretly agree that you shouldn't have a varchar column that uses *both* an empty string *and* NULL to indicate 'no value'.

However, I don't think it is up to the *tool* (Oracle) to enforce a "one-right-way" of using NULL.  We've been discussing what are good and bad side-effects of using NULL versus other approaches.
AllanL5
Tuesday, November 30, 2004
 
 
"I still don't understand why empty string values exist alongside NULL."

An empty string is a valid value that you can operate on where as NULL is not. 

LENGTH(NULL) = NULL
LENGTH("") = 0

Empty strings are also a valid programming language construct.  It doesn't make sense that strlen("") = 0 in your programming language but LENGTH("") = NULL in your database.  This basically means you have some data (an empty string) that you can't store.

"Using both NULLS and empty strings in a varchar column just seems ambiguous and unduly confusing."

I've never actually come across this.  Every varchar column I've seen is defined "NOT NULL".  But in any case, we've proven that NULL is always ambiguous.  Anyway, an empty string means that no value was entered and NULL could mean any number of things (just as it would on a numeric column).

If you're going to throw consistency out the window than I submit it's a far better idea to convert NULLs to empty string than the other way around.  :)
Almost Anonymous Send private email
Tuesday, November 30, 2004
 
 
++LENGTH("") = NULL

This would = 0 in the databases I've dealt with. That is even more odd that any length function should return a null value.

I understand the difference between NULL and empty strings, I just think only NULL needs to exist.

There is no value provided by an empty string.

You cannot standardize on empty strings instead of NULLS for the reason of consistency that you say I am throwing out the window. (Which I really don't get, considering my point is the exact opposite.)

You can't put empty strings in a double, for instance. They should only be NULL. There should be no empty strings. Ever.

Of course, this is only my take on it. I just see no logic behind the coexistance of the two. This is why I think the DBMS should enforce it as a matter of integrity.

Can someone cite an example as to why you'd ever want to put an empty string rather than a NULL in a varchar column?
I am Jack's NULL naivety
Wednesday, December 01, 2004
 
 
"Can someone cite an example as to why you'd ever want to put an empty string rather than a NULL in a varchar column?"

So you have to convert all your empty strings to NULLs before storing them and then convert NULLs back to empty strings when you query.  This extra work gets you nothing.  And since operations on NULL always return NULL you can do do no meaningful work with them.  However, there are lots of meaningful work you might want to do with empty strings.

In a large number of programming languages, you can use both NULL values and empty strings (VisualBasic and PHP come to mind).  In C, you have strings as NULL pointers or as a pointer to string containing '\0' (an empty string).  For a database not to support empty strings makes it inconsistent with all programming languages.

So empty is a valid value for a string just as zero is a valid value for a number.  Would you also prefer that zero was represented as NULL?  After all it to means nothing.  ;)
Almost Anonymous Send private email
Wednesday, December 01, 2004
 
 
The empty string in SQL is a string, and while some applications might use this as an equivalent of a null (missing) value, other apps might use this as a non-missing valid value.

So, a null value in a database is a value that indicates missing data, i.e., data that is unknown, not applicable, or not representable. But, here is an example where an empty string is not a null value:

Many (snail) mailing list databases use a strucutre like:

address line 1
address line 2
city
state
country
zip

The address lines are spec'd to correspond with the lines printed on a mailing label (i.e., rather than having one longer address field, it's broken into two shorter lines to match address label formatting requirements).

So, a row with address line 2 empty (the empty string) is not a null value, it rather means: line 2 is blank.
Jay Fienberg Send private email
Wednesday, December 01, 2004
 
 
>And by the way, which record in the 'main' table has a Primary Key of 0?)

As the person pointed out,  this is a simple case of selecting a value for a look up. I use the word lookup, since really, often these types of values for a foreign key are not really enforced relation (no cascade deletes, or updates need occur).

So, we have a simple field, and want to record their favorite color.


tblCustomer

FirstName    Albert
FavorateColor:    0, or a value from the table below:


tblColors

identify      color
1    red
2    blue
3    green
4    Yellow

As a few have mentioned, the identify key field will never be zero, so they use a value of 0 in the Favorite Color field for tblCusomters.

There is two problems with this:

    First, MOST database systems will in fact index the value of 0. So, if you have 100,000 names…but only have a color entry for 25,000…or less names..the rest all have a value of zero..and are RALLY killing the indexing routines…as there is now 75,000 reocrds all indexed in the same key node with a value of zero. If you had used a null value here, then MOST database systems will NOT index the value of null Thus null is better choice from a performance point of view. In addition, one could make some lame argument about saving disk space also..but lets not even go there!


The second issue here concerns relational joins. Since the persons favorite color is optional, then we will need to use a left join in this case to display the persons favorite color (and, of course those that don’t have a favatore color also need to display in the report). If you have a value of 0, then a lookup during the relational join will be attempted will be made to the tblColors for the value of 0. If you have a null value, no such lookup attempt occurs….and you again get better performance…

Of couse..if you frequently need to search for those empty fields…hum..you might want in fact to use a zero value!!

While most discussion about a zero length string field vs a null field are quite lame (this is NOT a big deal at all). For zero length and null fields, you as a developer you just choose one standard and stick to it. This “choice” will often depend on the tools you have at hand. So, the only REAL rule here is to stick to some stranded, and :

a)    assume that all blank fields are null
b)    assume all blanks fields are zero length

I see no reason for allowing both ZLS and nulls in the same applicaiton.

However, from a indexing point of view, and a sql “join” point of view, using nulls does help. You save the database engine from indexing zero values, and you don’t make the database engine attempted to join to data in a related table for values that do not exist.


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn
Albert D. Kallal Send private email
Friday, December 03, 2004
 
 
While programming, I use the distinction between null and empty string all the time. (This is in .net)

If you were returning a string from a function, you want to be able to return both 'I have no answer for you (null)' and 'I have an answer and it is blank (empty string)'. The address example above illustrates this nicely.

Another example might be;

A user profile contains medical information. It's an optional field, and maybe the user has entered it and maybe they haven't. They're trying to book a outdoors course, which requires them to confirm that their medical condition won't be a problem.

So you can write code like this;

// medicalInformation can contain null, empty string, or any string.

if (null == medicalInformation)
{
    // user has not entered information
    PromptUserForMedicalInformation();
}
else if ('' != medicalInformation)
{
    // user has a medical condition
    PromptUserConfirm('This course involves strenuous activity. Is this OK for you?');
}

Being unable to differentiate would make this impossible.
Steve Cooper Send private email
Tuesday, December 07, 2004
 
 
This implies at the time the record was created the medical information wasn't required but at some later stage it was.

If the medical information is transactional, it was created because whoever entered some data wanted to enter medical information then the actual information would be mandatory and non-empty.

On the other hand, the assumption that empty medical information entered means no information is known might be a bad one.

(I grant you that you're creating a simple example to show a technique).
Simon Lucy Send private email
Tuesday, December 07, 2004
 
 
> On the other hand, the assumption that empty medical information entered means no information is known might be a bad one.

It'd be fine, I think, if that's what I'd _specified_ the data to mean.

But yeah, I'm trying to show how the difference between null and empty is important to programmers, and thus to database field design.

To me, null and empty string are as different as NaN and the number zero. Not in all cases, and in a lot of cases, I _do_ want an empty string rather than a null. But, hell, I can do that myself; give me a string which may be null and I can calculate

  string = string + ''

which converts nulls to empty strings and leaves everything else alone.
Steve Cooper Send private email
Wednesday, December 08, 2004
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz