The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

2nd normal vs 3rd normal form

In most database schemas I've seen in real life, the author/book relationship would be modelled with two tables:
* authors(author_id,name)
* books(book_id,title,author_id)

However, that's still 2nd normal form I think, because of the author_id field in the books table. So what you'd need for 3rd normal form is:
* authors(author_id,name)
* books(book_id,title)
* authors_books(author_id,book_id)

Have I got that right? 

How would everyone here model that relationship?  Is the second version necessarily better than the first?  Why?
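
For concreteness, the two designs might look something like this as SQL (a rough sketch - the column types are just my guesses, and the two designs are alternatives, not one schema):

    -- Design 1: one author per book
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES authors(author_id)
    );

    -- Design 2: author_id comes off books and moves into a link table
    -- (books would then be just book_id and title)
    CREATE TABLE authors_books (
        author_id INTEGER REFERENCES authors(author_id),
        book_id   INTEGER REFERENCES books(book_id),
        PRIMARY KEY (author_id, book_id)
    );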
forms
Sunday, June 04, 2006
The first example does not violate 3rd normal form. It would violate it if there were an author_name column (on the books table) that was dependent on a non-key (the author_id).

The design is fine, assuming a book has only one author.
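
To illustrate the violation described above, a books table that carries the author's name as well as the author's id would look something like this (a hypothetical sketch, not anything from the original schemas):

    -- This would violate 3NF: author_name depends on author_id,
    -- which is not part of the key of books.
    CREATE TABLE books (
        book_id     INTEGER PRIMARY KEY,
        title       TEXT,
        author_id   INTEGER,
        author_name TEXT  -- transitively dependent on author_id, not on book_id
    );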
Mike
Sunday, June 04, 2006
Strictly speaking, you can say that the 3rd normal form is better, but there is a trade-off between the design and its implementation. Sometimes going to the third normal form means more code is necessary to accomplish the same task, or more database access is required. If you are talking about a few thousand records it may not matter, but if you are looking at millions of records, the delay in accessing the appropriate record is more noticeable.
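
As a rough illustration of the extra access (using the table and column names from the earlier posts, purely as an example): fetching a book's title together with its author's name takes one join in the first design and two in the second.

    -- First design: one join
    SELECT b.title, a.name
    FROM books b
    JOIN authors a ON a.author_id = b.author_id;

    -- Second design: two joins, through the link table
    SELECT b.title, a.name
    FROM books b
    JOIN authors_books ab ON ab.book_id = b.book_id
    JOIN authors a ON a.author_id = ab.author_id;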

I was once involved in a project that insisted on having tables designed up to the nth normal form, and while everything looked good on paper, it was a nightmare to implement, especially from the point of view of maintenance and support of the system.
Phillip Flores
Sunday, June 04, 2006
>> Have I got that right?

Not exactly. You seem to be confusing relationship types (one-to-one, one-to-many, many-to-many) with normal forms.

As a previous poster noted, both your schemas are 3NF. They differ in whether the author-to-book relationship is one-to-many or many-to-many. As for which of those is "better", it depends on the real world.

If the books you are dealing with only have one author, then the first is better. If some books have multiple authors then the second is better.

If you want to model all books published in America, then use #2. If you want to model the 'books' written by students in a third-grade class, then use #1. It all depends on what you are modeling.
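
To see the difference in practice, a book with two authors fits naturally in the second design, while in the first design books.author_id can only hold one of them (the ids and names here are just an illustration):

    -- Second design: one link row per author of the book
    INSERT INTO authors (author_id, name) VALUES (1, 'Kernighan'), (2, 'Ritchie');
    INSERT INTO books (book_id, title) VALUES (10, 'The C Programming Language');
    INSERT INTO authors_books (author_id, book_id) VALUES (1, 10), (2, 10);

    -- First design: you would have to duplicate the book row
    -- or drop one of the authors.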
PWills
Sunday, June 04, 2006
PWills has it right.
Monday, June 05, 2006
Since someone else mentioned that a book can have more than one author, I'd go with the following. It is the same as what you have, except that I have reversed the definition of the last table so the intent is clearer: this design implies that a book can have more than one author, and finding all of the books that a specific author has been involved with is easy as well.

* authors(author_id,name)
* books(book_id,title)
* books_authors(book_id, author_id)
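
For example, listing every book a given author was involved with is a straightforward join through the link table (author_id 1 is just a placeholder):

    SELECT b.title
    FROM books b
    JOIN books_authors ba ON ba.book_id = b.book_id
    WHERE ba.author_id = 1;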
Monday, June 05, 2006
"I was involved once in a project that insisted on having tables designed up to the nth normal form and while everything looked good in paper, it was a nightmare to implement especially from the point of view of maintenance and support of the system."

Database normalisation is like most (if not all) rules of development: you should always, always use it until it doesn't work, then drop it like it was on fire.

I always start from a normalised model and then adjust as required once it becomes clear where the performance issues are. This is absolutely the best method and I recommend that everyone disregard it and do things their own way ;)
Paul Brown
Tuesday, June 06, 2006
Normalization is a concept strictly related to databases for transactional systems.
If you focus on data warehouses, you'll see a very, very low level of normalization.
So normalization is not a mantra at all.
Sevenoaks
Wednesday, June 07, 2006
If you focus on data warehouses you will tend to see a high level of denormalisation, that is true, but I would dispute that normalisation only applies to transactional systems; the forces at work in both cases are the same - the trade-off between referential checking for multiple (i.e. denormalised) instances vs table joining.

In transactional systems the overhead of checking the integrity of the data, along with the extra locking required, usually outweighs the advantage of fewer joins - although not always, and large, fully normalised databases are fairly rare beasts.

In a data warehouse situation there is less checking, partly because the data has already been validated in the transactional system, and partly because write operations to data warehouses are (or should be) far fewer than read operations. Hence fully normalised data warehouses almost never happen (in twenty years I've never seen even one).
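
As a rough sketch of what that denormalisation looks like in practice, a reporting table will often repeat descriptive columns on every row so that queries avoid joins altogether (the table and column names here are purely illustrative):

    -- Denormalised reporting table: title and author name repeated per row,
    -- traded against the joins a fully normalised model would need.
    CREATE TABLE book_sales_report (
        sale_date   DATE,
        book_title  TEXT,
        author_name TEXT,
        copies_sold INTEGER
    );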

This doesn't invalidate the principle of normalisation and, believe me, starting from fully normalised and denormalising a data structure as required is much easier and gives better results than trying to do it the other way around.

I am working on a transactional system at the moment where I suspect the original designer had never even heard of normalisation, and the performance is appalling. I am now going to have to fully normalise it and then denormalise once I find the bottlenecks, because at the moment the bottleneck is _every_ damn transaction.
Paul Brown
Monday, June 12, 2006

This topic is archived. No further replies will be accepted.
