The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database Table design dilemma

I need to create a  table (to store information about a product) that can hold a varying number of fields. The number of fields can change/shrink many times over a certain time period. The fields (in our case product features) are added or dropped through a web UI by the product designer.

There are 2 approaches I see. Please let me know if there is another way to look at it. And also which one is more efficient.

Approach 1:(Assuming that there are only 2 data types)
create a table with the following fields:

product_id,field_name,field_value,field_dataType1,field_dataType2

Approach 2:
create a table and then dynamically add columns as need be
product_id, field_name1,field_name2,...

------------------------------------------------
There are 6 basic features(fields) for each product and then additional one may be added/dropped on a later date.
Urumi Send private email
Tuesday, November 28, 2006
 
 
Ugh.

What information are you storing here?
David Aldridge Send private email
Tuesday, November 28, 2006
 
 
If I understand correctly, approach 3:

(productId, productName, otherProductStuff)

(featureId, featureName, otherFeatureStuff)

(productId, featureId, featureValue)
Mike S Send private email
Tuesday, November 28, 2006
 
 
Mike S++

Approaches 1 & 2 are silly, as your features are records, not fields.
Posted by me
Tuesday, November 28, 2006
 
 
Does featureName also need to be stored in the database, or just its value? If the web UI itself knows what all the possible features are, it will save having to lookup the featureName and otherFeatureStuff.
dev1
Tuesday, November 28, 2006
 
 
I never really liked the "allow the user to define the number of fields" feature because I've never seen it implemented well on the web page. Either you get a long list of empty cells that you leave blank when you don't want to use them, or you have to keep constantly hitting the add button.

Having said that, creating a new, empty table with a lot of columns is still a lot cheaper than dynamically adding columns as you go. However, I'd make a couple of changes. Make the first column a generic primary key. The second column is simply attribute1, the third becomes attribute2, the fourth becomes attribute3 and so on ad nausaeum. All attribute columns contain strings or varchar data types.

In the application itself, define a "lookup table" for each product that maps the web page form field to an attribute column, as well as some code to convert the data into strings and back into the desired format. (Don't create the lookup table within the database itself since you want to treat it as a cache.)

Example: product 1 has height (attribute 1), weight (attribute 2), and color (attribute 3).

The end result is you basically have a gigantic spreadsheet with no header row, and a program that figures out which columns it should look at based upon the actual product itself.

Now having said all that, the real answer is that this question (and students ask me this question a lot) demonstrates why you actually need to define the data instead of trying to build a generic hold-everything table. It can be done. It's just messy, slow, and defeats the purpose of a relational database. :)
TheDavid
Tuesday, November 28, 2006
 
 
Mike S,
Thanks. What you suggested is exactly what I meant by approach one but was inarticulate in explaining it. But why should I pick it over approach two?
Urumi Send private email
Tuesday, November 28, 2006
 
 
Split it in at least two tables. One would contain the constant parts of the record (record ID, name, whatever), the other(s) keyed to its primary would contain variable data. This way, even if fields change, you'll be able to use the old records.
ping?
Wednesday, November 29, 2006
 
 
Approach 2 requires changes to the database schema and changes to application code when a new feature is required.  Approach 3 can handle new features with no database or code changes and is therefore more flexible.
Mike S. Send private email
Wednesday, November 29, 2006
 
 
re: Mike S's 3rd choice... not to mention being normalized, this is the standard idiom for solving the problem.

-C
christopher Send private email
Thursday, November 30, 2006
 
 
What domain will featurevalue be based on?
Database purist
Saturday, December 02, 2006
 
 
OK, no answer.

The point is this. Mike S's solution seems OK. At first.

But I'm willing to bet a large amount of money that very soon you'll realise that featureValue needs to contain integers, fp numbers of some sort, datesa, text values and so on.

At that point you'll have a lot of type casting to do.

Soon after that you'll realise that some of the values (like the text values for colors) you will want to restrict to a set of values, defined by you. So now that field contains a FK to a look up table.

This is a version of the Entity-Attribute-Value idea. And as such is not good.

You need to look a lot harder at you actual data and needs.
Database purist
Monday, December 04, 2006
 
 
The design that Mike S recommends enables these other excellent practises:

Data in Tables -- Not Code
http://weblogs.sqlteam.com/jeffs/archive/2006/02/10/9002.aspx
Clayton Scott Send private email
Wednesday, December 06, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz