The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Fixed db schema versus flexability

Market forces have led us to upgrade from one DBMS to a more modern DBMS.  There are the makings of a religious debate brewing at our shop.

1) Our DBA is stressing the importance of having a fixed schema (not that it would *never* change, but that schema changes would go hand in hand with new releases of the client app). 

2) Legacy features, though, suggest the need to have users create an occasional new column on existing tables.  In our particular industry, clients are basically tracking the same sort of information, but every single client needs a small number of fields that no other client needs.  What our clients are doing is just not standardized enough  for us to come up with an exhaustive set of fields that is also not intimidating.  Also, some wishful-thinking features suggest allowing users to create whole new tables in the future (albeit with specific relationships to existing tables).

So on the one hand, there are valid DBA fears regarding security and the ability to tune queries.  On the other hand, we either need to allow schema changes OR simulate them in some way.

Has anyone else out there had to implement user-defined columns for a mostly-fixed schema, in the presence of a DBA with valid security/tuning concerns?  What are some of the alternatives?
Okey Dokey
Monday, October 03, 2005
We have a big system that keeps track of customer data. It started off fixed (fields for SSN, birth date, etc.) but a couple years ago we added the concept of user-defined fields where you could add things like Name of Pet, Favorite Ice Cream Flavour, etc. Basically we have a table that has three columns in it, customer#, name of field, and field value.

This sounds nifty and everything but the bottom line is I am really not at all sure it was a good idea. The biggest problem is that every developer who comes across it gets confused and stressed out over it because it's so different from everything else. This is in part because MOST of the data remains in the 'regular' fixed schema database, but the odd thing you have to go hunting around in the 'new' table for.

If I were doing it all over from scratch, I'd probably go one way or the other, not part fixed and part flexible.

Some other issues we had with this:

1. Depending how you do it, you may have type-safety issues, eg. in the simple schema I described above the 'data value' is always character, so dates and so on have to be cast back and forth all over the place.

2. How user-defined do you want to make it: eg. do you want the flexibility at the level of the individual user, the customer, or the application? Of course users want to be able to do their own thing, but this might cause problems if, for example, you sell a tax reporting system and someone decides to omit capturing birth dates.

This is an interesting subject, that's all I can think of for now, HTH.
NetFreak Send private email
Monday, October 03, 2005
Our systemn has two ways of handling things like this in it. 

One - we allow the users up to 20 user data 'keys' in the app.  Each of these 'keys' has dedicated tables, metadata etc.  Our predefined SQL & SQL generators are tuned for these 'keys'.  Processing is allowed to use these keys as transaction identifiers.  These 'keys' can be active or not.

Two - the users can define any number additional data elements.  We store them all as strings with some metadata for conversion instructions.  These elements are *not* allowed as identifiers in transactions.  There are two tables involved-one defines the name, datatype etc, the other contains the values.

Either One or Two may be lists or groups and may be used in selection listboxes.  Each has dedicated app functions to allow the users to maintain the data values. 

Since One are 'keys' and we control the heavy-lifting SQL, optimization & performance are in our control. 

Two are 'attributes' and can be reported on, but are pretty much immaterial for performance.

As NetFreak says, new developers coming across this for the first time can get confused/nervous/excitable. (I sure did!)  Once you get used to it it's really no big deal.

There are the type-safety issues to deal with as NetFreak also said.  We have a generic set of validation routines for the data types we allow; if the data doesn't conform, meaning the runtime won't have rules to handle it, it doesn't get saved.  The users have limited abilities to define their own type, for example a 'ProdID' as varchar(30).  They can then identify an elemnt type as a 'ProdID'.

Debugging is a bit more involved b/c not only do you need to see what element is involved, you need to know what the runtime type/value *is* and *is allowed to be*.  Finding 'igneous' in a key element (One, from above) that should contain 'Plant' values is a different problem from not recognizing a 'pinus jeffreyi' 'Plant' as a 'Tree'.
a former big-fiver Send private email
Monday, October 03, 2005
I like to just build new tables that link back to the original. Makes like easier for updates.. version 2 gets the following tables, version 3 these tables, etc.


not exactly pretty, but it allows your code to be somewhat backward compatible... makes updates MUCH easier to make.

Tuesday, October 04, 2005
>> I like to just build new tables that link back to the original. Makes like easier for updates.. version 2 gets the following tables, version 3 these tables, etc.

Yeah, small steps are easier in and of themselves.  The thing we've run into as we try to expand our sales into different markets is that the businesses all have their own terminology, and something like dead-tree publishing has different ancillary data than movie production or life sciences.  None of them want to cart around data fields that are irrelevant to their business or to look at labels that are meaningless in their jargon.

The reason we chose to eat the overhead of the dynamic approach is that we have one code base to support regardless of client industry.  No extra configuration files, no industry-specific resources etc.  The program is somewhat larger and harder to maintain (why we get the semi-big bucks, right?) but no one customer has special modifications.  We also don't have, as I've seen earlier in my career, the every-other-release mass refactoring to fit a once-specialized mod into the general application architecture.

If you're doing something that has limited reach or a short anticipated lifespan, the dynamic data approach may not be worth the price.
a former big-fiver Send private email
Tuesday, October 04, 2005
imho, the db should define the schema, it should not be defined by the application (if possible).

Changes in the db should not impact the application, particularly if you use a layered approach.
Arafangion Send private email
Friday, October 14, 2005

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz