The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Schema design question

I have a schema where I will be pulling in data from a join of 2  tables. One of the tables in the join is always the same. I will be passed the primary key for that table. The table it is joining is determined by what "type" the record in the first table is.  Each type has it's own table, so I'll need to know the type before I can determine which table to add to the join. The problem is I will only be passed the ID and not the type.

What i wanted to do was to prepend "type" to the primary key.  Such as [typeID]_[id]. When I get the id I can determine right away which table i'll need to do the look up in.

What are the potential problems with utilizing a naming convention instead of adding a column?
Should I add type as a column to the first table..[select type_id from table1 where id = 1], and then do select * from table1 left outer  join table1 left outer join [typeid]_table.  The problem i was trying to avoid is to do another query before performing the join query.
chailatte Send private email
Friday, November 21, 2008
You only have two sensible choices.

1) Merge your "type" tables into one, so a normal join works well.

2) Do the query in two stages, probably in a stored procedure, so you only need one database call.

You could write some nasty SQL to merge all of the tables, but only select out the correct ones, based on the type. This would work fine for trivial number of rows, but may degrade quickly, depending on your data, indexing, RDBMS, etc, so not recommended.

The idea of prepending the type on the ID is not a good one, and is considered bad practice. Each column is meant to represent one thing and merging them is not a good idea in the long run, even if it helps today.

Without knowing more about your application, and the database you're using, etc, it is hard to give more specific advice. Just chose option 1 or 2 above and move on.
Scorpio Send private email
Sunday, November 23, 2008
Addendum to 1) - Create a view across all the Type tables, adding a column to give you the "Type".

SELECT 'CUSTYP' AS Type, Id Description
SELECT 'PRODTYP' AS Type, Id Description
SELECT 'ADDRESSTYP' AS Type, Id Description

Join from your other table on the "Type" column.
Justin Send private email
Monday, November 24, 2008
I like Justin's answer, from a logical point of view.  It will work across a wide variety of changes to the schema.

However, be prepared for the view to run slowly.  If there are only about a half a dozen tables in the union view, you may be able to live with the reduced speed.  If there are a hundred different table types, you need to use the two query method, and translate typeid into tablename yourself. 

Since your question was a "schema design" question, I have to ask the following question:  why was the schenma designed this way, instead of using the more usual "generalization specialiazation" desgin pattern described in some books on modeling and design?
Walter Mitty Send private email
Monday, November 24, 2008
we wanted to keep the core intact while allowing for extensions through types - with each type maintaining it's own set of data.  The data could grow to be very large, since it will be large blocks of user content, so to combine them all into one table seemed to pose an issue, especially since we foresee the addition of types as the application is used in the long run, and plugins for it being developed.
chailatte Send private email
Monday, November 24, 2008
It is a hideous "design", but if you must go for it, then you should go with the two-step query method.

The performance of the dynamic merge query will not be great when you start, and will probably degrade as you add more data and type tables.
Scorpio Send private email
Tuesday, November 25, 2008
Justin's solution won't help unfortunately because by assumption the 'Type' tables will have different columns.  You can't do a proper union then. If they do have the same set of columns, you should merge them into one big table.

This brings you to another problem - you should only select records with the same type on every call - else you'll have to join multiple tables and the resulting output will contain columns that have no proper meaning depending on the record type. (eg a record of type A will contain colums for both type A and type B - the presence of the B columns (even if they're NULL) is not very elegant).

So if in practice you are constrained to selecting only records from a certain type in one query. You might as well include the recordtype in the primary key, like you suggested. Any proper identification procedure will do. "type_number" will be fine since a number will not contain a "_". If type can contain a "_" then make sure you split on the last "_" character.

If you're going this route the next question is why you'd even bother with the main table. You might as well split the types out into completely seperate tables.

Splitting out the types into separate tables gives you another option to specify the type info. If you are able to limit the amount of possible Types, you can make the primary keys of the different tables non-overlapping by incrementing each primary key with MAX_TYPES and starting the identity increment of each table with a unique value for that table.

So if you have a maximum of 32 table types, then let type A start at record id 0 and increment the primary key for each new record with 32. the table for type B starts at 1 and increments with 32.

Now a module 32 of the primary key will give you the table type. Be aware that you'll run out of primary keys MAX_TIMES faster than in normal scenario's.

A variant of this approach would use Guids are primary keys and then use part of the guid to specify the table type - basically the same approach but with a gigantic number space - this off course will be detrimental to performance.
Mathieu Send private email
Wednesday, November 26, 2008

"The idea of prepending the type on the ID is not a good one, and is considered bad practice."

That is certainly true if you examine the issue with the regular normalization rules in mind. But don't forget those rules where created for relational data models. In a proper relational model each table must model only one entity type. In that case you don't need a type identifier
in the primary key to identify the subtype *because you've already identified the type by selecting from a specified table*.

The trouble with mapping Object Oriented models to a Relational database is that object inheritance doesn't fit very well with the relational model. You can't apply the normalization rules strictly if you're shoehorning an OO model into a RDBMS. (And if you did, you should advise to get rid of the main table since no proper entity maps to it).
Mathieu Send private email
Wednesday, November 26, 2008
Thanks all for the great input!
chailatte Send private email
Wednesday, November 26, 2008
"You can't apply the normalization rules strictly if you're shoehorning an OO model into a RDBMS."

That's true, but I think all of this is just a re-hash of the age-old impedance mismatch problem.

Considering the canonical example or the Products table, how do you cope with 1,000 products, all with different attributes in an RDBMS?

There is no good answer, but lots of sub-optimal ones:

* You can have 1,000 tables
* You can single table with hundreds of sparsely populated columns
* You can store product details in XML
* You can use some other custom text-based format
* You can use .NET objects in SQL Server 2005/2008
* You can use an key/value attributes table

All of these approaches would work, but they all have potentially severe consequences, especially when you get to non-trivial volumes of data.
Scorpio Send private email
Wednesday, November 26, 2008

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz