The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

DB Design With Large Number of Fields

In the health care industry, the "patient" can have an extremely large number of 1 to 1 attributes (some systems I have seen had around 90.)  Over the years, I have seen database designs handle these in a couple of different ways.

A. Create a single Patient table with a very large number of fields.

B. Break the fields into a groups of related fields and put into their own table (for example, social attributes in a table called "PatientSocialInfo" with a key back to the Patient table) even though the "Patient Social Info" is 1 to 1 with the patient.

Neither method seems exceptionally "gracefull", but I have seen both methods work (assuming row in method A doesn't exceed max row size.) What are the recommended strategies for this situation?  I am sure there are standard design practices for this, I just have never ever come across them.
Aaron Send private email
Saturday, February 24, 2007
It depends on how they're used...

If you have a group of fields that are only rarely used but all used together, I'd likely go the sub-table route.  Then you could use a Lazy Loader to only grab the fields once you needed them.

If you have a handful of core fields that are used constantly and all the rest only rarely, then you could keep those on the main table and then demote all the attributes to a sub-table and do another Lazy Loader.  In this scenario, you need to make sure that the fields that should only have one value actually only have one value.

There are probably 8 other strategies...
KC Send private email
Saturday, February 24, 2007
You could design a patient table, with a patient ID and a list of the most common or always-used fields.

Then define an 'extended attribute' table with an attribute ID and description (for e.g. rendering forms or what have you).

The define an attribute-to-patient table that binds a patient to an attribute and also assigns that attribute a value.

Your display or domain object code can then be written to understand that if a patient has no binding for a specific attribute in the binding table, the patient has no value for that attribute.

One downside here is the attribute values are harder to validate because they're some generic type such as a string.  You can get get fancier by assigning types to attributes, having different typed attribute mapping tables, or add constraint logic that checks attribute type to a value, if necessary.
Dan Fleet Send private email
Saturday, February 24, 2007
If I understand the "extended attribute" scenario, reporting on the data would much more complex wouldn't it? It seems that you would always have to transform the multiple rows of extended attributes that belong to a patient into a single row of properties before doing any sort of query. Is this a fair assessment or have I misunderstood?
Aaron Send private email
Saturday, February 24, 2007
We had this scenario in our Customer Master system. In the stocks & bonds business, you have to capture dozens and dozens of attributes about each customer. Not only that, individual customers capture different fields.

We basically created one table for the core stuff that everyone has (birthdates, SSN, name etc.) and then a dictionary-type table (key#, attribute name, attribute value) for all the 'optional' stuff.

I am still not sure if this was a good call or not. The drawbacks are that nobody really understands how the 'optional' table works, and people are forever wanting to slide stuff into the 'main' table. This is especially true when we have customers who access the data directly and just want to get the damn XXX field and don't understand why some fields are so simple and others are so complicated.

I think if I were doing it all over, I would opt on the side of KISS and just have one huge table for each person (more or less), and accept the hassles of adding new fields for different customers and having code to hide those fields from people who don't want them.
Greg Send private email
Saturday, February 24, 2007
Every option is painful.

You can use a big table model in the database and properties model in your code. That allows for efficient database handling with the ability to handle arbitrary attribute sets in your code.
son of parnas
Saturday, February 24, 2007
"This is especially true when we have customers who access the data directly and just want to get the damn XXX field and don't understand why some fields are so simple and others are so complicated."

Exactly what I was afraid of, but have never met anybody who actually used this design on a datacentric application. Good stuff..!

"Every option is painful."
I'm feeling much better about my years of confusion on this subject :)
Aaron Send private email
Saturday, February 24, 2007
+1 core table for attributes that (almost) everyone has and child tables for logical groupings of optional attributes

Nothing magical here, just basic design principle of breaking complexity apart into manageable chunks.
Mike S Send private email
Saturday, February 24, 2007
Christopher Wells Send private email
Saturday, February 24, 2007
"reporting on the data would much more complex wouldn't it?"

Depending on which attributes you want to report on and which ones are filters vs just displayed data, your reporting can become a bit more complex, yes.  However you can apply appropriate abstractions, either in the DB or in your middle tier code.  Alternatively you can separate reporting functionality from normal CRUD functionality, and do your reporting out of a reporting database or data warehouse, where you can flatten and denormalize data.

The advantage of the attribute model, is you can add arbitrary attributes without having to modify your table structure.  Depending on how you architect your forms and reporting, they can even automatically pick up new attribute types.  The disadvantage is that it is not trivial, although it is simple in concept.  It is trading some implementation complexity for flexibility.
Dan Fleet Send private email
Saturday, February 24, 2007
My own experience is that you are better off with a big table than chasing links all over the place.

You can use views to simplify.

If you use multiple tables, use views to join together groups.

So you would have Patient, PatientInsurance, and then a few called PatientWithInsurance so that the applications don't have to chase the link.

Depending on your architecture you could of course do this with libraries to access the database as well.

Just don't chase links all over in application code.
dot for this one
Saturday, February 24, 2007
A workable Oracle alternative to the vertical partitioning approach is to maintain a single wide table but shunt the commonly accessed columns off into a materialized view.

I f I was going to be using a multiple table approach then I'd definitely be thinking of a hash cluster for storage, although when you consider the disadvantages you might just as well use a single table anyway. In Oracle I'd be sure to place the most commonly accessed columns at the "left hand side" of the column listing to reduce the CPU overhead of finding the values.
David Aldridge Send private email
Sunday, February 25, 2007
I would favor group fields by function and use into separate tables.

a. Security administration can be more granular
b. Changes to apps are easier when isolated to one table
c. Concurrent record updates are easier to manage
d. More performant (some fields will be used more than others, allowing more of them to be cached when put in a separate table)

a. Report hassle requring multiple joins.
Monday, February 26, 2007
In general, I would have gone with the single table approach simply because it becomes easier to find and see which fields are actually used - a major timesaving advantage when "refactoring" data.

However, in health care, the first reason OneMist8k mentioned alone probably justifies the multi-table approach: security. This approach would give you a second padlock around sensitive data and would look better on a security audit if you had a nice simple rule like patient financial records can only be accessed from certain systems.
Monday, February 26, 2007
I had a similar problem in the printing industry and I solved it by dividing the fields into two types: the main fields and the optional fields. The main fields were first class columns and the optional fields were stored in XML blobs in that same table. This way I could have a very high number of items but a very low number of columns.

The main fields were those bound to be keys in most operations and reports whereas the optional would be additonal information that typically you wouldn't need to query. The optional are harder and slower to query but not nearly as hard as in the key/value approach.

That db is now some 10 Gb and seven years old and it performs quite well. Even on LIKE filters on the XML blobs. Not a very relational model though...
JSD Send private email
Thursday, March 01, 2007
In my software I have 3600+ fields not counting the 1000s of user definable fields.

For me this is not a problem because I use my own database engine that I wrote myself.  I use in memory compression and decompression.  Unused fields take exactly 0 bits.

And my database scales well, it can handle millions of transactions.
Donald Duck
Friday, March 16, 2007
why you use "my own dn" if  free db engines have all you need? and sqlserver Express 2k5  or free oracle  ?

Wednesday, March 21, 2007
Sub-tabling (one-one relations) has one more PRO: you can define "mandatory groups".
Suppose we have a patient and his insurance data. If patient has an insurance, several insurance attributes are mandotury (e.g. company, number, due date...), but if pation does not have one, none of these fields should be filled (so thay cannot be defined 'not-null'). In that case you create a sub-table 'insuranceInfo' with mandatory fields, but 'insuranceInfoId' field in 'patient' table is nullable.
Wawrzek Hyska Send private email
Thursday, March 22, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz