The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database design for historical data

I'm having some trouble deciding on how to store historical data in the database I'm currently working on.  The purpose of my application is to calculate periodic commission-based bonuses for salespeople, and these bonus amounts are dependent on each salesperson's position at the time of each sale.  A salesperson's job position can change throughout time.  The two general  approaches I see to this are:

1) Store the position history in a table (job position and its effective date, basically).  Use a view (in SQL Server) to join each sale with the correct job position when calculating a bonus.  The join would be based on the date of the sale and the effective dates in the job history table. 

2)  Store the position with each salesperson and simply add it to the sale record when the sale record is entered into the database.

There are some hybrid approaches, too, I suppose, but I think these are the 2 basic directions I can take.  The 1st option is nice because changing the job position start date in only one record (in the position history table) changes all the positions in the sale records, too (this could be a weakness as well, though).  I'm worried also about the performance of joining by date range and if this is putting too much logic into the database view.  Another problem is that you can't recalculate a past bonus or produce historical reports if a job history record has changed.  The second approach is simple and would perform well, but I'm worried that one incorrectly entered job position would cause errors to begin propagating throughout the system and be too difficult to clean up.

Thanks in advance for any help!
Stuck in a vacuum Send private email
Wednesday, March 28, 2007
 
 
This book is out of print, but the author and publisher have kindly made it available as a pdf:
http://www.cs.arizona.edu/people/rts/tdbbook.pdf
Peter
Wednesday, March 28, 2007
 
 
A quick answer, option #2 is ultimately better in the long run (or similar approaches). You would just have to be doubly sure that your code works as intended, and correctly deals with any mistakes, abnormal data, timing errors, etc etc.
TheDavid
Wednesday, March 28, 2007
 
 
What do you need to optimize for? I'm loathe to recommend "good" and "bad" without the context.
Steve Hirsch Send private email
Wednesday, March 28, 2007
 
 
Our app calculates commissions for stock brokers, which are  based on an insanely complex set of rules (often tiered by what they sold and the size of the sale). We went for option #2, because we felt the system was complicated enough without an option #1 approach. If someone screws up, as happens every now and then, a human has to make adjustments, and we provide a utility for this purpose. (It basically lets them add fake trade records that only affect commission and from then on the rest of the app works normally). Nobody really cares about this limitation, so if I were doing it all over again, I'd most likely do the same thing.
Greg Send private email
Wednesday, March 28, 2007
 
 
What you need is a Valid Time database.  The book Peter linked to above explains all.
sensible Send private email
Wednesday, March 28, 2007
 
 
+1 for the book that Peter linked. I got this book a few years ago and it has proved invaluable, as it explains everything properly, with loads of example code.

Since reading it, I now think about databases in bitemporal form as the default, rather than thinking of it as an afterthought.

Get it used from Amazon, etc if possible.
Entries of Confusion Send private email
Thursday, April 12, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz