The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Versioned objects in DB

Hi all.  We have a client that needs us to build them a dataabse application that can handle related versioned objects.  The app is for stormwater runoff compliance. 

We need to be able to say that Factory X used material Y dumping into drain Z until September of this year.  On Oct 1, that drain changed characteristics, so the environmental impact changed.  Now, tell us what they were supposed to do about stormwater runoff on August 28, Sep 10, and Oct 6.

Each of the 25-30 related objects can change at any time and we need to be able to log changes and reconstruct the state of the whole mess at any given time.  So just adding a date stamp to each object doesn't help much unless we can link it to appropriately datestamped items. 

One approach would be to build dated views on the fly to get a snapshot of the db on a given date, but seriously, 25 dynamic views created every time we ask a date sensitive question...I'm hoping someone has a better idea.

So, any suggestions?  The back end is likely to be MS SQL although PostGRES is in the running. 

Thanks a bunch folks!
Brian Send private email
Wednesday, April 12, 2006
Not trivial.

There's a few books out there on temporal database design.

Wednesday, April 12, 2006
Of which this is my favorite:

You can buy a dead tree version if you want. His running example is to do with tracking the movement of cattle, for compliance purposes - sounds pretty similar to what you need to do.
Larry Lard Send private email
Wednesday, April 12, 2006
I would personally use an auditing approach. Each "change" is an audit transaction that gets logged to a spearate table. It's much like handling inventory. The current value ends up being the sum of all changes. You can get the current iventory or the state of the inventory on any given date simply by adjusting your where clause to include/exclude certain changes.

However, it sounds like you have multiple properties that can be changed. This will make it much more complex no matter how you slice it. But the auditing approach will probably still work. You just need to use key-value pairs in your audit table to show which property has changed.

As a side note, you should also provide a mechanism to "roll up" all of the past changes by consolidating audit rows into a single row. This helps with performance when the data is very dynamic.

That's just how I'd approach it. You may find other approaches that work much better. It really depends on how many properties of the object are changing and how often they change.
Wednesday, April 12, 2006
I think that the priority here is to make sure that the data model is a correct and complete representation of the real world -- this may seem obvious, but if you take shortcuts in the model in order to gain performance then you'll regret it later.

Now once you have the model correct, then you can look at opportunities for implementing features to enhance performance. what features you have available will be platform dependent of course.
David Aldridge Send private email
Wednesday, April 12, 2006
Thanks for the pointers. 

Part of my problem was I didn't know that "temporal" is the adjetive I needed.  I kept getting search results on how to version a database schema, which just isn't the problem I have.
Brian Send private email
Wednesday, April 12, 2006
Larry Lard,

That's a very useful resource, thanks!
JD Send private email
Wednesday, April 12, 2006
Yeah ... +10 on the Snodgrass book. I built a large bi-temporal database for some insurance folks waayyy back in 1999/2000. If I didn't have the core concepts from that book, I would have been dead in the water. Employer Groups change benefits over time, plans change over time, employees move from employer to employer over time, rate schedules and fee schedules change over time, doctors migrate from practice to practice over time, practices pick up and close down various locations over time, regulatory/laws/compliance/legislation governing claims and claims procesing change over time -- you get the picture. Multiply that by tens of thousands of providers, hundreds of thousands of employer groups and tens of millions of employee members and you've got one giant mess on your hands.

There's heavy regulation, compliance, and auditing requirements in the insurance world. Keeping good records of the above (and more) changes over time is a necessity for this domain. Bi-temporal was the way to go, and the Snodgrass book (with some minor modifications to his ideas) saved my arse big-time.

I have no idea how any system in the insurance world would even work without the bi-temporal concepts as outlined in the Snodgrass book. It was a lifesaver for me. I've not seen anything better.

I am, however, disappointed that it's now available for free ... *After* we bought everyone in the office a copy of the book before it was free. Bummer.
Wednesday, April 12, 2006
I've been through the situation myself a year or so back. What we have decided to do is design the database in such a way as to give a current picture of the data. We recorded the changes, similar to the audit approach described above, as they are made to the db, storing the changes separately.

We then periodically take a "snapshot", of the data and archive it with all the changes, up to that snapshot (and since the previous snapshot).

If we ever wanted to get a state at a certain time, we get first the snapshot before the time and apply the changes up to the time. (This process requires an app.)

From our perpective, the system and its users were generally only concerned with the current data state, 99% of the time. It's just that if we needed to get the state at a certain point, it could be done. This had also the benefit of a simpler and better performing db.

Not to dissimilar to how source control systems work..
Wednesday, April 12, 2006
> I am, however, disappointed that it's now available
> for free

It's over 500 pages! I think the book form is more useful. Aren't there cheap automatic binders yet?

Thanks for the reference BTW, very nice.
son of parnas
Thursday, April 13, 2006
Thanks for the link to the book!

Another article I found very useful was Martin Fowler's
Patterns for things that change with time.

He discussed a number of approaches and the pros and cons and is a little lighter than 500 pages for someone who wants to get up to speed quickly.

Hope this helps
Tim Hastings Send private email
Thursday, April 13, 2006

Here's an approach which I've used.  It has its limitations, but also has the benefit of being fairly simple:

1. Start with an AuditLog pattern (follow the Martin Fowler link for details), but...
2. Don't have a single system-wide audit log - have one for each database table.  Each table has an "audit counterpart", which includes all of the original fields, plus some extras for auditing purposes.
3. Every time a record is changed, copy its new state into the corresponding audit table.
3. Re-create prior states (of the whole database) by pointing the data access layer of your system at the audit tables (instead of the "real" tables).
4. Once you've loaded a prior state, just handle it all as normal in the rest of the app - except you must treat it as readonly.

The tricky bit is in step 3: how do you ensure the system loads the right version of each record?  For every record, the audit table contains every version that ever existed; but you need exactly one of them - the one that was current at a particular point in the past.  There are several possible solutions.  I'm using one at the moment which is nice and simple, but isn't guaranteed to scale to really large databases.  Other solutions may be more scalable but also more complex.
John Rusk
Saturday, April 22, 2006
Following is a simple pattern I'm using with NHibernate:

Required columns:
1.PersistenceID, identifies a row. (primary key)
2.EntityID, identifies an object.
3.EntityVersion, together with EntityID to form an unique key.

When a new object is saved, its EntityID is set to be equal to its PersistenceID, and has EntityVersion = 0.

When you update an existing object: (object A)
1.Retrieve the object A~ *before* current modification
2.Give A~ a new PersistenceID (because it will be INSERTED)
3.Set A~.EntityID = A.PersistenceID
4.Save A~ (an INSERT)
5.Increate A.EntityVersion by 1
6.Save A (an UPDATE)

By this method, the primary key of any rows in the table never changes, and you dont need to worry about foreign reference. It's simple to implement and works pretty well for me so far.
AqD Send private email
Monday, May 01, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz