The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

database structure for 'loosely' tracking changes over time

Hi everybody!

This forum seems to me to be one of the most knowledgeable groups on software development, so if you guys can't help me on this, I must be doing research, not development...

Please bear with me, as I might take some time to elaborate on the nature of my problem.

I work at a large research-project that tracks language variations in a large number of people (couple of thousands) over a period of 30-50 years. During that period the project are only in contact with those people every 10-15 years, and they are under no obligation to inform us of changes of address, marital status etc.
However, those changes are rather important for the research we are doing, since locational and sociological changes may have an impact on the language variations we are looking at.

My job is to design a database that can store the information we do get, about the people, their locational and social status, and changes in those during the next 30-50 years. How do I design a databasestructure to not only keep a list of changes, but also allow for rather 'loose' defined changes - eg a person may report during an interview that s/he married a couple of years back. It seems to me that a table of statusses and dates or even years, would be a bit too rigid, since I can't expect the data to be very precise.

And then there is searching - somehow I have to allow for searches for "women who in their twenties moved from rural to city and married after finishing college"... But in my infinite optimism I hope that the searches will avail themselves when I have a nice structure for my data...

I am not restricted to use plain relational databases, and I expect to have to develop a highly specialized user interface for the system, so even the wildest suggestions may prove valuable.

Also, recommendations for good books on database-design, something like "Code Complete" or "Programming Pearls" or "Design Patterns", but for databases, are very welcome!

Thanks, in advance
Peter L.
Peter L. Send private email
Tuesday, July 25, 2006
It seems you are building an analysis tool. Learn about Data Warehouses and OLAPs. I can guess you need to implement a DW with slowly changing dimensions of the second kind.
Dino Send private email
Tuesday, July 25, 2006
You're also not the first person to tackle a project like this. I vaguely remember someone asking for help on how to modify a database toolset developed by the University of Texas, that started with an S. I want to say SLOS or SLUS or something like that.

In any case, I do encourage you to see if there's something out there that fits your needs - in fact, I'd think any true University grade Genealogy software would do the trick.

In fact, I just now googled for "genealogy database schema" and saw a number of promising hits. Granted, you would have to add to it, but hopefully it's just a matter of adding more attributes as opposed mapping out tables and relationships.
Tuesday, July 25, 2006
You are saying that there could be alot of disparate, non predictable types of data, yet you are able to approximate quite a bit about what it could be.

There are 2 ways here, you can either try to 'dynamically' make different things to store (bad idea) or sit down and come up with a ton of different things that you are interested in keeping track of.

The dynamic thing is a mistake IMHO, as it doesn't really work well. YMMV.
D in PHX Send private email
Tuesday, July 25, 2006
Fire up Google and type the following search query:

Patterns for things that change with time

You'll get a few pointers to articles (and at least a complete book) about this.

Granted, most of the literature in this field expects you to have to deal with *specific* time events (example: you issue a bill to someone on March 10th of this year, then record that you discover on the 4th of June that the person died of heart attack on January 5th, so you have to juggle the concept that he has been dead for the last 6 months, but was "presumed  alive", in your system, until the notice) but I hope you may find something valuable in these.
Paolo Marino Send private email
Wednesday, July 26, 2006
Also, look at the literature for genelogical databases; admittedly the ones that make it to market are not great for this sort of thing, but they can at least record streams of "fuzzy" data.

There's probably some theory out there about how you record "tales told you by some nutter in a graveyard while looking for headstones".

My parents were told some stories about an apparent ancestor. In a graveyard while looking for headstones. It caused a lot of confusion, because the only way we could put them into the database was as "facts". "so-and-so resided at place at time" kind of thing.

Later on, it transpired the stories were more or less true, but were a generation older than attributed and the other side of a marraige... and then we had to work out which facts needed to be moved. Which was a pain.

However this might prove a start since the systems (or at least their users) have to deal with stuff like "Oh, and Fred was buried in Skegness. Or possibly Margate. During a war."
Katie Lucas
Wednesday, July 26, 2006
Wow... even better response than I expected.

Thanks for the genealogy-pointers, I'd never have thought of those myself. Also, the search for "patterns that change over time" eventually led me to Richard Snodgrass' homepage: which hosts his now out-of-print book: "Developing Time-Oriented Database Applications in SQL" - a book that seems to atleast cover everything relating to excact dates.

I've got a lot more info to work with now - thanks again.
Peter L. Send private email
Wednesday, July 26, 2006
... the search for "patterns that change over time" eventually led me to Richard Snodgrass' homepage...

Yep, this is what I meant with "(and at least a complete book)".
Paolo Marino Send private email
Wednesday, July 26, 2006
+1,000,0000 for the Snodgrass reference.

BiTemporal databases have saved my butt on more than one occasion. Beware, though. Most users don't think in terms of one dimension of time, let alone two (BiTemporal). You'll be best to hide as much of the "temporal nature" as possible from the users. They just don't GetIt(tm).

We'll usually architect the system to where the typical user only sees the current snapshot of the data -- what it looks like now, as of today. The history, the temporal dimensions are hidden from the user unless they go digging for it.

If you're dealing with smart folks, they'll GetIt(tm). If not ... good luck. Been there, done that -- and it ain't pretty.
Wednesday, July 26, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz