The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Lots of history in DB table

What is a good way to deal with historical data in a database where there are many rows inserted and many rows deleted?  We want to delete data from the database when it is more than one year old.

My concern is the performance of having a table with about 10,000 rows of data per day inserted and deleted.  Our system might scale up to have ten or twenty times that much data.

Would you have a series of tables where you write the data?  It is much faster to truncate a table than it is to delete rows from a table.  If you use this approach, how do you design the reports that use these tables?

What other approaches should we consider?
Thursday, November 17, 2005
The approach I would be inclined to use would be to first define in terms of time what constitutes "historic" data. Once defined, proceed to create the historical data by way of disaggregating your active table into two separate active and one historical.

Your reports should be disaggregated in a way that allows the end user to easily identify which kind of report is being perused.

Managing changes within the database serve as a completely different that must be managed in a similar way as aforementioned depending on the objectives defining how you want to manage the data.
Brice Richard Send private email
Thursday, November 17, 2005
What database? Oracle Enterprise Edition (no, not cheap) supports partitioned tables for just this case.
George Jansen Send private email
Thursday, November 17, 2005
I would also ask questions about when the deletion takes place.  1,000 records added per day is reasonable.  1,000 records deleted per day creates lots of headaches for the database administrators, particularly when users' accidently delete a single record.

Since you allow your solution to scale, it seems reasonable to consider a quarterly or yearly purge of historical records. In the financial industry, I believed you're only required to keep records going back seven years. From an efficiency standpoint, it doesn't make sense to delete records that are exactly seven years old, every day.  It makes much more sense to delete all records that are at least seven years old, at the end of the quarter (anywhere from seven years, zero days old to seven years, 92 days old).

Additionally, from a human perspective, this allows you to have a much publicized "commit" date.  Tina in accounting has until the end of the quarter to fix her mistakes.

In more practical terms, this also allows you to schedule a batch job during the wee hours of the night.

If you have a variable number of rows, then I would strongly discourage capturing the records in a variable number of tables. Getting it all to work properly is not worth the time and effort you'll put into it.
Anonymous Coward
Thursday, November 17, 2005
Our application does not have critical data such as financial information.  It is recording when material is received, produced, moved, and shipped within a warehouse. The old data will be deleted by a background process that runs periodically, not directly by users.

The system is expected to run continuously, with no down time. A cleanup process that takes so many resources that it impacts normal use would be a problem.  A quarterly cleanup would probably impact things too much.

I like Brice's suggestion of active and historical tables, but wonder if that just adds to the overhead. Data would get copied from the active table to the history table, then would be deleted from the active table. Then old data would still need to be deleted from the history table eventually, so I don't see how that solves the problem.

I've used Oracle's partitioned tables before for this kind of thing.  I was just wondering if there were other cheaper workable solutions that are not so DB vendor specific.
Thursday, November 17, 2005
> Oracle Enterprise Edition (no, not cheap) supports
> partitioned tables for just this case.

If I am not misstaken, partitioning must be additionally licensed even if Oracle Enterprise Edition was purchased.
(Yes, even more expensive)
René Nyffenegger
Friday, November 18, 2005
The database you are using here is critical.  This is trivial in Oracle as mentioned above with partitioning.  You have something called rolling updates, where you simply insert new records in the current partition and drop the oldest partition.  You essentially don't actually do any deletes; the entire partition is removed in a single, quick operation that happens fast regardless of the number of records, 10,000 or a million.

You also then get the advantage of keeping everything in the same table without performance problems.  Oh and you would compress the older partitions.

Can't speak for other databases.
Scot Send private email
Friday, November 18, 2005
10,000 rows added or deleted per day is a trivial amount, assuming you have a suitable server and database.

Try doing telecoms billing, with millions of rows per day, then it can get more interesting ;-)

Seriously, a few thousand rows shouldn't cause too much trouble, and as others have said, "proper" databases, like Oracle, have partitioned tables for this scenario.
Wanted - NYC
Friday, November 18, 2005

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz