The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

back ups and database size..

hello all,
  a while back i had posted about logging users every action
and storing it for n amount of time.

  well i did implement logging to database with out any apparent performance hits. basically during the day, every action is logged as references to the data, so basically most of my data is all integers ( primary keys of other tables).

then at night, a threads parses this information and converts into plain english. this was done as description stored with those references may change over time and so the past logged history will also change with it.
of course who changed the system data is also logged.

so now here comes the problem. the size of the 2 nd table. for my biggest client, the moment we installed this, with in 10 days, the size of this table was 650 MB.

i am not concerned about the size and the performance as this table will never be loaded except for auditing and that will be likeonce in blue moon.

but what i am concerned is with back up. the secnario is i have got  thousands of installs of our product at least. and none of them are in house. its all on customers sites. and majority of them do not have DBA or even in house IT person.

so eventually back up will be a huge issue. as of today, they back it up on to a tape.

my issue is, i will have to come up with a solution that can be deployed easily for all my customers and the backing time shouldn't increase beacuse of this.

my team lead suggested, to convert the logging from database to file system and create new file for each day, which can be zipped.

i am not very keen on this, as it will take outa lot of flexi ablity i have now such as data mining etc.. and i will have to rewrite  a major chunk of code.

my idea is move those tables to another database and then customers won't have to take back up of these tables every day at least and the main database won't get bloated. but this will provide us with more maintaince head aches.

any suggestions?

thank you
Sunday, April 09, 2006
Dan, tell me if I have this correct:

During the day you store:

1 = User adds a new part to the Parts table.
2 = User deletes a part from the Parts table.
3 = User adds an invoice.
76 = User modifies the Colloquial English mod to be Concurrent English.
and so on.

Of course you are storing other information, such as the part ID, the user ID, etc.

Then at night you convert this to a narrative:

At 09:15:36.00234 user Dan Michaelson added new part 1234-2365323 to the Parts Table.


If this is truly what you are doing, I question why.  If you have all of the information required to report on the activity stored in a nice compact format (your integers), why are you then expanding it by orders of magnitude by converting it to text?

Remember that logs are seldom read.  Even if what I just said is not true in your case, remember too that you have the code already written to convert this nice compact (hence: easy to back up) data ON THE FLY into human-readable data.

Why not simply leave the data as you create it during the day, then when someone needs to look at it, create a text report with the option to save the report to file or print it?  The net effect for users is the same: they have a full audit trail that they can access.  The benefits for users are many:

- It's easier to filter codes than text, so searching in your original data is much simpler and faster.

- Backups are smaller.

I'm sure you'll think of others.
Karl Perry Send private email
Monday, April 10, 2006
I kind of agree with Karl, but your statement about the descriptive text changing over time is an issue.  Perhaps you can store the 'old' version of the text in a historical-reference table when it gets changed and use that to assemble your messages.  At runtime it's kinda messy - to build a log message you'd potentially need to look for a date-related value for a given index in two tables - but your storage needs would be a lot less, and the cost of assembling the messages is paid on demand, not every day. 

If the volumes are still large, you could split the message info storage & querying off into a separate database.
a former big-fiver Send private email
Tuesday, April 11, 2006
karl, may be i wasn't very clear on what i am trying to accomplish.

consider a lawyer writing a letter  to his client. now the actual content of letter that was sent out should be saved for n years. or from what the first time it was drafted to final edit, we have to track who all changed the contents.
now this letter can be assembled from pre configured templates.

but these templates can change over years. now if i just store the references, then the actual letter that was sent out will reflect these new changes rather than the actual content.

thats the reason we are converting it to plain english before the end of the day.

we are also tracking who are changing the templates to what but its not in the format i use , as it was developed long before we developed this project.

any way there doesn't seem to be straight forward solution  with my code rewrite in some way.

thanks every one,
Thursday, April 13, 2006
What about saving a template for a time range?  So, using your example, you would have a table that stores each template and the timerange that template was used.  Then you store the integer values in the main table, and then store a foreign key to the template table, so you have a correlation between data rows/values and template.  When you pull the data, you just apply the template used at that time.

Retrieval would be slower, but you'd be more efficient in saving values and storage space.
Another Anonymous Coward
Wednesday, April 26, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz