The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

One big database or multiple files to store documents?

I am programming software for processing data from a poultry scale (serves for manual weighing of live birds).

The weighing on a farm is made weekly; the birds are separated in several halls/rooms, which correspond to individual files in the scale.

Once the weighing is done, all data is downloaded to a PC for further processing. Here, the user wants to see a growth curve for a selected hall created from several weekly weighings. First, the user selects what data to take and calculate the growth curve from.

Now I don't know if I should use one database to store all data from all weightings, or if one weighing of one hall should correspond to one physical file on a harddisk. Are there any general rules that I could follow?

Every weighing session can be identified by a scale name and date/time when the weighing was done.

Using one DB seems to be a better solution, because selectiong data for the growth curve is simple. On the other hand, data cannot be sorted into directories, some problems can arise when date/time is changed, export is needed for data exchange etc.

When using multiple files, data can be sorted to any user's liking, but selecting files for a growth curve can be not as easy.

Any ideas?

Thank you,
Petr
Petr Veit
Friday, February 29, 2008
 
 
I'd say use the DB.  You can export from all DB formats, and you don't have to worry about the order you store the data in.  You can have a lookup table for each hall, then flag each record in the DB with what hall it belongs too.

Why would a date/time change on a weighing?  Once a bird is weighed, that data should be static, right?
Eric D. Burdo Send private email
Friday, February 29, 2008
 
 
"Why would a date/time change on a weighing?  Once a bird is weighed, that data should be static, right?"

Eric, OP said live birds.

"Tom, please step on the plate here.  No, no, no.  The pressure plate.  The dinner plate is later, and we will handle that one."

Sincerely,

Gene Wirchenko
Gene Wirchenko Send private email
Friday, February 29, 2008
 
 
+1 on using the single database.  If I understand your problem, it will be much easier to export the data into alternative formats than it would be to combine the data for analysis from X number of files.

That actually sounds like a fun project to me.  Best of luck with it.
fair to middlin Send private email
Friday, February 29, 2008
 
 
"Tom, please step on the plate here.  No, no, no.  The pressure plate.  The dinner plate is later, and we will handle that one."

Stop!  You're killing me here!

lol
Karl Perry Send private email
Saturday, March 01, 2008
 
 
More seriously, if your user ever wants to graph/compare more than one hall, or species, or gender, or feedstock, or anything else it will be much easier from a single database than from a series of files.

Gene, I think Eric was questioning why the date of one weighing would have to be changed, not why a bird would have to be weighed more than once.  Unless someone accidentally changed the date on an individual scale, there should never be a reason to change a weight record coming in from an automated source.

I'm still chuckling over your different plates.
Karl Perry Send private email
Saturday, March 01, 2008
 
 
A single database would suffice. Sounds like less then 25 tables and it would simplify the structure.

SQL Server supports cross-database select queries. Keep in mind that you might want to upsize one day :)
Eddy Vluggen Send private email
Saturday, March 01, 2008
 
 
Arguably a file system IS a kind of database. Why bother making the distinction? Look at all the likely database engines together with all the available file systems and then choose from this collection the solution that best fits your needs.
Rowland
Saturday, March 01, 2008
 
 
"Arguably a file system IS a kind of database."

Not a relational one, though. Nor does it support SQL, so there's an extra investment in time if you want to calculate the average weight.

SQLCE, or a simple data structure in XML. MSDE or MySQL might be overkill.
Eddy Vluggen Send private email
Saturday, March 01, 2008
 
 
Thank you all for your comments!

The growth curve will be created from about 5 weighings, which would mean to select 5 files.

What I am worried when using one big DB:

1) The date/time of weighing is given by the current date/time set in each scale (data can be downloaded later). What if the scale has date/time set wrong? I thought if the files were used, the user could just create a directory with right date/time and download data from all halls there. The same problem comes when one scale has wrong name. Maybe I could let the user enter (or correct) the date/time and scale name prior to downloading. Also, there could be maybe some batch correction applied to selected data in the DB.

2) When the individual files are used, the user can download data from the scale several times, in several directories. This could be useful when the data must be duplicated for some testing purposes.

I think using one DB will be really better, I only have to get used to it. So far, I've always used multiple files.

Petr
Petr Veit
Sunday, March 02, 2008
 
 
Petr,

It sounds as though the data is being physically carried via floppy, or some other kind of batch process from each building/scale to the data center.

Let's assume that the format of the data coming in from each scale is the same.

Given that, you write an application:

1. Ask the user the name of the scale from which the current batch of files is being imported.  This is saved for each weight as the scale identifier, not what's in the incoming file.

2. Ask the user for the date/time the weighing started for this batch.  They will have to write this down as the weighing starts.  This is saved in the variable dtWeighingStart.

3. Your app finds the earliest weighing in the file and saves it in the variable dtScaleFirstWeight.

4. For each weight imported, the weighing date/time you save is dtWeighingStart + (CurrentWeightDateTime - dtScaleFirstWeight).

This gives you a common time basis for all imported weights but also gives you the exact time each bird was weighed - in case that degree of precision is important.

I think you still want to save this in a relational database, not separate files on disk, for all the reasons everyone above has stated.

One thing that no one has mentioned is that using a database stored on a server, such as SQL Server, will prevent an idiot or malicious user from deleting the files - you can lock down the database file's folder on the server while still allowing users to access the data stored in it.
Karl Perry Send private email
Sunday, March 02, 2008
 
 
I'm going to recommend that you read this book:
http://www.cs.arizona.edu/~rts/tdbbook.pdf
It is out of print, so the author has made it available as a pdf on his site.
Peter Send private email
Sunday, March 02, 2008
 
 
Karl: yes, one scale is used in several halls at one time and then it is brought to the office, where the records are downloaded. Of course, there can be several scales on one farm, each weighing in different set of halls.

It looks like letting the user to change everything (scale name, date/time) will be the best solution.

I would like to use the Access DB. The application will be run strictly in Windows as a local app. I have not very much experience with SQL, so Access seems the easiest DB for me (together with .NET WinForms).

Thanks again,
Petr
Petr Veit
Sunday, March 02, 2008
 
 
"Tom, please step on the plate here.  No, no, no.  The pressure plate.  The dinner plate is later, and we will handle that one."

Stop!  You're killing me here!

    No, I am not!  That comes later, ah, in version 2.

Sincerely,

Gene Wirchenko
Gene Wirchenko Send private email
Monday, March 03, 2008
 
 
From the perspective of manipulating the data, I too favor the database.

However, since these are physical scales being carried around, and presumably prone to abuse, it may make more sense to create the data in multiple flat files, and then import them into the database when the scales are synchronized with the base station.

For example, let's say a chicken accidently kicks out the power cord on the scale, or happens to squat in just the right place and (ahem) insulates the battery. It's easier to reconstruct the data on a file by file basis, especially when they're simple CSV files, than it is to dive into Access and try to figure out which records are corrupt.

Does Access gracefully recover from interrupted transactions?
TheDavid
Wednesday, March 05, 2008
 
 
>Does Access gracefully recover from interrupted transactions?

Not really. I don't know about the latest version, but when I was using JET4, there are transactions, but no transaction log. As long as everything is up and running, commits and rollbacks work great, but if the plug is pulled, the database can be left in an inconsistent state without any means to rollback interrupted transactions. With all the open source databases available today (MySQL, Firebird, Postgres), there is no need to feel limited to Access if one is on a budget. I would avoid Access if at all possible.
anony
Friday, March 07, 2008
 
 
I would agree with the one big database.
Adam Cox Send private email
Wednesday, March 19, 2008
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz