The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database stored in a .dat file


I need to create a database holding nearly 40,000 records. Each record has 4 fields - 3 longs and one char.
It will be used only as a lookup table and no data is manipulated in any way. If x matches field[0] then return each field in that record as an array found[i]. I need to distribute an updated copy monthly by simply overwriting the old one.

SQL CE creates a 2meg table and is gross overkill.

I have not had any experience in this field and hope that someone can tell me what I need so I can at least do a search on the net to try and find some tutorials and such.

I code in C# and use VS2005.

Frankly, I don't even know what I should be looking for.

Glen Harvy Send private email
Saturday, July 07, 2007
Unfortunately you are using C#, so it is already overkill; otherwise I would recommend SQLite, which is perfect for this.
Saturday, July 07, 2007
I think any sort of database, be it SQL SE, CE, Lite or Stupid, Access, MySQL, etc., is overkill. I only mentioned C# and VS2005 because that's the environment I need to learn how to do it in.

I'm looking for what I guess is a simple binary format data file that stores the data in as little space as possible.

It's the HOW-TO on reading, storing and retrieving the data that I am looking for.

But I could be wrong - I have been known to be in the past :-)
Glen Harvy Send private email
Saturday, July 07, 2007
Are you aware of SQLite, that it is a single file, and truly light?
Saturday, July 07, 2007
Your uncompressed raw data without overhead is already 1MB.

Assuming a 64-bit long (which is C#'s standard - in the past, ints have been 16-bit with longs as 32-bit on some platforms):

(3 × (64-bit long) + 1 × (8-bit char)) × 40,000 records

= (24 + 1) bytes × 40,000 = 1,000,000 bytes
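That arithmetic can be sanity-checked directly in C# (a sketch; the names are illustrative, and it assumes the char is stored as a single byte on disk):

```csharp
using System;

class SizeCheck
{
    static void Main()
    {
        // 3 longs at 8 bytes each, plus the char stored as one byte on disk.
        // (A C# char is 2 bytes in memory; writing it as a byte keeps the file small.)
        const int RecordSize = 3 * sizeof(long) + sizeof(byte); // 25 bytes
        const int RecordCount = 40000;
        Console.WriteLine(RecordSize * RecordCount); // 1000000
    }
}
```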

Depending on which indexes you chose, you may have added some overhead in SQL. I would have thought the best strategy is a clustered index on everything, which should come in closer to the 1MB size than the 2MB size - but the 2MB may include free space in the database and database overhead, not just used table/index size.

Assuming that 1MB is acceptable, searching is on the first long, and there are no duplicates: provide the file as straight binary, sorted on the first long, and use a binary search to find each entry on demand.

If more performance is needed, you can do caching or whatever.  If the size is unacceptable, you can look at shortening the fields, but I can't recommend looking into compression since your rows are so small (although longs might be overkill, depending on your application.)
Cade Roux Send private email
Saturday, July 07, 2007
The C# way is to use a binary serializer. Here's a reasonable description I found through searching "c# binary serialize":

Of course, like a lot of OO stuff, that is drastic overkill. You could always open a binary file, iterate over your array, and simply read/write values.
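That "open a binary file and iterate" approach might look like this sketch with BinaryWriter/BinaryReader (the Record layout and file layout are assumptions - in particular, the char is written as one byte, which only works if its value fits in 8 bits):

```csharp
using System;
using System.IO;

struct Record
{
    public long A, B, C;
    public char Flag;
}

class BinaryFileDemo
{
    // Write the records as fixed-size 25-byte binary rows.
    static void Save(string path, Record[] records)
    {
        using (BinaryWriter w = new BinaryWriter(File.Create(path)))
        {
            foreach (Record r in records)
            {
                w.Write(r.A);
                w.Write(r.B);
                w.Write(r.C);
                w.Write((byte)r.Flag); // assumes the char fits in one byte
            }
        }
    }

    // Read the whole file back into an array.
    static Record[] Load(string path)
    {
        using (BinaryReader rd = new BinaryReader(File.OpenRead(path)))
        {
            int count = (int)(rd.BaseStream.Length / 25);
            Record[] records = new Record[count];
            for (int i = 0; i < count; i++)
            {
                records[i].A = rd.ReadInt64();
                records[i].B = rd.ReadInt64();
                records[i].C = rd.ReadInt64();
                records[i].Flag = (char)rd.ReadByte();
            }
            return records;
        }
    }
}
```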
Saturday, July 07, 2007
As far as guidance on the implementation goes:

Implementing the binary search on the file can be done by opening the file, determining the number of records from the file size, seeking to the midpoint record, and then repeating on whichever half could contain the key.

Lots of debugging work there to ensure your implementation is right, so the SQL may be more attractive from a construction and maintenance point of view.  I'm not aware of an off-the shelf component to handle searching a file of plain binary records.
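For what it's worth, that seek-to-midpoint search might be sketched like this (it assumes 25-byte rows sorted ascending on the first long; names are illustrative):

```csharp
using System;
using System.IO;

class FileSearch
{
    const int RecordSize = 25; // 3 x 8-byte longs + 1 byte

    // Returns the four fields as an array, or null if the key is absent.
    static long[] Find(string path, long key)
    {
        using (BinaryReader rd = new BinaryReader(File.OpenRead(path)))
        {
            long lo = 0;
            long hi = rd.BaseStream.Length / RecordSize - 1;
            while (lo <= hi)
            {
                long mid = lo + (hi - lo) / 2;
                rd.BaseStream.Seek(mid * RecordSize, SeekOrigin.Begin);
                long first = rd.ReadInt64();
                if (first == key)
                    return new long[] { first, rd.ReadInt64(), rd.ReadInt64(), rd.ReadByte() };
                if (first < key) lo = mid + 1;
                else hi = mid - 1;
            }
            return null; // not found
        }
    }
}
```

One record read per probe, so 40,000 records cost at most 16 seeks per lookup.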

If your working set is not an issue, C# has a number of collection classes, so if the data was read into such a structure (hash-table), then efficient in-memory searches already exist.
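A sketch of that in-memory route using the generic collections in .NET 2.0 (the Record struct and the uniqueness of the first long are assumptions):

```csharp
using System;
using System.Collections.Generic;

struct Record { public long A, B, C; public char Flag; }

class InMemoryLookup
{
    // Build a dictionary keyed on the first long for O(1) lookups.
    static Dictionary<long, Record> Build(Record[] records)
    {
        Dictionary<long, Record> table = new Dictionary<long, Record>(records.Length);
        foreach (Record r in records)
            table[r.A] = r; // assumes the first long is unique per record
        return table;
    }

    static bool TryFind(Dictionary<long, Record> table, long key, out Record hit)
    {
        return table.TryGetValue(key, out hit);
    }
}
```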

So there are going to be tradeoffs...  2MB might not look so bad, now...
Cade Roux Send private email
Saturday, July 07, 2007
rand - That's where I think I wanted to go:

It might be overkill but it's what I wanted to see and learn so I'll dig into that option.

"You could always open a binary file, iterate over your array, and simply read/write values."

Now you're smokin'!

Any ideas where I can find some info on how to do that - in particular how to insert/retrieve the data into an array? Everything I have refers to text files.

I could do it with text files now (meaning I have that level of experience) but I believe binary files will be considerably smaller and the process faster.

Thanks everyone - getting there.
Glen Harvy Send private email
Sunday, July 08, 2007
I fail to see any reason that a database should be used here. 1-2 megabytes of data is not a big deal nowadays. Just load all of it into a hash map.
Sunday, July 08, 2007
1MB is nothing; you do not seem to need SQL.

The user seems not to change the data.

A binary file, loaded into memory when the app is started looks like a good choice.

(Even if you don't use any fancy data structures, iterating over an array of 40,000 records of simple data should be fast)
Totally Agreeing
Sunday, July 08, 2007
There are read-only DBs designed for this sort of thing; I can't recall the name just now. You could also look at memory-mapped files, or just use serialization as already mentioned.
Neville Franks Send private email
Sunday, July 08, 2007
The OP implies that he is working on a mobile platform - 1 or 2MB would make a big difference, either in storage OR in memory.
Cade Roux Send private email
Sunday, July 08, 2007
Berkeley DB?

Basically, it's the underlying engine behind things like MySQL that lets you store arbitrary things in tables and manages the accesses for you. There's some licence jiggery-pokery to work around.
Katie Lucas
Monday, July 09, 2007
If you are using .NET, and this is going to be just a lookup table, why don't you use a hashtable?

You'd have to create your hashtable and then serialize it to disk.  Your application will have to deserialize it and hold it in memory.  Shouldn't be too big of a footprint - but I couldn't tell you exactly.
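A sketch of that with BinaryFormatter (which does exist in .NET 2.0; note its output is tied to the type and assembly versions, so a version change can break previously written files - names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
struct Record { public long A, B, C; public char Flag; }

class SerializeDemo
{
    // Serialize the whole lookup table to disk in one call.
    static void Save(string path, Dictionary<long, Record> table)
    {
        using (FileStream fs = File.Create(path))
            new BinaryFormatter().Serialize(fs, table);
    }

    // Deserialize it back into memory at startup.
    static Dictionary<long, Record> Load(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return (Dictionary<long, Record>)new BinaryFormatter().Deserialize(fs);
    }
}
```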
Monday, July 09, 2007
Even more simplistic ... why not use XML? .NET supports it natively, and you can use XPath to give you SQL-like capability. Depending on the data, you definitely can keep the file size under 1MB if necessary.

XML can easily grow with your needs as necessary as well, and if you use schemas properly, you can even save space by removing tags for empty fields.
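For illustration, XmlSerializer handles an array of a simple public type out of the box (names are hypothetical; be aware the per-record tag overhead tends to make the file larger than the raw binary form):

```csharp
using System.IO;
using System.Xml.Serialization;

public class Rec
{
    public long A, B, C;
    public byte Flag; // stored as a number, sidestepping char quirks in XML
}

public class XmlDemo
{
    public static void Save(string path, Rec[] records)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Rec[]));
        using (FileStream fs = File.Create(path))
            xs.Serialize(fs, records);
    }

    public static Rec[] Load(string path)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Rec[]));
        using (FileStream fs = File.OpenRead(path))
            return (Rec[])xs.Deserialize(fs);
    }
}
```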
Inari Send private email
Monday, July 09, 2007
I'm not sure, but I think that Donald Knuth's book on searching and sorting has a chapter on search trees that you could use as a basis for indexed or hashed binary files. At least, I remember we did it as a class project in the university.

If you are just bulk-loading data for later retrieval with minimal changes (updates, deletes), implementing this is a snap. BUT when data starts changing, things will become really messy (many conditions to check, the need to reorganize data, etc.).

What is not clear to me is how complex your queries are going to be (or could become in the foreseeable future). If there is ANY chance of complex queries, consider a lightweight database engine such as Berkeley DB (I've not used it personally, but it seems to be the best option out there).

Pablo Chacin Send private email
Tuesday, July 10, 2007
Be very careful if you chose to serialize the stuff. You're going to learn the painful way that version changes will alter the serialization, unless you are very careful.
Peter Send private email
Tuesday, July 10, 2007
If you calculate the raw data, it is exactly 1,000,000 bytes.  The XML will only be less than 1MB if the numbers he calls longs average significantly fewer than 8 bytes in ASCII form and the tags are very, very short.  I'm not even sure you can get it under 1,000,000 bytes with one-character tags and every long being 0.  And if all the "longs" are really short, the binary form will come down significantly by using 32-bit integers or short ints or even bytes instead of 64-bit longs.
Cade Roux Send private email
Tuesday, July 10, 2007
This screams for flat files with an in-memory data structure.

If memory is at a premium, split the file in 4-10 ways based on whatever partitioning scheme you want (first 2-4 bits of the index, etc) and just load and seek through 10k to 4k records every time you want to find something.  That is possibly a very acceptable tradeoff if your accesses are infrequent.
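One way to sketch that partitioning - here a modulo variant of the bit-prefix idea, with an arbitrary file-naming scheme:

```csharp
using System;

class Partitioner
{
    // Map a key to one of N partition files, so a lookup only has to
    // load and search roughly 40000/N records. Sketch only.
    static string PartitionFile(long key, int partitions)
    {
        long p = ((key % partitions) + partitions) % partitions; // safe for negative keys
        return "lookup." + p + ".dat";
    }
}
```

The monthly update then just overwrites the N small files instead of one big one.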
Patrick McKenzie (Bingo Card Creator) Send private email
Friday, July 13, 2007

This topic is archived. No further replies will be accepted.

Other recent topics
Powered by FogBugz