The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database schema design

I'm developing a database that will archive the article headers from various USENET newsgroups. USENET news servers have a tree hierarchy where each server holds 1000's of newsgroups. Each newsgroup in turn holds 100's to 100,000's of article headers. I am wondering if it makes sense to have 1 headers table to hold all the headers from the various newsgroups, or to have a headers table per newsgroup?

I am just starting out in database development, so I don't have a good intuition as to which way will perform better. Is it better to throw all your data into one huge table, or have numerous smaller tables? I expect the types of operations performed on the headers table to be split 50-50 between insert/delete and reads since new articles appear on newsgroups all the time and servers provide limited retention of content.

I'm using Advantage for my database, configured as a local server, if that makes a difference. Any pointers on database scalability would also be great. Thanks.
Geronimous Send private email
Thursday, February 09, 2006
I think you better buy a copy of "Practical Issues in Database Management" by Fabian Pascal.

Archangel of Data
Thursday, February 09, 2006
The general rule of thumb is that one table with a key is far better than one table per type. If there were large differences in the format from table to table there may be a case but if it's the same table design it ain't worth it.

Databases are designed explicitly to work with large datasets thats their whole reason for being. Will you want to do any queries across different sets of data if so the multiple table approach will make this a nightmare.
Andrew Gilfrin Send private email
Thursday, February 09, 2006
Since row-level locking is the norm, there is generally no reason to have multiple tables with identical schema. (I can think of some cases where there are, but they don't apply here).
NetFreak Send private email
Thursday, February 09, 2006
The numbers you are talking about are not large so you can get away with one table.

As the amount of data increases partitioning, like you mentioned, is one of the primary scale out strategies.

MySql has a beta partitioning feature now, though it only works across disks and not nodes.
son of parnas
Friday, February 10, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz