(Not logged on) | Register | Log On

You can subscribe to this discussion group using an RSS feed reader. The Joel on Software Discussion Group

A place to discuss Joel on Software

This community works best when people use their real names. Please register for a free account.

Other Groups:
Joel on Software
Business of Software
Design of Software (CLOSED)
.NET Questions (CLOSED)
TechInterview.org
CityDesk
FogBugz
Fog Creek Copilot


The Old Forum


Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

databases with many (>20) tables

In the Ruby on Rails Rant topic, bza1e1 wrote

"Hey, can somebody, why one could need hundreds of tables? I've heard there are some internal industry management app beasts that have tens of thousands of tables. What for?

I can't think of any application, where i would need more than 10 or 20 tables."

Well, here is a real-world example, as best as I can recall: I worked for the company that develops, operates and manages the EZ-Pass sofware for a couple of states on the east coast and our database easily had 100 tables (probably more like 200). While the database wasn't the best design I have ever seen, I seriously doubt we could have made do with fewer than about 50 tables.

Here is a list of the sorts of real-world objects and processes that the database had to keep track of:

* customer account
* customer address
* customer billing
* customer payments
* customer complaints
* customer notices
* customer vehicles
* customer suspension
* business customers
* business address
* tag status
* lane transactions
* lane/plaza config
* employees
* employee permissions
* employee passwords
* employee shifts
* money bags
* inter-agency payments
* inter-agnecy tag status
* tag returns and replacements
* agency report generation

Now, imagine that each entry on that list required an average of three tables in the database (many probably only needed one table, but a few needed quite a few tables, I think the average was around three), which means that we would have a database of 66 tables. Now, also imagine that I've forgotten a bunch of functions and that there are another bunch of functions that I never worked with so I didn't know about, and we can see that the system would require at least 100 tables.

Many real-world systems are attempting to tie together multiple parts of large complex processes (in the case of EZ-Pass: customer billing + vehicle tolling + toll plaza management + toll collectors + inter-agency stuff + agency reports) and each part requires at least 20 tables. More tables may be required just to coordinate between the parts.

It is difficult to construct a toy example (of the sort you might encounter in college) that can well reflect these real-world problems. It's probably not even desirable (students wouldn't be able to wrap their brains around the example in only 15 weeks, much less learn anything from the exercise) in most cases, but it IS how the real world works. There may be lots of little probjects that don't require more that 20 table, but there are a fair number of REALLY BIG projects that do require hundreds of tables (and hundreds, if not thousands, of programmers, praise be to whichever-god-you-like for lifetime employment opportunities).

I'm not able to offer any analytical reason for this sort of vast table requirement, except for the qualitative argument given above. If anyone else can shed more light on the subject, I'd like to hear it.
Jeff Dutky Send private email
Thursday, February 16, 2006
 
 
Thanks, I will save link to this post for someone else who says that he doesn't see a need for more than 20 tables. Someone making that statement obviously hasn't worked in IS department of big companies or governments.
JD Send private email
Thursday, February 16, 2006
 
 
How about this: The senior R&D manager at a large company I used to work at insisted we needed no database at all. He said we should store customer and transaction records in flat text files. This guy was a PHB on roids. A complete moron who got into management by kissing every ass in sight. Yet, he was the boss.
MBJ Send private email
Thursday, February 16, 2006
 
 
I have just set up a relatively simple Web site for a customer, and it has 21 tables.

A banking application I used to work on once easily had 50+ tables.
Berislav Lopac Send private email
Thursday, February 16, 2006
 
 
It is typically based on the number of different "types" of data in your system. I work on Point of Sale systems so we have tables for customers, items, transactions, employees, and a mirad of other "entities". Systems with less than 20 tables could have no more than 20 distinct entities in the system. That is a pretty minor system indeed (or at least a system that is targeted to a very specific function).

We have 108 tables in our Point of Sale system and that is actually pretty low compared to what I have seen in other systems. 200 tables is not uncommon for a full featured Point of Sale application.

Just to put it into perspective, Microsoft Small Business Accounting 2006 has 200 tables.
Turtle Rustler
Thursday, February 16, 2006
 
 
I also think that the number of tables used by applications is probably declining slightly with the widespread adoption of XML for configuration information. We used to keep all configuration within tables in the database. For example, one Point of Sale system I worked on had about 300 tables with over 100 being dedicated to configuration/personalization settings.
Turtle Rustler
Thursday, February 16, 2006
 
 
I've worked for a health care company that has so many tables that their ER diagram fills a 3" binder. I'd be surprised if they had fewer than 4 digits in their table count.
Anon to protect the guilty
Thursday, February 16, 2006
 
 
Most of the applications I maintain have 15-20 tables, but I have a custom CMS with 35.
clcr
Thursday, February 16, 2006
 
 
Also, I've seen (but not maintained) two databases with over 100 tables each.
clcr
Thursday, February 16, 2006
 
 
I work on an app with 50+ tables
Sassy Send private email
Thursday, February 16, 2006
 
 
i have to maintain an "app" with 1 table. but it has 473 fields.
not your everyday knob
Thursday, February 16, 2006
 
 
My web game, carnage blender, has 223 tables.  And that's a one-man show, not a huge project.

20 tables?  Man, that's for toys. :)
Jonathan Ellis
Thursday, February 16, 2006
 
 
I regularly work on a survey reporting system, hierarchically based, with support for adding action plans to the system, which easily has more than 20 tables. There's a lot of relationships to work with. I don't think it would be possible to express it in under twenty tables.
John Christensen Send private email
Thursday, February 16, 2006
 
 
Trivia --

A company I worked for implemented Oracle Financials.  The guy in charge of it told me once that it had 6,800 tables.

I went looking for some evidence of this before I posted here, and I found a couple references to Oracle ERP Applications with all modules have between 10,000 and 20,000 tables.

I did one small project on it, and I can tell you that it's relational like you wouldn't believe.  The DBAs wrote the SQL for me, and to get a simple list of GL accounts, they presented me with an **11 table join**.

I read "Softwar" a while back about Larry Ellison, and it said that one of the huge delays in getting Oracle Applications into the market was that Ellison demanded that every single conceptual object in Oracle Applications is capable of storing multiple values, so every relationship is many-to-many.

"Multiples" they called it.  Everything had to store "multiples."
Deane Send private email
Thursday, February 16, 2006
 
 
I worked for some years on a project that was too big. They had jammed all data for hundreds of apps in one DB so there were 1300+ tables.

Also, they refused to use the relational features of the relational DB! So, no foreign keys - all triggers...

The DBA once told me he could account for about 300 tables, but didn't know about the rest.

My feeling is, the system was poorly designed from the beginning and got worse over time (as happens usually). And nobody, ever, decided to slow down and clean up the mess.

So, tables were designed with multiple relationships - for example shop-provider-item ending up with 50.000.000+ rows!

To speed up the system, bigger servers were bought.

I sure am glad to be out of that one.
GUI Junkie Send private email
Friday, February 17, 2006
 
 
Depends what you consider an "app" really. "addressbook" probably doesn't 20 tables.

"online bank", "financial exchange", "international logistics provider"

Ahhhh.... those ones tend to be huge sprawling beasties.

The databases written by the "who needs more than 20 tables" people tend to suck horribly both for users and maintainers because instead of relations, they copy data (or make the users copy it, which is even worse).

And if you look carefully, you usually find a table which is storing name/value pairs for all the things they couldn't store without violating the 20 rule.

In the general case, the 20 tables rule is big red warning light with a siren attached. If it's a junior says it, give them a year to stop saying it and fire them otherwise. If it's someone senior, jobhunt. Quickly.
Katie Lucas
Friday, February 17, 2006
 
 
"Systems with less than 20 tables could have no more than 20 distinct entities in the system."

This is not really true -- you can have a flexible system utilising only four or five database tables to represent an unlimited amount of entities. Though it has other disadvantages, such as performance.

Scott Ambler got it all pretty much covered here: http://www.agiledata.org/essays/mappingObjects.html
Berislav Lopac Send private email
Friday, February 17, 2006
 
 
Large SAP systems, competing with the oracle apps products, have thousands of tables as well.
smartin Send private email
Friday, February 17, 2006
 
 
640K ought to be enough for anybody.
Bill Gates
Friday, February 17, 2006
 
 
People who make retarded statements like that (and who evangelize things like Ruby on Rails) are usually pretty ignorant.

Most of these guys know nothing about databases beyond the MySQL url and database connection string. If you try to talk to them about things like the relational model, constraints, etc they tell you about how 50-row tables are more efficient that normalizing data.
Duff Send private email
Friday, February 17, 2006
 
 
You are right Berislav Lopac. I hadn't thought about tables that are mostly key-value pairs and such. We actually have a couple of tables like that in our system as well.

Good catch.
Turtle Rustler
Friday, February 17, 2006
 
 
One of our customers ERP database has around 900 tables now but I'm not sure how much of it they are really using, probably around 300~400.
Rafael Rodrigues Rezende Send private email
Friday, February 17, 2006
 
 
Man, you guys are wimps (:=). Each clinical trial is its own source system, each with about 30 or so distinct entities (in either 1 or 3 tables, identically structured). We had about 250 trials in production. An untold number in legacy. Do the math.

Add to that, in production, we would add a source system a week. Finally, once in production, the metadata wasn't stable; it could change at anytime (they are experiments, after all)
Steve Hirsch Send private email
Friday, February 17, 2006
 
 
"The senior R&D manager at a large company I used to work at insisted we needed no database at all. He said we should store customer and transaction records in flat text files."

Well, basically, he is right, and even the database usually stores the data in files, so why not save the database, it is only useless overhead...

Put enough RAM into the machines, then the whole flat text files could be completely loaded on application start and completely saved when you are done. Of course, the users have to coordinate themselves to allow multiuser access, only one at a time.

Additionally, it is perfectly possible to reduce any database to one single table, by partitioning it so that any "table" only uses its own columns, and anything is manually maintained, something like this (as CSV):

1_key, 1_name1, 1_name2, 2_key, 2_location
1, your, name, NULL, NULL
2, another, name, NULL, NULL
3, additional, names, NULL, NULL
NULL, NULL, NULL, 1, here
NULL, NULL, NULL, 2, there

I really don't know why it isn't done this way. It makes anything so easy and all informations are in one place... ;)
Secure
Friday, February 17, 2006
 
 
Strange; Access programmers are normally considered the lowest of the low but at least we know about normalization, which appears to be a totally foreign term to all those who are talking about a maximum of twenty tables.

Basically every time you have a lookup field, which will show up on the data entry form as a drop down box, you are going to have a separate table that the lookup field refers to.

Most sample databases in training books have around half-a-dozen tables. If I do a database for a quick one-off project it is likely to have around the same number of tables. If I decided to do a unified database for a small language school it would probably have thirty-odd tables.

Just think of a simple CD catalog for private use. In order to avoid mistakes in data entry you would need a separate table for artists, music type, publisher as well as the main table for the individual CD. If you are selling the CDs then you will need another half-dozen tables for suppliers and customers and finance. If you employ a couple of people you are already up to the twenty mark.
Stephen Jones Send private email
Friday, February 17, 2006
 
 
MBJ wrote:
"How about this: The senior R&D manager at a large company I used to work at insisted we needed no database at all. He said we should store customer and transaction records in flat text files."

Actaully, I used to be a proponent of this philosophy, until I tried to write a library to support the proposition. I think a lot of people become disillusioned with existing DBMSs and adopt the polar oposite position in response: 'since all the instances of DMBSs I have seen are crap, and can be outperformed for some tasks by simple flat files, all DBMSs must be worthless and we aught to just use flat files instead.' Others might be responding to a certain amount of DB-fanboyism prevalent in many IT shops: I've run into people whose only response to any problem was to throw a DB at it, even when all the problem demanded was a flat file, or even just an in-memory array!

There are some advantages to flat files, especially if the task you are trying to do results in repeated table scans (then you might as well cut out the middle man, dump the data to a flat file and march through it in straight C with a tiny bit of buffering/read-ahead), but if you need to coordinate multiple types of access (mixed query and insert/update/delete) and simultaneous access, you can't do too much better than one of the big DBMSs (Oracle, Sybase, Postgres, SQL Server, etc.) and you might as well save that effort for something else.

At the same job that I mentioned in my original post, we started out with a series of flat files representing lane transactions (cars driving through toll lanes on highways, bridges and tunnels) which were progressively processed until a final stage when the last file in the pipeline got inserted into the database. This was a pretty good design for the system: if any stage in the pipeline had a problem you could simply examine the output file and log file for that stage, isolate and fix the problem, and rerun the stage on the failed input file.

Unfortunately, we had a few DB-fanboys high in the ranks of management who were always pushing to eliminate the flat files and move everything into the database. Eventually they got their way and I had to rewrite the 'ingest' process to spit lane transactions directly into the database, where all further processing would be done. The problem  with this design is that, when some processing stage fails you have to go mucking through the database to isolate the input and output values. Worse yet, in most cases, the input values have been overwritten by the faulty process and all you have now are the logs (which rarely have all the data you need) and the faulty outputs. Your only means of recovery is to roll the entire system back to the state before the now-corrupt data entered from the lanes and rerun EVERYTHING up to the faulty process.

Immediately after I made that change I transferred out of the host development team into the lane software team so I would never have to deal with that kind of failure.
Jeff Dutky Send private email
Friday, February 17, 2006
 
 
My former employer was writing a criminal case management system.  We had the criminal code, calendaring info (judges, law enforcement, inmates), sentencing info (actual jail, time served, probation etc), financial info (fines, restitution, external agency payments, the Court GL, etc), hearing records, interfaces to law enforcement, plus the linking info for all of that.  400+ tables.

My current employer's primary package does revenue sharing (royalties).  We allow the users to define their own identifiers (up to 20; hierarchies or groups available), need to capture sales & adjustments info, GL output coordination, there's a batch job scheduler/processor, hardcopy statement output and the royalties calculations themselves. 700+ tables.

20 tables wouldn't cover one subdomain in either of those apps.
a former big-fiver Send private email
Friday, February 17, 2006
 
 
"I can't think of any application, where i would need more than 10 or 20 tables"

This is as ignorant a statement as ...

"I can't think of any application, where i would need more than one or two thousand lines of code"

It must come from the pen of a person who has never written serious software.
David Aldridge Send private email
Friday, February 17, 2006
 
 
"My former employer was writing a criminal case management system."

It was that bad?

:o)
Berislav Lopac Send private email
Friday, February 17, 2006
 
 
Two of the systems I've worked with in the last 3 years have had more than 800 tables each (one was 950 and one was 820).  Some systems aren't simple, and that's fine.

Oh, approximately 50% of those tables were 'lookup' tables.
Lou Send private email
Friday, February 17, 2006
 
 
A vanilla Deltek CostPoint system: 2727 tables, 294 views. A PeopleSoft Financials system with by no means all modules, 1500+ tables, 1200+ views.
George Jansen Send private email
Friday, February 17, 2006
 
 
>>"My former employer was writing a criminal case management system."
>>
>>It was that bad?

almost ;-) 

OT::it's an interesting situation when one of the primary users really can put you in jail when the system doesn't work the way they want it to...
a former big-fiver Send private email
Friday, February 17, 2006
 
 
I would be more interested in hearing about the justification for thousands of tables.
  Some of the bad things you see on www.dailywtf.com are where things are so wrong that they create a new table for every customer, or a new table every day, that sort of thing.

Taking a look at the EZ-Pass as an example.
  "...each entry on that list required an average of three tables...". Three for each ?
  Wouldn't a business also be a customer. Maybe a business owns multiple cards, but couldn't any customer own multiple cards; one for the car one for the truck. Wouldn't the relationship be that each card points to a foreign key in the customers table as the user relation.

Then another person says
"They had jammed all data for hundreds of apps in one DB so there were 1300+ tables."
  Which makes me wonder if this isn't part of these big table counts. Should they really be multiple smaller databases.
always learning database management
Friday, February 17, 2006
 
 
>>I would be more interested in hearing about the justification for thousands of tables.

Configurability, for one.  About 25% (~150 or so) of the tables in our product are metadata that define the behavior of different parts of the app and allow our client companies to customise what they see and report.  It's even more pervasive in systems like SAP, with literally hundreds of functional modules, which try to be configurable for absolutely any business type. 

Correctly-defined relationships for another.  In our system Persons are a type of Entity, since we can have Person-payees and Company-payees.  Whether something is a Person or a Company is an attribute of Entity and that attribute helps define allowable other relationships and application behavior.  Sure, we could expand the Entity table to allow for the superset of Person and Company data, but it makes more sense (to us, anyway) to separate them and join the data correctly.
a former big-fiver Send private email
Friday, February 17, 2006
 
 
==> Should they really be multiple smaller databases.

I'm a fan of TheOneTrueAndGiantDatabase. There is nothing worse than trying to pull data from 32 different databases, some of them different versions, and some even different vendors. A single, centralized repository that is the OneTrueSourceOfAllData is a much, much better way to go for the types of organizations we deal with on a regular basis. So ... if you've got 20 or 30 apps that hit the database, and each having several dozen tables, getting up into the hundreds or even thousands of tables is not that difficult to do.

Think of it this way: Why should I have to enter my Customer table's data into my accounting app, my project management app, my newsletter mailing app, my contact management app, my christmas-card-mailing app, my complaints app, my order status app, my inventory management app, my call tracking app, my legal/contracts/compliance app, my fulfillment app, my billing app, my ... -- you get the picture. This world sucks, and it sucks BigHairyDonkeyBalls. The OneTrueSourceOfAllData simply works out far, far better in my world. So ... you end up with data for a large set of (barely) related applications in one single database. It works, and it works well for us.

For me, a large database is on the order of 300 to 400 tables, but I can see it easily creaping up into the thousands of tables.
Sgt.Sausage
Friday, February 17, 2006
 
 
I honestly can't think of an application I've worked on professionally which had LESS than 20 tables.
Rhys
Friday, February 17, 2006
 
 
SQL> select count(*) from user_tables;

  COUNT(*)
----------
      1148

SQL> connect ***/***
Connected.
SQL> /

  COUNT(*)
----------
      357

And that's just two modules of many in a relatively small ERP/MRP system.  I've seen much larger.  Why are there so many?  It does lots of stuff!  It needs to be able to track the breakdown of the cost of each lot manufactured, when they expire, when all of their materials expire, when they've been tested, how they did, what was bought, what was sold, how it was discounted, where it is, some way to verify that with a physical count, archive it, track the lots of the suppliers, and on and on anon.
why remove my post? Send private email
Friday, February 17, 2006
 
 
"1_key, 1_name1, 1_name2, 2_key, 2_location"

Is that a "sparse database"? ;)
Philip Prohm Send private email
Friday, February 17, 2006
 
 
Define 20 tables each with 1000 columns.

Use some of the tables for metadata about what is in the other tables, then define the other tables as field1, field2, field3...field1000.

Voila!  Everything holds in 20 tables no matter how varied the data is.
T. Norman
Saturday, February 18, 2006
 
 
"I'm a fan of TheOneTrueAndGiantDatabase. There is nothing worse than trying to pull data from 32 different databases, some of them different versions, and some even different vendors. A single, centralized repository that is the OneTrueSourceOfAllData is a much, much better way to go for the types of organizations we deal with on a regular basis."

+1. It is valid for any form of data storage.

Which one is better?

The Windows approach with the drive letters, or the Linux approach with anything transparently mounted under a single root?

A p2p-based working, with any user storing his files on his machine, even holding duplicates in different versions from other users, or a central file server as the reference data storage?

A lot of different data access protocols with its own names encoding formats, with paths separated by different characters path/path/file, path:path:file, or a unified URI-scheme that includes the protocol file: http: ftp: to address a specific file?
Secure
Saturday, February 18, 2006
 
 
Teh problem with the giant database is its complexity. Complexity often increases with the square of the zize.

For a small to medium sized business it is often worth asking if it is worth consolidating all the different data sources they have. The end result can often be expensive and frighteningly cumbersome. When a company finds its policy is dicated by how easy it will be to fit in with its IT system tnen it is having problems.

After all, is there really a reason for the Personnel and the Customer data to reside on the same database?
Stephen Jones Send private email
Sunday, February 19, 2006
 
 
Seems to be alot of different definitions of database here.

First of all, even though it's not in a standard relational db like Oracle, SQL, DB2, whatever, doesn't mean it's a) not a database and b) not relational. SAS datasets are a perfect example of relational databases stored as files on file shares.

Second, this discussion is definitely colored by the tools people use. For example, on SQL, a "database" is basically equivalent to an Oracle schema. In the clinical trial world, an Oracle Schema is a database. Used to drive me crazy, as normally in Oracle you describe an instance as a database.

Arbitrary rules such as more than 20 tables is silly. The range of human activity cannot be described so simply.
Steve Hirsch Send private email
Monday, February 20, 2006
 
 
+1 for Sgt. Sausage and the guy who talked about why so many tables using the SAP example.

I write Accounting/ERP software and I have multiple schemas:

Documents Schema (all my POs, Sales orders, quotes, bids, etc...)

App Schema -- This is business logic and form layouts.  Each screen poped up on the screen can be changed/customized per install base.

Transaction Schema -- We nearly duplicate the documents schema to do things like revision control on a PO.  When a PO is changed, we rev it and store the revs here.  These tables are typically purged over time.

EDI Schema -- A schema that deals with EDI features.

*** Schema -- Industry specific modules we build usually get their own schema.

Like Mr. Sausage stated: One big DB is fine.  If you think you need multiple dbs, then you should consider table spaces which allow you to physically separate where data is stored.

Explaining why one needs many tables is neat to talk about but otherwise I think this thread is pointless...
Eric (another ISV guy with his company)
Thursday, February 23, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz