The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

One Database or Multiple Databases...

At my company we have one major application that everyone uses.  Our entire business process is centered around this application.  We have many apps that integrate with this application.  Some of these apps have simple storage requirements (ex, a single table in the db), others more complex (ex, multiple tables, views, sprocs, full text catalogs etc).

In the past there has not been much thought given to separating out the more complex apps into their own databases.  We have some applications with 10+ tables that are cluttering up our main database.  I want to change that now.  I want to set some rules down that govern when we should use the main application's database for a new application's storage and when we should create a new database (Keep in mind that all new applications considered will be interfacing with the main application.  Any non-related applications get their out database be default.). 

Here are some of the rules I'm considering...

For 1-3 tables*, storage continues to use our main application's database.

When storage requirements exceed 3 tables*, a new database is probably warranted.

When extraneous storage requirements (full text searching, special backup requirements, sprocs, views, udfs) come into play a new database is probably warranted.

*  I picked 3 tables as the cutoff because this is usually the limit of a simple master-details type of application.

Does anyone here have any guidance?  These are all in-house applications.
Wednesday, December 20, 2006
Do you mean multiple databases as in a single MySQL (or whatever) installation running on a single server but divided into many databases; or as in multiple MySQL (or whatever) installations, potentially running on multiple servers?

Our app is divided into two DB's, but our DB vendor considers a separate DB to be a whole separate set of OS files, processes, etc., so it makes a BIG difference to how everything it setup. We do it because one of the DB's contains data unique to each customer and one contains data shared amongst all of our customers. But I think the typical MSSQL, MySQL etc. setup is different.
Greg Send private email
Wednesday, December 20, 2006
I mean multiple databases in a single server instance (MSSQL in my case).
Wednesday, December 20, 2006
I believe you should make your decisions along functional lines rather than technical issues such as the number of tables or the size of those tables. By functional, I mean you should think about what they do as opposed to the department that owns the data.

For example, privacy related information should be in one database, financial records in a second, and inventory in a third. Your main application acts as a gateway or as a portal to each of these databases and to the user, it may appear as one database. However, behind the scenes, I'd be very surprised if management didn't prioritize child applications, that is, the financial stuff had certain safeguards.

So as far as future "tables" go, the criteria I'd use is, am I introducing new functionality, and does this functionality have performance, archival and/or security requirements that prevent me from merging it with similar, already existing functionality?
Wednesday, December 20, 2006
IANADBA, but I'd get rid of any redundant data (including data derived from other data already in the database) and consolidate any tables where it makes sense (such as moving the full text catalog columns into the main product table) and see where that takes you.
Wednesday, December 20, 2006
Under ideal circumstances, I'd agree with dev1.

However, keep in mind that restructuring legacy tables usually comes with a significant up-front cost. Since you have so many intertwined applications, you're probably better off introducing new functionality and obseleting tables as you go along, as opposed to "rewriting" the entire system all at once.
Wednesday, December 20, 2006
You don't give any reason to consider this other than perceived complexity. Everything you describe about the requirements suggests you should keep it all in the same database. Introducing the need to access multiple databases seems like the wrong move to solve perceived complexity. Basing the decision on number of tables sounds completely arbitrary. If you have new data schemas that have a completely separate purpose then by all means create a separate database.
Wednesday, December 20, 2006
I can tell you from painful experience that it is -usually- best to have each application use it's own separate database. The pain cycle often starts with "just one" new app's tables in the schema, then it's two, then ten and pretty soon you have something that's unmanageable.

Is more than one developer writing the applications that store data in the main db? Even if you have development and database standards in place, often they are not followed or enforced. You'll end up with many different table/view/sp naming conventions, different levels of normalization, missing referential integrity, OLTP data locking, etc.

Having each application manage it's own storage requirements allows best (or optimized) practices to be used in whatever environment the app is designed in. Maybe an intranet interface into the data needs only temporary storage and can use session state on the web server. A reporting app might need to aggregate a lot of data and could retrieve the base information and then churn through it in another, potentially more efficient relational engine. A web service might not need any storage at all and might only act as a data gateway for other apps that do use their own storage.

By making the secondary applications as autonomous as possible (within reason) you open up opportunities to maintain, improve and even decommision in a much more maintainable manner.
MikeS Send private email
Wednesday, December 20, 2006

Assuming you are using a real database, take a look at whatever their equivalent of PostgreSQL schema is.  They are basically namespaces your database, letting you partition your data.  At least in Postgres land, you can hide schemas or  change the order of "layering" based on the user.  Take a look, this is exactly what you want!

If you are not using a real database yet, well, here is your motivation.
Cory R. King
Wednesday, December 20, 2006
he already said he was using MSSQL. good advice though, for someone who didn't read the thread.
no, seriously
Wednesday, December 20, 2006
I actually do think my comment was VERY useful.  I don't know enough about Microsoft's SQL server, but if it has a similar concept, than he is golden.

/ot: MSSQL is a pretty poor product name, ya know.  Doesn't stand out much in a chunk of text.
Cory R. King
Thursday, December 21, 2006
I agree partitioning is useful.

I'm a bit concerned that it may be overkill for this specific suite of applications and I'm basing that judgement strictly on the fact that the original poster believes three tables is a good cutoff point.

If there are, hypothetically speaking, 90 tables in the system and 30 make up the main application, you're essentially asking someone to come in and create 20 or so three-table partitions.

I stand behind Cory's suggestion if each "mini application" had in the ballpark of 100 tables or more. At that point, there are potentially some performance benefits to be gained from consolidating databases and using partitions to "fake" the separations.
Thursday, December 21, 2006
The rules described for database partitioning seem a bit peculiar. Below are some of the rules that I follow:

1) each applications generally gets their own databases - btw, I hardly come across apps with fewer than 50 tables these days...  it is *okay* to have as many tables as you need

2) some apps need multiple databases; when this happens it is usually partitioned along the line of data usage... for example, transaction data gets its own db, and data warehouse gets its own db; and once a while, you need a staging database to move data in/out

3) if you have serious data needs, then you can look at either horizontal or vertical partitioning for performance needs.  But if you get here, you will know.
Yin-So Chen Send private email
Sunday, December 24, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz