The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Multi-tenant database design

We are in the process of designing an on-demand business application. We use MS SQL Server for the database.

Here are some of the constraints we've been given:
* Single database for multiple customers
* Customers can extend the data model by adding fields and tables (application is meta-data driven)

What would be the best way to solve this?

1. Have a TenantID field in all tables

The database remains static, i.e. adding a tenant does not change/add tables.

Extending the data model would not really change the physical model, but make use of extension tables. These extension tables contain the extra fields.

Advantages:
* Static data model
* App can use DB account with limited rights, always a good thing for security

Disadvantages:
* Complicated mapping between logical model the customer sees and the physical model, resulting in queries requiring joins to extension tables...
* Difficult to optimize the database, e.g. adding proper indexes on customer extensions

2. Have each table exist per tenant

Every table would exist once per tenant, e.g. the Account table would exist as Tenant1ID_Account and Tenant2ID_Account.

Customizations would really alter the data model.

Advantages:
* Simplifies queries and reduces the complexity of mapping
* Easier to optimize the database by adding indexes on some of the extensions

Disadvantages:
* Database account used by the application requires dbo rights, which may be dangerous if somebody finds a security hole letting them do a SQL injection attack
* Many tables. The application we are designing is a large suite of modules. This can easily add up to 500-1000 tables per tenant. A moderate success would result easily in a hundred tenants. I don't think managing a database with 100,000 tables is realistic

-----
Unfortunately the option of using a database per tenant is not allowed. (Don't ask me why, I did not make that decision, and I've already had many arguments about it)

Does anyone reading this have experience with designing such an application? All hints and tips are welcome, as are links to articles about multi-tenancy. I've tried googling with little success.
Mr.OnDemand
Saturday, March 11, 2006
 
 
Does MS SQL Server have a similar concept to Oracle schemas? If so, that would seem ideal.
LC
Saturday, March 11, 2006
 
 
Sounds like your caught in a difficult spot, but you've laid out some interesting points. I would add the following:

"1. Have a TenantID field in all tables"
Any system enhancements along the way will benefit all customers without running against all the separate and possibly different schemas in 2. But the requirements are a recipe for disaster because they want you to create a complex modifiable system from scratch (it goes against Gall's law).

"2. Have each table exist per tenant"
Allows better service to the specific customer without affecting others. This is actually basically the same as your separate database idea and it will be easier to break off a specific high load customer into their own database at some point (despite what your bosses say now). Maintaining it could be too much, like managing a whole bunch of separate applications that branched from each other.

Ultimately I would go with number 1, but only if I could get concessions on the extensibility. Really you should initially just be concerned with building a good and simple application that solves a problem, and not with "building the machine that will build the machine".
Ben Bryant
Saturday, March 11, 2006
 
 
I have been involved in five different projects in which each user, or survey, or client, or whatever got their own set of tables. In each case this has been disasterous. In each case, the project was later re-engineered to use a generic, static schema.

An RDBMS like MySQL/Oracle/MSSQL is not meant to be constantly spawning new objects. The management tools, best practices, and DBAs have all, for the most part, expect fixed schemas. That's how they know how to do their job best.

The generic schema that supports unlimited customers and provides extensions and customizations for each customer may be a little more work but it is the only way to do it. I think the reason you haven't found much on Google is because your questions was settled many years ago with the introduction of the relational model.

If you want to have separate objects for each customer, then look at a hierarchiel database database such as Cache or IMS, or possibly try a hybrid SQL-Object DB such as Matisse.

http://www.intersystems.com/
http://www.ibm.com/software/data/ims/
http://www.matisse.com/

If you want to stick with the relational model then I would strongly encourage the first approach that you outlined.
PWills Send private email
Saturday, March 11, 2006
 
 
If by some reasons you are not allowed to use different "schema" for different users (basically, schema is a kind of namespace), then the best design is to create an additional field for each table which store "owner id" for each tuple in the database.
Then you have to define a set of UPDATABLE views, each view for each table, mapping is 1-1 such that view select only rows which belongs to the current user. View must select all columns besided owner id
Since views are 1 to 1, then they can defined as updatable and an user can do insers, updates, deletes as they would  "tables", so user does not see any difference between view and table.
1 Such solution make a virtual database for each user
2 it is very easy to implement, simple perl script may generate necessary DDL to update your schema (as I did in my projects)
3 it does not create any performance problems, since views  are precompilied by database engine
Andrei Lopatenko Send private email
Saturday, March 11, 2006
 
 
We went with the database-per-tennant route at my last job for a number of reasons:
1. Keeps data separate for security reasons
2. When you acquire a new tennant, you use your DB creation script and just change the name of the DB files.
3. When a tennant departs, you just dump their DB and delete the files... done!
4. Allows you to load-balance across separate DB servers when the business grows.
5. A DB Server failure doesn't affect all your tennants -- just the ones on that server.
6. Reduced query complexity -- otherwise every single SELECT / UPDATE / DELETE sql statement needs a "AND TenanntId = @TennantId" where-criteria.

We handled customization by designating certain table columns as "core" -- not subject to customization, and they were in their own table.  Tennant custom fields would be in their own table, the structure of which would be described by another core table.
example Send private email
Sunday, March 12, 2006
 
 
>or whatever got their own set of tables. In each case this has been disasterous.

I have to agree. The only time I seen user fields work is in contact managers, and in those cases, about 10, or 15 fields are reserved for custom data. NEVER is a new field added, and never is a database schema changed. This approach just not work. How do reports, data entry screens, and even code going to deal with new fields?

To be fair, the advent of HTML, and more “free” data structures such as XML does open up some possible ideas here, but, I never seen systems that allow data schema changes work.

For the most part, the temptation is to allow a customer to add new fields in the place of a poor data model. 

I have a drop in invoicing system that I use for most of my applications (you folks do have re-useable invoicing system that you use for each new application..right?). Anyway, often there is special things, like a environment tax, a gratuity, local taxes, special fixed costs (fixed hotel charges for example) etc. etc. etc.  that need to be added to this invoice for a particular locale. Most invoice designs are hard coded, and you see that extra “tax” box at the bottom. With my design, I can add any fixed cost, any tax cost, and even things like early bird discounts. I done this with a good data model. I NEVER have to add a new field to my data model when building a invoice. This gives me an ability to deliver software into different markets, and also allows the customer to add special features to the invoice for their particular industry.

The problem with systems that allow fields to be added is that the argument goes that the problem can’t be defined ahead of time. Well, I hate to bust ones bubble, but if you can’t define the problem, then you can’t write the software either….

I suppose for systems where  you have trained people, and they are going to use sql, or build their own reports, then allowing schema changes could work. However, even complex accounting systems that have to job cost different types of products based on labor, or materials cost, or weight cost seem to have a data model that solves this problem.  These systems allow  you to define a new type of cost, based on time, length, weight etc…whatever you come up with. Once again, no schema modifying is needed to add a new type of cost to the system.

Setting  aside a number of fixed fields for the customer is one thing, but schema changes? Hum, nothing wins over a good data model.

I am open to applications that allow schema changes, but I am skeptical that this works well in practice.

Albert D. Kallal
Edmonton, Alberta Canada
Kallal@ msn.com
http://www.members.shaw.ca/AlbertKallal
Albert D. Kallal Send private email
Sunday, March 12, 2006
 
 
We are facing the same problem in a hosted application. In our situation, we need to support thousands of customers. Does anyone have experience with using separate databases for each customer in that type of volume?

I like the idea of using updatable views, but the point about it doesn't impact performance seems wrong. If the view is pre-compiled, it only save the compilation time, you still need to query against a table with all customer information in it instead of one, would that be a big performance hit?
blablabla
Monday, March 13, 2006
 
 
I never had experience with thousand customers.
Only with scores of them, up to 100. My experience says that a solution based on updatable views works perfectly up to 100 customers.
I am quite sure that updateble view solution will be the same efficient for 1000 customers as a plain access to the database
Since views are precompilied then all sql operations are very cheap. Since selection operation is hidden inside of the view, then each developer or sql user which develops an application or just selects data does not need to do any additional work to select relevant data, h/she just use view name iinstead of table name
An updateble view solution does not create any administration overhead in comparison to separate "schema per each user" solution.` If you have 1) audit and  security, 2) backup and restore policies then you have to manager them for each schema....
If you use a "schema per each user" solution, then additional disk space and main memory (*1000!) will be required since each schema and each database object  need initialization
Andrei Lopatenko Send private email
Monday, March 13, 2006
 
 
Mr. OnDemand,

After revisiting your original requirements, I still think you're making too big a deal out of this. If your customers need to add custom fields, a little denormalization will go a long way.

CREATE TABLE [customer_attributes] (
 [id] INT NOT NULL,
 [customer_id] INT NOT NULL,
 [name] NVARCHAR(100) NOT NULL,
 [data_type_id] INT NOT NULL
)

CREATE TABLE [customer_attribute_values] (
 [id] INT NOT NULL,
 [customer_attribute_id] INT NOT NULL,
 [value_int] INT NULL,
 [value_nvarchar] NVARCHAR(1000) NULL,
 [value_text] TEXT NULL,
 [value_float] FLOAT NULL
)

This is the most simplistic example of extensibility. Each customer can add new records to the [custom_attributes] table. Each record in this table indicates the data_type of the record. Then, in the value table, your application uses the [data_type_id] to determine which column holds the data. (Alternatively, you could serialize all data as strings, which would make the application logic simpler but might have a performance hit.)

You can extend this sort of schema to do almost anything you want. And at no point will you need to spawn new DB objects (sprocs, views, tables) of any kind.

The key is to decouple your actual database schema from what I'll call "the user's perception of your schema". In the above example, you could make the two tables completely transparent to the user. As far as the user is concerned, they are just adding columns to the [customer] table.

I would be happy over e-mail to help with some real-world examples of these patterns.
PWills Send private email
Monday, March 13, 2006
 
 
I like the way SAP Business One handles the whole custom field thing. They have a utility that allows you to add fields to almost any of the tables associated with main UI objects. There is a special naming scheme to keep track of who owns what and there is also a set of tables that keeps track of what has been added where. Either or both can be used to handle 'official' schema upgrades without affecting the User-defined fields. And yes, the fields are physically added to the tables, not to 'extension' tables.

You can open any relevant UI object and show/hide user-defs. They display in a separate window for which there is some programmatic means of keeping it 'attached' to the main window.

There is also a facility for creating user-defined tables.

Obviously, the user-defs are not subject to any standard processing, but they've also implemented a UI-based 'trigger' mechanism that can automate the transfer of user-defs as the data moves from (for example) Purchase Order to Invoice.

There are limits placed on the data types and field lengths, indexes and relations, etc. so that you don't need to reinvent a DBMS. All in all, I'd have to say that it's pretty slick. It also strikes me as the right way to deal with user-def: go ahead, add fields and we'll display them, but don't expect us to process them.
Ron Porter Send private email
Monday, March 13, 2006
 
 
>> In our situation, we need to support thousands of customers. Does anyone have experience with using separate databases for each customer in that type of volume? <<

We had 250 (large) customers or so, so nope.

I think in something like your app which appears to be much more mass-market than ours was, schema consistency is going to be critical.  And so you should ignore my earlier suggestion about splitting tables into core and custom.  Instead, just designate 20 fields or so for custom data (or as many as you can stand) as Albert suggested above.  In the short-term, you'll have some pain from the customers who want more, but in the long-run, across the entire customer-base, you'll be better off.

The reason is migrations.  When a new version comes out, you want to be able to run an upgrade program or script to move all your customers to the new schema at the same time (or maybe staggered).  If the schema is too out of control, that becomes very difficult.
example Send private email
Monday, March 13, 2006
 
 
Lesson learned: most of you seem to be saying it is best to keep the schema static. The extension method shown by PWills is what I called extension tables in my first scenario.

Thank you very much for all the feedback.
Mr.OnDemand
Monday, March 13, 2006
 
 
Examine Windows Sharepoint Services db schema. It has to handle custom lists and what not. Ive used a similar design successfully myself
... Send private email
Monday, March 13, 2006
 
 
In particular, you want to view the UserData table. It has extension fields as Albert described. It has, if I recall correctly, 20 fields for each of the fundamental MS SQL types (ntext, nvarchar, int, etc). It's a very 'wide' table but it gets the job done.
PWills Send private email
Monday, March 13, 2006
 
 
Yep, it's reasonably quick also and you can always put views on it for regularly looked at data sections
... Send private email
Tuesday, March 14, 2006
 
 
The clinical trials area of the pharma industry has exactly this same problem. Each trial is its own application, there can easily be 250 trials, each trial has about 30 or so distinct heterogeneous (but similar) structures. It works because it has to work, because that is what the business is about.

To top it off, the trials can change their structures midway.

If the application requires flexible metadata, then don't force the business to conform to static metadata for your convenience.

Here are some rules of thumb:

1) Very very very tight metadata repository.

2) Strongly encourage users not to think outside the box. Look inside the box first.

3) Your development tool _must_ have dynamic capabilities.

4) Try to use Entity-Attribute-Value structured tables. This way, you can make DDL changes without DDL statements. Oracle Clinical uses this to great effect.
Steve Hirsch Send private email
Thursday, March 16, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz