The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database Design Trade-Offs & Object-Relational Mapping


I'm creating a web app that is used to to create and serve forms. The customers will build a form with their choice of different types of fields (radio buttons, aphanumeric text fields, number fields, and so on). Most of these forms will be things like surveys. The design should be flexible enough to accommodate new types of fields easily.

There will be at least 100 different forms in active use on the system at any one time, and the number of people entering form data will vary from less than 10 in a month to 5000+ in two hours. The database will be Microsoft SQL Server.

The customers will want to do basic analysis and querying on the form data, and will be able to download it in a format suitable for further analysis.


What is the optimal database design for handling the form data? I see two alternatives:

1) One table per form
Create a new table each time a form is created. Each row represents one user's form entry (i.e. one filled in form per row).

2) One table per field-type
Each type of field has a table, which is used to store data for all forms. Create a view or something eqivilent for each form to operate on the data.

I'm wondering which of the two would be easier to maintain in the long term and has the least performance trade-offs? I think it might be the first option, but I don't have enough experience with databases to properly assess the trade-offs.

Thoughts, anyone?
Rob Send private email
Tuesday, July 24, 2007
Drupal's CCK doesn't this in an interesting way: and They have a scheme for sharing common field types and building new ones that may work for form stuff as well.
son of parnas
Tuesday, July 24, 2007
Well I certainly wouldn't use the second option whatever I did. 

I might use the first idea but only after experimenting with more dynamic solutions that don't involve changing and creating sql tables on the fly.

For example you might considerI would suggest looking into making each form create xml data and using some sort of xml querying tool.
Bill Hamaker Send private email
Tuesday, July 24, 2007
Both options suck rocks. go for option 3:

A table with one row per form (T_FORM)

Another table with one row per field in a form (T_FIELD)

Yet another table with one row per field type (T_FIELD_TYPE)

When a form is created, one entry is made to T_FORM to represent this form, including the name of the form and who owns it. For each field in the form, one entry is made to T_FIELD which include the name of the field, a reference into the T_FIELD_TYPE table for the field type and any constraints on what the field can contain.

You may need other tables to represent workflow, users, filled-in form data, etc.
Jeffrey Dutky Send private email
Tuesday, July 24, 2007
The problem with option 3 is that you lose the the ability to store the field data as the appropriate type (numeric, date, etc) - you'd have to store all the data as a string/VARCHAR.

It may be worth the trade-off to do that, but so far I'm not convinced (although I am quite happy to admit the solutions I offered have some big drawbacks of their own).
Rob Send private email
Tuesday, July 24, 2007
> you'd have to store all the data as a string/VARCHAR.

That's quite common. It's better than a union or dealing with all the different types. One problem is depending on your database, varchars can waste a lot of space. It may not matter for you though.
son of parnas
Tuesday, July 24, 2007
"> you'd have to store all the data as a string/VARCHAR."

As son of parnas pointed out, this is actually quite common. Can you say XML? XML stores all data values as strings and then converts as necessary based on the schema of the document.

The Option 3 given above is your best bet.
dippin' dots
Tuesday, July 24, 2007
plain text or XML is your best bet.

You can add regular integrity checks by field type to ensure the data is not getting messed up through a bug or whatever.

Too bad SQL doesn't have a variant (or a C union struct type) type - it would be useful for this scenario (and probably abused regularly...).  I imagine you could use blob, but I'd prefer to just see it in text or XML.
Cade Roux Send private email
Tuesday, July 24, 2007
I think that you're looking at this the wrong way. First design your table structures, then design your forms around them. It's kind of a cart before the horse thing you've got there, an application's heart is its data model.
Steve Hirsch Send private email
Tuesday, July 24, 2007
I did this same kind of project.
I went really simple.
It was an application form - about 12 pages (12 forms)
Just went a table per form, that had all the columns for the form + a table for special cases. All the columns were nullable since they wanted to be able to partially save a form, so all the mandatory validation was done by the BLL

BLL was simple ASP.NET, with an aspx for each form using datasets and dataadapters to update the DB. Each aspx had its own js file (with a common one for common functions)

This was surprisingly maintainable and very scalable. Basicaslly you could just keep adding servers on the web side and scaling an enormous amount.

a very simple solution, but a solid one it was.
... Send private email
Tuesday, July 24, 2007
Oh and since this was round based, after the round closed we pulled the data into a proper normalised structure with proper constraints. Since this data was subsequently edited, it gave us a nice historical view of what the original application form said.
... Send private email
Tuesday, July 24, 2007
Oh, and never use XML languages. Run like the wind...
Steve Hirsch Send private email
Tuesday, July 24, 2007
Actually, I didn't address the tables needed to store the user-entered data. There is a straightforward solution to that the allows you to store all your data in correct native formats (but it's a bitch to select from):

One table to represent the user entered for as a whole (T_USER_FORM), one row per user entered form.

Some number of tables, one for each native data type, with one row per corresponding field in the corresponding form (T_USER_DATA_INT, T_USER_DATA_STR, T_USER_DATA_NUM, T_USER_DATA_DATE, T_USER_DATA_TIME, ...). Each row in each of the user data tables would reference back to both a T_USER_FORM entry and to a T_FIELD entry.

So, lets see an example of a full database with three data types (number, string and date), two types of forms and three forms that have been filled in by users:

    -- ----------------------

    ID NAME      REGEX
    -- --------- ----------
    1  NUMBER    [0-9]+(\.[0-9]+)?
    2  STRING    .+
    3  DATE      [12]?[0-9]/[123]?[0-9]/[0-9][0-9]

    -- ---- --- ---- -------------
    1  2    1  1    FIRST NAME
    2  2    2  1    LAST NAME
    3  3    3  1    HIRE DATE
    4  1    4  1    SALARY
    5  2    1  2    BANK NAME
    6  2    2  2    ACCOUNT NUMBER
    7  1    3  2    PERCENT PAID

    -- ---- ----
    1  1    123
    2  1    45
    3  2    123
    4  2    45
    5  2    123

    ID USER_FORM FIELD VALUE (a numeric value)
    -- --------- ----- ---------------------
    1  1        4    55000.00
    2  2        4    75000.00

    ID USER_FORM FIELD VALUE (a varchar)
    -- --------- ----- ---------------------
    1  1        1    JOHN
    2  1        2    DOE
    3  2        1    JANE
    4  2        2    ROE
    5  3        1    1ST BANK OF BOSTON
    6  3        2    12345-67
    7  4        1    STATE CREDIT UNION
    8  4        2    89012-3-4
    9  5        1    BANK AMERICA
    10 5        2    0000123-45678-90

    ID USER_FORM FIELD VALUE (a date value)
    -- --------- ----- ---------------------
    1  1        3    1/3/2005
    2  2        3    5/22/2002

Now, as I said, I don't think that this is a very good design, but it meets the "requirement" stated by the OP that entered data be in native format. I also wouldn't want to be the poor schlub who has to write the SQL to extract the entered form data (but, of course, I'm not that much or an SQL hotshot, maybe someone else, with more knowledge of SQL, can see any easy way to do the extracts without going completely insane). Things would be so much easier if we only had a single T_USER_DATA table where everything is encoded as a string.
Jeffrey Dutky Send private email
Tuesday, July 24, 2007
Store the form as Xml in a blob field (or xml type if your RDBMS supports that) and use XPath to lift out certain values from the form xml and put those in seperate columns on the same row.

If you need to explicitly support a hierarchy in the forms (repeating items) declare XPaths that point to child-sections and store each section seperatly but under the same form-id.

Dynamically generate a new form specific database table for each type of form designed (make sure you can load balance this over multiple databases). The form designer should allow a user to make a field searchable (generating the XPath in the background).

If your RDBMS supports a native xml data type, you can even consider to throw in extra search options on non-promoted fields and go in directly into the xml fragment (but performance will be sub-optimal ;-).

Something along these lines...
Marc Jacobi Send private email
Wednesday, July 25, 2007
Thanks for all the great ideas everyone. I'm now convinced it's not worth the additional complexity to try to store the data as its native type.

Incidentally, I saw a similar application where their approach was to create a new database for each form (this is for a system handling thousands of such forms)...!
Rob Send private email
Wednesday, July 25, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz