The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Data Modeling Suggestions: Survey Tool

I'm thinking about the data model for a survey tool and was hoping to get suggestions from others who may have done this before.  The idea is to let an author assemble a set of questions, present it to users, and collect responses.  One area where I'm unsure so far is related to handling multiple question types.  A question could be a boolean, enumerated value, text field, or memo type.  Right now I'm unclear as to  how to best store user response to each question assuming the data type of the question can vary.

Here's the initial schema I've considered (excuse my poor notation):

survey ->* question (survey_id,prompt,data_type..)
survey_reponse (survey_id, response_id, question_id, value)

survey_response.value might could be a text field, which I could coerce to the appropriate type at load/save time within my object model.  The only caveat to this would be storing memo fields, so rather than continually load this information, perhaps a memo_id could be added to survey_response to avoid loading unless necessary.

The other option (maybe?) would be to add a wider schema for tracking responses according to the various supported data types.  Something like answer_(data|boolean|text|numeric) etc.  This seems wrong in the sense that the row is only ever populated with a single value. 

I'll stop there, but I would welcome suggestions or experiences from before.

Tuesday, October 23, 2007
You might require a branching structure. If you have questions that apply to one group of respondents (male/female), or if you have a series that follows a boolean option:

1. Are you a programmer?

  1a. Do you consider yourself a 'rockstar programmer'?
  1b. Can you make a website for my brother-in-law?
  1c. I am having a problem with my PC, would you mind coming by to take a look?
Jason T
Tuesday, October 23, 2007
I forgot to add that one downside to storing an answer in a "generic" field and coercing to the appropriate type is the storage implications.  Storing a bit for yes/no response in a text field seems pretty wasteful.  I suppose this might be optimizing early, however; the response/answer table is going to grow much faster than the survey/questions tables.
Tuesday, October 23, 2007
This is an one question where I know a lot about the subject matter as I work for a company that does market research software.

Basically, you need to consider a bunch of market research issues that you software may need to consider - for instance multiple-response questions ("which of the following websites have you visited in the last week"), loops ("for each website you visited in question X, please state...") and multiple levels  of counting "for each respondent, for each credit card they own, for each transaction on the credit card state...".

Add to that a bunch of other issues like null responses always being valid (and different from zero qualities), qualifying questions etc, and you see that MR software typically uses quite different storage types. Remember you can get very sparse data as well - e.g. state which car you own (from a list of 10,000 plus cars) can be stored as 2 bytes for single response axes, but upto 10,000 bits per respondent if you have the question as multiple-response capable.

Check out the specification for the semi-open Triple-S XML format as  one off-the-shelf format. SPSS (the market leaders in this  field) also have lots of documentation as their DatamModel format tries to store everything .. though I have found it doesn't quite handle everything.
Grant Black Send private email
Monday, November 12, 2007
oh, and don't forget about performance.

It kills a lot of newbies to the industry who all start throwing things into an RDB table with respondents as rows and questions as columns. Seems OK & works for small cases.

You get 1000 respondents per survey per country and after a while you end up trying to build a 7 dimensional cube with 250,000+ respondents.. and failing. :-(
Grant Black Send private email
Monday, November 12, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz