The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

The Mother of All Unions

There was recently a topic on documenting the data in a huge overgrown database.  Here's a  topic that outlines one very practical method, even if it's a dirty one,  for gaining an understanding of what some of the data means.  If you do this sort of thing, you will be wise not to admit it.  The relational puritans have a congregation for the defense of the faith whose purpose is to root out heresy.  You don't want to tangle with them. I hasten to add that I myself am quite given to using the relational model.  I'm just not ashamed to twist it under special circumstances.

A few years back, I was dealing with one of those databases that has about 500 tables in it.  Some end users had an exasperating habit of referring to a code,  like "Q42" with no clue as to what the context of the code was,  and no way to find out where in the database to look.
The following sample dialog illustrates the point:

A:  We never should have sent this guy a letter.  He's a Q42.
B:  What the heck is a Q42?
A:  I have no idea. But we never send them any letters.

So I built myself a tool that would allow me to look up "Q42" without knowing which reference table it was defined in.  I call it "the mother of all unions" (MOAU).  It consisted of about 180 select statements, joined by union where each select looked sort of like this:

[code]
select
      country_code as code,
      country_name as name,
      'COUNTRIES' as source
from countries
[/code]

Note that the source, the name of a table is plugged into the query as a constant,  and the output of this query is a blend of metadata and data. I'm not going to defend this practice.  It just turned out to be useful.  The 180 select statements referenced 180 different reference tables,  where each reference table had a primary key that looked like "mumble_code" or "mumble_id" or "mumble_no"  and another column that looked like "mumble_name" or "mumble_desc".  Forming the union can take a while, even on a fast server.

I discovered the 180 tables using a few judicious queries into the system tables  (what Oracle calls "the data dictionary").  I used semi automatic methods to construct the MOAU out of the system table entries.  These semi automatic methods involved SQL, a scripting language, and a high powered text editor.

If I had had the right to create views in this database, I would have done so.  As it was, I settled for havig a few SQL scripts in my own files that played the role of a view.  One of them would look up a given code,  like "Q42"  in all 180 tables,  and tell me not only what the code meant,  but what table it was defined in.  From there, I could trace back to what transaction tables the code was used in,  and what process was in error by lettting a Q42 through the cracks.

I also created a query called "key word in context" (KWIC),  which would find every occurrence of a given keyword (like "Colombia" or "Golf") in the name or description columns.  This turned out to be quite useful as well.

This is by no means a comprehensive approach to documenting a database.  It's just a handy little tool,  among dozens of others, that you can use to expedite the process of "data archaeology".  It you hang around production databases long enough, you will end up doing some archaelogy.
Walter Mitty Send private email
Monday, September 01, 2008
 
 
I like this MOAU, if you had the rights, a materialized view would be perfect, and much less resource-intensive.
Totally Agreeing
Monday, September 01, 2008
 
 
I do this all the time. I call it data archaeology, or data CSI.
Steve Hirsch Send private email
Monday, September 01, 2008
 
 
I've been waiting for someone to use this to justify using a single table for all lookup values, but have so far been disappointed.
David Aldridge Send private email
Tuesday, September 02, 2008
 
 
There is a reason why I called this a view, and not a table.

Using separate tables to maintain fundamental distinctions is a good thing.  Using views to obscure distinctions when desirable for certain purposes is OK. 

OTLT obscures distinctions that are clear in the subject matter.  It doesn't result in the performance gains that its loyalists assert for it.  It doesn't result in conceptual clarity.  It doesn't result in easier maintenance, even though it appears to for the first few years.

MOAU isn't OTLT.  Not at all.
Walter Mitty Send private email
Tuesday, September 02, 2008
 
 
"If you do this sort of thing, you will be wise not to admit it.  The relational puritans have a congregation for the defense of the faith whose purpose is to root out heresy.  You don't want to tangle with them. I hasten to add that I myself am quite given to using the relational model.  I'm just not ashamed to twist it under special circumstances."

The existence of system tables is demanded by the relational model so that they can be queried like any other data. There is nothing twisted about this approach.
Rich
Tuesday, September 02, 2008
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz