The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database artifical keys

I want to store XHTML in a SQL database, so ...

TABLE TagsAndText
Tag Text
'h1' 'This is a heading'
'p' 'This is a paragraph'
'h2' 'This is a subheading'

(other columns ommited for brevity).

The Tag values aren't unique/primary in the table, but are used as a key (e.g. there will be queries like "select Text from TagsAndText where Tag='h1'").

There's a finite list of permitted Tag values, which can be implemented by a foreign key constraint:

TABLE PermittedTags
Tag
'h1'
'h2'
'p'

My question is how/whether/when you would decide to introduce an artificial key, e.g. like this:

TABLE TagsAndText
Tag Text
1 'This is a heading'
3 'This is a paragraph'
2 'This is a subheading'

TABLE PermittedTags
TagId Tag
1 'h1'
2 'h2'
3 'p'

It seems to me that the tag values like 'h1' are a 'natural' key.

Would introducing an artificial key only be to improve performance? If so, would you do it immediately (because it's better to be proactive about foreseeable changes), or delay until proven necessary (because 'premature optimization is the root of all evil')?
Christopher Wells Send private email
Monday, July 17, 2006
 
 
Since the natural keys appear to be immutable, I would not make an artificial key.

However, it's not going to be unique in your TagsAndText table, is it?  (You really don't want your table named that, do you?)
Cade Roux Send private email
Monday, July 17, 2006
 
 
> Since the natural keys appear to be immutable, I would not make an artificial key.

Okay ... thanks for your answer.

> However, it's not going to be unique in your TagsAndText table, is it?

Tag values aren't unique in the table, for example there can be several rows in the table with an 'h1' tag.

> (You really don't want your table named that, do you?)

Is there a convention for naming tables?

In fact the table is currently called 'Nodes', because the rows (document elements) have a tree-like hierarchical structure (defined by the columns I ommited).
Christopher Wells Send private email
Monday, July 17, 2006
 
 
Every table needs a primary key, even if your RDBMS doesn't enforce that requirement. The primary key should uniquely identify each row. Then make sure that the functional dependencies of the other data in each row are such that the the table is in 3NF, at least.

If you can't identify a primary key (more precisely, a 'candidate key') then you aren't even in 1NF, so you haven't got a relational database.

Try to find a natural key, either single column or composite. If you can identify a NATURAL key, whether a single column or composite, it signifies that your data schema is probably more correcter.

But natural keys are often a little tricky. If it's composite is it a mixture of text and numbers, for instance. The primary key of a table is almost certainly going to be used as the foreign key in another table. So if it's a 4 column key you've just added 4 columns to the other table.

Contrary to popular opinion primary keys (by which I mean the values of them) don't HAVE to be fixed for all time. Most RDBMSs will cascade updates to the primary key to any child tables that use them as foreign keys. But it is better if primary keys are at least 'stable' as opposed to 'static'.

So I approach it this way. Identify a natural key, if possible. Make it not allowed null and put a unique index on it. Then using whatever autoincrement type generator the RDBMS has create a separate column and use that as the primary key. Some people call this a 'surrogate' key. And usually I'll never show that primary key to the users.

Looking at your tables:

TABLE TagsAndText
Tag Text
1 'This is a heading'
3 'This is a paragraph'
2 'This is a subheading'

I can't see the primary key here, but then I don't really understand the usage. Is the PK going to be the 2 columns combined.

TABLE PermittedTags
TagId Tag
1 'h1'
2 'h2'
3 'p'

That looks fine. The first column is acting as a surrogate key. Make that the 'official' PK, and put a unique no nulls constraint on the other one. This supports my 'might change sometime' theory. Aren't tags usually capitalised - 'H1'? And, not that I know the usage, I'm willing to bet that one day you'll want to create a column with the closing tags. Maybe
Database purist
Tuesday, July 18, 2006
 
 
> Every table needs a primary key, etc.

Thanks for the refresher. What I took from it is:
* Look for a natural key (even if it's a composite key)
* It's Ok to use a natural key as the primary key, even if it isn't 'static', so long as it's at least 'stable'
* However, define an invisible surrogate key if the natural key is composite

> I can't see the primary key here, but then I don't really understand the usage. Is the PK going to be the 2 columns combined.

No, one of the columns that I ommitted for brevity is the TagAndText table's primary key.

> That looks fine. The first column is acting as a surrogate key.

Given that the tag string values are a natural non-composite key (a variable-length string about 1 to 6 characters in length), and are what you called 'stable', are you saying that you would nevertheless define a surrogate key in the PermittedTags table?
Christopher Wells Send private email
Tuesday, July 18, 2006
 
 
"No, one of the columns that I ommitted for brevity is the TagAndText table's primary key."

Best to include it then. What is it?

"> That looks fine. The first column is acting as a surrogate key.

Given that the tag string values are a natural non-composite key (a variable-length string about 1 to 6 characters in length), and are what you called 'stable', are you saying that you would nevertheless define a surrogate key in the PermittedTags table?"

Up to you. I see the possibility that they might change. That's still OK. Cause cascading updates will take care of that. Also it's text, which might be a bit slower for comparisons, whereas with a surrogate key you could use and integer. But until you get massive tables might not matter.
Database purist
Tuesday, July 18, 2006
 
 
"TABLE TagsAndText
Tag Text
1 'This is a heading'
3 'This is a paragraph'
2 'This is a subheading'"

Is this ever going to happen in that table:?

Tag Text
1  'This is a heading'
1  'This is something else'

Reason I ask is, it looks to me like the 'this is a xxx' column really belongs in the permitted tags table.
Database purist
Tuesday, July 18, 2006
 
 
Doh. That's the actual text isn't it? I was hit by stupidity.

Your tags table. The more I think about it. Yes, have a surrogate primary key. I think you might want a closing tags column. I'm not an HTML expert, but I imagine that you might have some sort of routine which extracts the opening tag applicable to a particular piece of text, puts it into a file, gets the text, puts it into a file, gets the closing tag, puts it into a file.

And are closing tags ALWAYS "</" plus opening tag plus ">" ?

Also, you might want other fields in the tags table to describe other attributes, which vary according to what tag it is. You could do away with css altogether and hard code your fonts there, for instance. Basically, I see all sorts of reasons why that table will change. So a simple integer PK looks best.
Database purist
Tuesday, July 18, 2006
 
 
> some sort of routine which extracts the opening tag etc ...

If the table contains data like this ...

  TABLE TagsAndText
  Tag Text
  'h1' 'This is a heading'
  'p' 'This is a paragraph'
  'h2' 'This is a subheading'

... then this maps to HTML that's formatted as follows:

  <h1>This is a heading</h1>
  <p>This is a paragraph.</p>
  <h2>This is a subheading</h2>

> And are closing tags ALWAYS "</" plus opening tag plus ">" ?

Yes, always.

> other fields in the tags table to describe other attributes

Attributes are a separate problem, and other tables.

> So a simple integer PK looks best.

I think I'll try defining the tags as a "char(8)" column and uing that, the natural key, as the primary key. Thanks again.
Christopher Wells Send private email
Tuesday, July 18, 2006
 
 
I'm curious about storing the XHTML in the SQL database... what are you getting out of doing that that you can't get by just storing the XHTML in the file system?  It seems like plain-old-string-searching/regexps will get you anything a where clause could get you, maybe more for less work, and it's less hassle to set up.
Pat Morrison
Tuesday, July 18, 2006
 
 
Arn't "keys" supposed to be unique?  I think you're going to have LOTS of "h1" header tokens.
AllanL5
Tuesday, July 18, 2006
 
 
Nothing against the OP, but this seems to be another case of overtly generalized descriptions of database usage looking for descriptive answers.
D in PHX Send private email
Tuesday, July 18, 2006
 
 
> what are you getting out of doing that

I'm hoping I'll be able to 'select' views (subsets) of documents, and to 'join' information from multiple documents.

> Arn't "keys" supposed to be unique?  I think you're going to have LOTS of "h1" header tokens.

Yes, keys are usually unique, and 'primary' keys must be unique. http://www.tonymarston.net/php-mysql/data-dictionary.html#dict-table-key for example implies that not all keys are unique.

When I said "The Tag values aren't unique/primary in the table, but are used as a *key*", perhaps I meant "they'll be used as an *index*"?

Yes I'll have lots of "h1" header tokens.

> Nothing against the OP, but this seems to be another case of overtly generalized descriptions of database usage looking for descriptive answers.

Sorry I don't understand what you're saying.

If these questions seem general it's because I'm an SQL novice, and am trying to fill in gaps in my experience. SQL reference documentation tells me what I can do, but when I have a choice of several ways (as in this example) I'm looking for the heuristics that help decide between alternatives.
Christopher Wells Send private email
Tuesday, July 18, 2006
 
 
Christopher, it wasn't a jab directed at you.

What I meant is that database problems are typically very related, dependent even, upon what you are storing and why you are storing it.

So if you could give more background on why you are storing the stuff, how it's going in and what you are doing with it after that, folks can help better.

Hope that makes more sense.
D in PHX Send private email
Tuesday, July 18, 2006
 
 
Is the order in which the tags appear important or are you just analyzing some XML? Does it matter if a query returns "<h1>foo</h1>", "<h1>bar</h1>" or "<h1>bar</h1>" , "</h1>foo</h1>" ?  If so, you need to track the order.  I'm just saying this because I don't see how you could generate any practical XML with just that table.
Grant
Tuesday, July 18, 2006
 
 
Christopher has said there are more fields detailing the relationships between the tags.  I suspect order information is also in there.

Unfortunately, the problem with dumbing down one's examples is that everything thinks your dumb!
Almost H. Anonymous Send private email
Tuesday, July 18, 2006
 
 
I don't think the guy is dumb, just database issues are closely tied to the data, how it gets there and what needs to be done with it afterwards.

More detail = better answers.
D in PHX Send private email
Tuesday, July 18, 2006
 
 
> > what are you getting out of doing that

> I'm hoping I'll be able to 'select' views (subsets) of
> documents, and to 'join' information from multiple
> documents.

... in my mind, the case for doing this in SQL would be one of... a) I tried it via scripts/string processing, it wasn't fast enough and then I tried SQL and it was fast enough to be worth the trouble
b) I need to apply some kind of user access/permissions that fit within an existing infrastructure that depends on SQL
c) I've got a bunch of concurrent users and I want to let the DBMS sort 'em out.

There are probably other good reasons, perhaps some that you have.  But from the surface of things seen here, it might be appropriate for you to try a few of your queries out using good-old-fashioned code before having to go and design a SQL table to replicate structure you already have in your XHTML. 

If you do go the SQL route, I think you'll definitely need unique keys for every entry in the table.  My strong inclination is to go with one document per row rather than to break it down by tags(which isn't too different than my rant above about why SQL at all)... because you'll be replicating the structure of the XHTML in a table if you do break it down by tags... which presumes some pretty structured XHTML... or some really recursive SQL definitions.  And if it's the latter, you'll have to wind up writing what amounts to an XML parser in SQL Select statements to get much out of the db anyway.
Pat Morrison
Tuesday, July 18, 2006
 
 
I wrote:
> unique keys for every entry in the table.

meaning artificial keys in addition to the 'natural' keys implied by the tags.. because you'll get plenty of examples of any given tag, and you'll need some way to sort them out.

I really should read more carefully before I post.

HTH
Pat Morrison
Tuesday, July 18, 2006
 
 
"Yes, keys are usually unique, and 'primary' keys must be unique. http://www.tonymarston.net/php-mysql/data-dictionary.html#dict-table-key for example implies that not all keys are unique."

I don't know who Tony Marston is, but in the meaning of a relational database keys are unique and non null (EXCEPT foreign keys). I think you may have misinterpreted his metadata thingy.

This is a better explanation.

http://www.aisintl.com/case/relational_keys.html

A key, candidate, primary or alternate is unique and non null.

"If these questions seem general it's because I'm an SQL novice, and am trying to fill in gaps in my experience. SQL reference documentation tells me what I can do, but when I have a choice of several ways (as in this example) I'm looking for the heuristics that help decide between alternatives."

Sorry, but the fact that you think all this stuff is an SQL issue says a lot about your understanding of relational databases. This isn't about SQL, it's about database design.

Stop what you're doing, buy 'An Introduction To Database Systems' by CJ Date. Read it and understand it. Practical Issues in Database Management by Fabian Pascal makes a good read too.

Without a firm understanding of the basics, you'll just be making lots of poor decisions. Sorry.

Don't worry, most people can't be bothered to get any of this right, so you're not alone.
Database purist
Tuesday, July 18, 2006
 
 
"I'm hoping I'll be able to 'select' views (subsets) of documents, and to 'join' information from multiple documents."

This really is the domain of XSL, not RDBMS.
UNIX Guy
Tuesday, July 18, 2006
 
 
DOH! Excuse the nonsequitor, I should read this in the middle of a build.

I think what you're looking for is not a key or a foreign key constraint, but a CHECK constraint.  Then you can make sure that the field comes from a list of given values ('h1','p','body', etc) without having to create meaningless tables or joins all over the place to resolve with your FK Constraint.
Grant
Tuesday, July 18, 2006
 
 
Rubbish. If the values in a field can only come from a 'list of valid values' then make it a foreign key to a table WHICH CONTAINS THOSE VALUES. It's called a lookup table.

This is better in every way then a CHECK constraint:

1. It's a gazillion times easier to do an interface to allow users to add to/delete from your list of valid values.

2. When later on you decide that there is SOMETHING ELSE about the 'list of valid values' that you want to store information about it's easy to add an extra column to the lookup table.

3. Referential Integrity via the foreign key mechanism is supported by practically all RDBMSs. And is a more 'basic' part of a RDBMS. So if you decide to change the RDBMS you can be fairly sure it will work. You can't be so sure that the CHECK constraint will be supported.
Database purist
Tuesday, July 18, 2006
 
 
Purist, I noticed 2 things there.

1, a relational db just means it's stored in rectangular tables. An Excel spreadsheet is technically a relational db (though not a very good one).

2, 1NF just means that every tuple has a single value. 2NF means that every row has a primary key (whether enforced by the DB or not). 3NF means that no field in a row is derived from any other fields in the row (i.e., not "summary" columns).
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
"2, 1NF just means that every tuple has a single value"

Go on then, how can a tuple have a single value?
Database purist
Tuesday, July 18, 2006
 
 
So if I have a field that stores a person's sex, should I create a whole Foreign Key relationship and seperate table and an web based admin interface for when the possible values change?
Grant
Tuesday, July 18, 2006
 
 
>> 1, a relational db just means it's stored in rectangular tables. An Excel spreadsheet is technically a relational db (though not a very good one).

Ouch. Not really. In fact, not at all.
David Aldridge Send private email
Tuesday, July 18, 2006
 
 
Hey ho, you know people are stupid, ignorant and incompetent when they start with this sort of stuff. What makes you all of those is that you don't care enough to be interested and to learn. Anybody can make any sound idea look ridiculous by coming up with what with some extreme example. And if you actually cared about the field you work in you would be ashamed of yourself.

But I suspect you're more interested in point scoring.

I usually have a field with a constraint for your example. But if I saw a database with a lookup field for gender I wouldn't really mind. After all how long would it take?

And who are you, with your closed mind and you blistering lack of imagination to say that in 50 years, or 100 years, when transexuality is more common, that this might not be an issue.

And it's been done that way, apparently.
Database purist
Tuesday, July 18, 2006
 
 
That was aimed at Grant, by the way.
Database purist
Tuesday, July 18, 2006
 
 
A relation is the mathematical term for a rectangular table, is all. Those silly Excel worksheets are rectangularly shaped. A trivial example, not one that I would recommend, but it is.

Oracle, in its documentation, brought down a relational db stored in a log book, written in pen and paper.

OK, here's an example of a tuple (cell, basically) that would violate 1NF: Hirsch, Steve. If you consider my first and last names to be separate fields, then it would violate 1NF.

"So if I have a field that stores a person's sex, should I create a whole Foreign Key relationship and separate table"

my bias is towards saying yes.

"and an web based admin interface for when the possible values change?"

No need. My bias would be towards just using the command line or TOAD.
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
Steven doesn't know what the hell he's talking about. Christopher, please ignore him.

"Purist, I noticed 2 things there.

1, a relational db just means it's stored in rectangular tables. An Excel spreadsheet is technically a relational db (though not a very good one)."

Just because it's represented as a square thing on your screen doesn't make it the same. A column in a db table should have a type, and should be defined over a domain. Excel can have anything in it. Each attribute should contain an atomic value. You can stuff anything in an Excel cell, including formulas. Rows in an excel spreadsheet can be duplicates. A table in a database shouldn't have duplicates, cause it's not a set then, therefore not a relation.

When I have users or clients and I want to give them an inisght into what I do I'll describe a table as 'being a bit like and Excel sheet'. But to come to a forum for competent people like this one and use that comparison is appalling

"2, 1NF just means that every tuple has a single value."

Well, plus no duplicate rows. I assume you mean each attibute/column intersection has a single value.

"2NF means that every row has a primary key (whether enforced by the DB or not)."

Non key attributes dependent on the whole key.

"3NF means that no field in a row is derived from any other fields in the row (i.e., not "summary" columns)."

Yuk. No transitive dependencies. "Summary columns" kinda gives the game away
Database purist
Tuesday, July 18, 2006
 
 
Come to think of it, in clinical trials, there are usually more than 2 values for gender. Most trials had a value for "Unknown", for example.
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
Steve said:

"OK, here's an example of a tuple (cell, basically) that would violate 1NF: Hirsch, Steve. If you consider my first and last names to be separate fields, then it would violate 1NF."

Now all of you go and look up what the word 'tuple' means in the context of relational databases.

A tuple isn't 'like' a cell. Atall. Shut up Steve, every post further displays your ignorance
Database purist
Tuesday, July 18, 2006
 
 
Boy, what a jerk. Must be a lot of fun to work with.

I've been making a living doing relational db programming for over 10 years now, don't know wtf I'm doing, I guess. Anyways, from Wikipedia:

Relation: In relational modeling, a relation is a set of tuples, otherwise known as a table.

Tuple: Usage in computer science:
In computer science, tuple can have two distinct meanings. Typically in functional and some other programming languages, a tuple is a data object that holds several objects, similar to a mathematical tuple. Such an object is also known as a record.

In some languages, and especially in database theory, a tuple is usually defined as a finite function that maps field names to a certain value. Its purpose is the same as in mathematics, namely to indicate that a certain entity or object consists of certain components and/or has certain properties, but here these components are identified by a unique field name and not by a position, which often leads to a more user-friendly notation. The general term for this construct is an associative array; other programming languages have yet other names for the concept.

A small example of a tuple would be:

( player : "Harry", score : 25 )

I was trying to translate arcane, useless-in-the-real-world mathematical trivia to something understandable and useful. I stand by what I wrote.

Oh, and by the way Purist, ever wonder why the full relational algebra isn't implemented in SQL?
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
In a relational database system a tuple is a record in a table.

That's it.

I don't care if "I've been making a living doing relational db programming".

You don't know your job.
Database purist.
Tuesday, July 18, 2006
 
 
"Oh, and by the way Purist, ever wonder why the full relational algebra isn't implemented in SQL?"

Not really, but maybe I'll look it up in Wikipedia.

The same way you just looked up tuple.
Database purist.
Tuesday, July 18, 2006
 
 
The reason why it has never been implemented is because it is unnecessary and a waste of time. Nobody in the real world, developing sw that people actually use, misses it.

You see, people in the real world, trying to be productive, have no time for esoteric, pointless purity meant to show how smart you are.

So, assume that I make good money developing relational db software, and assume for purposes of argument that you are right, I don't know what I'm doing. How can that be? Are other people that stupid with their money?
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
You tried to show that you know relational database theory. But your failed.
Database purist.
Tuesday, July 18, 2006
 
 
I tried to impart to the OP a little practical knowledge, a little bit of theory softened by analogy. I know all the relational theory that I need.

You, on the other hand, were really nasty to both me and Grant for what reason? Instead of assuming that we knew what we are doing, maybe with a different point of view, you assumed that we are idiots because we can't recite pointless mathematical trivia off the top of our heads.

You come off as a JoS version of the Taliban's Purity squads. +1.
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
I thought about weighing in on the tuple/etc bit... I looked at my copy of Date on the shelf, and at purist's purity arguments, and at Steve's vernacular arguments that nearly say the same things but talk past each other... but I'm not quite certain it's going to help Chris solve his problems.

Chris, feel free to chime in and indicate what is or isn't helping you figure things out.
Pat Morrison
Tuesday, July 18, 2006
 
 
WTF happened to this thread?

OP: You'd be better off posting a more complete schema and some key requirements which will really help get higher quality answers.  It sounds a lot like a database-backed content management system.  That's fine, but it's got drawbacks - SQL doesn't do hierarchies well.  Just because SQL is your hammer, doesn't mean that this is the nail to hit.  But with a limited scope and controlled requirements, a relational DB can do the job.

Your original question raised good points about natural keys and you got some good responses on that front.

I think a Tags table is fine, but TagsAndText doesn't make sense to me in the long term - like you said, a table named DocumentNodes makes more sense (they just happen to have TagIDs).  Naming it TagsAndText is like naming a table of Orders "CustomersAndOrders" just because each row contains a CustomerID column.

For flexibility in the face of the future and with an eye to your requirements, maybe you should keep in mind something a little more flexible like XML and XSLT, as some posters have mentioned.
Cade Roux Send private email
Tuesday, July 18, 2006
 
 
Sorry, got carried away here. If you're using Oracle, there are ways to simulate a hierarchical db, like the OR features, or the CONNECT BY.
Steve Hirsch Send private email
Tuesday, July 18, 2006
 
 
Ok.

@D in PHX
> what you are storing and why you are storing it

I began to explain that, in http://discuss.joelonsoftware.com/default.asp?design.4.361252.41#discussTopic361493 ... I may try to answer any specific questions, but given Paul's post http://discuss.joelonsoftware.com/default.asp?design.4.361252.41#discussTopic361733 I don't want to make SQL Jesus cry again.

@Pat Morrison
> the case for doing this in SQL would be one of... a) I tried it via scripts/string processing
I imagined doing it via scripts and string processing and didn't want to try it:
* Fragile (not ACIDic)
* Difficult (I haven't found XSLT easy to write, even with support from a tool)
* Slow (iterating over whole files for each transformation, whereas once it's flattened into a DB it's all been effectively pre-parsed and indexed)

> worth the trouble
This is an at-home project, so it's not subject to that kind of constraint.

>  it might be appropriate for you to try a few of your queries out using good-old-fashioned code
I'll need code too, to transform the data to/from its DB-ready representation. I have a feeling that if I don't use a DB then I'll end up trying to implement some DB-like features in my own code, and not doing it well. In any case, the exercise of developing a suitable DB schema is instructive.

> c) I've got a bunch of concurrent users and I want to let the DBMS sort 'em out.
That's another good reason: multi-user concurrent editing of an XML document seems like a disastrous idea to me. I have read that Wikis, for example, are more usually implemented with a file-based than a DB-based back-end, but I'm going to try this with a DB.

> some really recursive SQL definitions
There are techniques for handling trees in SQL without recursion: http://discuss.joelonsoftware.com/default.asp?design.4.355493.41 ... I'm going to try one.

> meaning artificial keys
Yes I'll have an artificial key for each node.

@Database purist
> http://www.aisintl.com/case/relational_keys.html
Point taken: 'index' isn't the same as 'key'.

@UNIX Guy
> This really is the domain of XSL
True.

> not RDBMS.
Not true.

@Grant
> I think what you're looking for is a CHECK constraint
Good point. Thanks!

@Cade Roux
> You'd be better off posting a more complete schema and some key requirements which will really help get higher quality answers.
Thanks for your offer: perhaps later.

> a database-backed content management system.
Yes, that's it.

> But with a limited scope and controlled requirements, a relational DB can do the job.
That's my feeling too.

> ... is like naming a table of Orders "CustomersAndOrders" just because ...
Ok, I see.

@Steve Hirsch
> If you're using Oracle, there are ways to simulate a hierarchical db, like the OR features, or the CONNECT BY.
Yes I've seen CONNECT BY mentioned when I was researching hierarchies in databases, and I've considered object databases, but I don't think I need such a vendor-specific extension yet.

Cheers.
Christopher Wells Send private email
Tuesday, July 18, 2006
 
 
"database-backed content management system"

I'm curious why, in a content management system, you're actually storing tags / XML in the database.  Is that not a bit low-level?  I'm not entirely sure what you're trying to accomplish.  I've written a database-backed, web-based content management system.  Now everything I stored was render-neutral, e.g. I never stored HTML, it was just generated from the data.  Now, if you are just trying to store data, perhaps you don't need such a literal translation from XML. 

Maybe you could give me a better idea of what you are creating to satisfy my curiousity?
Almost H. Anonymous Send private email
Wednesday, July 19, 2006
 
 
> Now everything I stored was render-neutral, e.g. I never stored HTML, it was just generated from the data.

The data must include *some* kind of markup information, mustn't it? If your content (your document data) includes a paragraph, followed by a list, followed by a picture, followed by a hyperlink, how do you store this except by storing some kind of markup (HTML-like markup, or Wiki-like markup, and/or something that encodes the document structure)?
Christopher Wells Send private email
Wednesday, July 19, 2006
 
 
OP wrote...
* Fragile (not ACIDic)

I think script infile1 infile2 > outfile meets the ACID criteria, at least in single user contexts. 

> * Difficult (I haven't found XSLT easy to write, even with support from a tool)

I still shudder when I think of the XSLT I've written.  I felt like a bad person, even though I tried hard.  I did find Jen Tennison's Wrox XSLT book to be a good tutorial... but I'd only do XSLT again if I were forced to do so.  Martin Fowler wrote a bit (http://www.martinfowler.com/bliki/MovingAwayFromXslt.html) about how he's using Ruby (but it could just as well be Python or Perl or... PHP) to do his XML parsing/reassembly, after having given up on XSLT... so I feel less like a bad person. 

If the choice were one of SQL or XSLT, I'd pick SQL too... but it's not the right hammer for every nail. 

* Slow (iterating over whole files for each transformation, whereas once it's flattened into a DB it's all been effectively pre-parsed and indexed)

iterating over whole files for each transformation... vs. the overhead of building the map between your XHTML and the RDBMS, before you can get to where you want to go.

By this point in my series of posts, I've realized that my only point is "try a scripting language for this", and that you've already made your call on that.  So I'll just wish you happy coding and move along.  Good luck!
Pat Morrison
Wednesday, July 19, 2006
 
 
"I tried to impart to the OP a little practical knowledge, a little bit of theory softened by analogy. I know all the relational theory that I need.

You, on the other hand, were really nasty to both me and Grant for what reason? Instead of assuming that we knew what we are doing, maybe with a different point of view, you assumed that we are idiots because we can't recite pointless mathematical trivia off the top of our heads.

You come off as a JoS version of the Taliban's Purity squads. +1. "

I was 'nasty' to poor little Grant because he tried to use an extreme example to ridicule my idea that you can use lookup tables to limit values in a field, rather than a check constraint. He failed.

Anybody can come up with that sort of nonsense. But as all the programmers know, 'edge cases' are what often floor a program. The decision whether or not to use a lookup table for genders is just such an edge case.

Nevertheless it gave you the perfect opportunity to further demonstrate your lack of skill by chiming in with you 'sometimes unknown is included' comment. Unknown/missing values are a problem in database design. We all know that. You could have perhaps pointed out to the OP whom you claim to be helping that doing that has it's drawbacks.

The meaning of a database is its data, you cannot design a good relational database without understanding the data, and the mathematical foundations of design, rooted as they are in set theory and predicate logic. Your extremely sloppy and sometimes simply incorrect use of the terminology reveals that you don't really understand. Whether you make a living from database design or not is neither here nor there.

Me too. Frequently I am called upon to work on MS Access databases that clerical or admin staff in a business have designed. They've been paid for that. Does that mean they are 'professional' database designers too?

Christopher - get a copy of CJ Dates An Introduction To Database Design. That's the bible for us religious bigots ;-/

Google searches on this subject reveal some useful information. But also a lot of misleading information. That book would be better.

If you're going to use an RDBMS to store your stuff then you owe it to yourself to get it right. It'll pay off in the long run. And makes a refreshing change as an area of study from Loops and Conditions monthly.
Database purist
Wednesday, July 19, 2006
 
 
An Introduction to Database _Systems_, o misbeliever!
Larry Lard Send private email
Wednesday, July 19, 2006
 
 
Yes, sorry.

An Introduction To Database Systems (8th edition is the one I've got, fairly recent), by CJ Date. ISBN 0-321-18956-6

That probably means I'm excommunicated now.
Database purist
Wednesday, July 19, 2006
 
 
Also good, (better I think, for most programmers), is Date's more recent book, "Database in Depth".  It is significantly slimmer and cheaper too.
Paul Mansour Send private email
Wednesday, July 19, 2006
 
 
"The data must include *some* kind of markup information, mustn't it?"

Absolutely.

"how do you store this except by storing some kind of markup (HTML-like markup, or Wiki-like markup, and/or something that encodes the document structure)?"

Right.  I'm just not sure I get why you've got a database that's such a literal translation of XML.  Why do you need to break the document down where every row is a tag?  Do you really need to query a single <em> block from the document? 

Some things are better as a non-literal translation, for example hyperlinks.  I never stored anything like a <a href> tag -- instead it's a pointer to another record which gets translated into a hyperlink at render time.  This way documents can easily be moved around and renamed.
Almost H. Anonymous Send private email
Wednesday, July 19, 2006
 
 
While this thread made me understand Joel's "pin in my eyes" comment, I did have a relevant thought for the OP. Are there any OO databases like GemStone still around? I do think that Altova may still make some. They are the old hierarchical databases, they could be that middle ground that the OP is looking for.
Steve Hirsch Send private email
Wednesday, July 19, 2006
 
 
db4objects is one which looks promising.
Cade Roux Send private email
Wednesday, July 19, 2006
 
 
> "try a scripting language for this"

FYI, coming from a C++ background and now using C#, I don't know what the difference is between a scripting language and C#: C# exposes no memory management, and the techniques that Fowler cites (reflection and subclassing) are more-or-less available in C# too.

> Good luck!

Thanks!

> Why do you need to break the document down where every row is a tag?  Do you really need to query a single <em> block from the document?

I don't: for example, Pat suggested "go with one document per row" ... there may be a middle ground, where I break the 'interesting' bits (eg. titles and hyperlinks) into one row each, and lump the intermediate blocks of text into one row each, eg. like this:

TABLE Rows
RowType RowText
'head' 'This is a section title'
'text' '<p>This is a paragraph with an '
'link' 'embedded reference'
'text' ' and the end of the paragraph too.</p>'

> db4objects is one which looks promising

I looked at it briefly a while back: it seems popular, its license is reasonable, and its API looks natural for an OO programmer who wants to persist his object graphs and to materialize nodes 'just in time'; but anyway.
Christopher Wells Send private email
Wednesday, July 19, 2006
 
 
> lump the intermediate blocks of text into one row each

A reason for choosing shorter rows of perhaps one sentence each (even if they're not hyperlinked), each with an artificial key, might be that this would facilitate calculating and storing merges, diffs, and text revision history.
Christopher Wells Send private email
Wednesday, July 19, 2006
 
 
... ok.

at the end of this long story, I still dont understand why are we trying to force hierarhical documents into a relational structure - instead of directly using the documents themselves and going with xpath queries...

you're using c# ... wonderful: .net framework is full of object libraries made for this (xml, xsl, xsd, xpath queries.....)
Johncharles Send private email
Thursday, July 20, 2006
 
 
@Johncharles: does the .NET library have infrastructure like indexes, snapshots, views, etc. that are built into most relational dbs? Serious question, don't work much with .NET, maybe there's more infrastructure built than the last time I looked.
Steve Hirsch Send private email
Thursday, July 20, 2006
 
 
steve, you can infere and build datasets from xml fragments... and dataset have keys and some basic "view" and "join" functionality, but I see your point.... it also depends on how much data.

i dont want to "insult" nobody, so I go on posting on ++

:)
Johncharles Send private email
Thursday, July 20, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz