The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Stuck on a Design Issue

I've written an application that is working but I think I could have come up with a better data architecture.

I'm storing nodes, arranged in a tree hierarchy by date.  For instance, click on Sept. 16, and the database is read to retrieve a hierarchical tree of nodes for that day.  A node has a bunch of information that isn't important to the design problem, but the order of the nodes is important.

For an example of the kind of tree, see http://www.lusas.com/images/treeview.gif -- this isn't my program, just the kind of tree I'm talking about.

So I can look up and render the tree, I store the date, absolute position, indent level, and other node data for each node.  This makes for a lot of work when a node is moved to a different position in the tree, or moved to another day.  It just seems kludgy basing everything off of position and level.

Is there a smarter way to store these nodes in a database so it isn't so much work to maintain the state of the tree for each day?
Brian Leach Send private email
Tuesday, September 04, 2007
 
 
Sounds like a pretty straightforward data structure.  Rather than storing your data as arbitrarilly indented things that happen to be ordered to look like a tree, why not just store it as an actual tree structure?

create table Node
(
  NodeID int identity
  , ParentNodeID int null
  , SortIndex int
  , DescriptiveColumns whatever
)

SortIndex only needs to live there if you want to be able to arbitrarily sort things under a given parent.

Good luck!
Jason Kester Send private email
Tuesday, September 04, 2007
 
 
"I store the date, absolute position, indent level, and other node data for each node.  This makes for a lot of work when a node is moved to a different position..."

This jumped out at me. Why are you storing the absolute position and indent level? Customarily, you don't move just one node because as you've noticed, it's a very expensive task.

Typically, you "rebalance" the entire tree all at once such that the nodes are evenly distributed across the branches. In extremely large trees, you may simply rebalance a subset of the tree; for example, you may choose to optimize the month of September rather than every available calendar entry. If you're writing to a database, then it makes sense to do this rebalancing every time the application starts.

I bring this point up because if you eliminate the need for absolute positions and indent levels, then you can use Jason's extremely simple approach for saving nodes, and simply build (and balance) the tree itself when you read from the database.
TheDavid
Tuesday, September 04, 2007
 
 
I apologize. I was thinking of actual binary trees, not hierarchial folders. Overall, the principle is the same but you'll have to build the tree from the bottom up unless you have some sort of dummy "root node" that can act as the parent node for the top level folders. More specifically, the database will need to indicate those folders should be at the top.
TheDavid
Tuesday, September 04, 2007
 
 
+1 Jason .. I would add date, and some oracleisms :)
create table Node
(
  NODE_ID number default seq_node_id.nextval()
  , PARENT_NODE_ID number -- could be null :)
  , SORT_INDEX number
  NODE_DATE
)

then do a connect-by query where node_date = required date starting with parent_node_id is null.
Totally Agreeing
Tuesday, September 04, 2007
 
 
There are some other options referenced at http://discuss.joelonsoftware.com/default.asp?dotnet.12.436484.8
Christopher Wells Send private email
Tuesday, September 04, 2007
 
 
I think what you're looking for is similar to this post http://discuss.joelonsoftware.com/default.asp?design.4.431474

There's a good suggested solution (IMHO) in there. http://www.developersdex.com/gurus/articles/112.asp?Page=1
Fung Send private email
Tuesday, September 04, 2007
 
 
Use a linked list?

CREATE TABLE nodes (
node_id INTEGER PRIMARY KEY,
parent_node_id INTEGER,
next_sibling_node_id INTEGER,
...
)

Use the next_sibling_node_id field to order the list after retrieving it from the database.

Wednesday, September 05, 2007
 
 
==>Overall, the principle is the same but you'll have to build the tree from the bottom up unless you have some sort of dummy "root node" that can act as the parent node for the top level folders. More specifically, the database will need to indicate those folders should be at the top.

In a database setup, I usually indicate a ParentID of NULL to flag that "Hey! This is a Top Level (no parent) Node". All other (non top-level "child nodes") will have a valid ParentID pointing to the primary key of it's Parent record. To build the tree, first do a SELECT WHERE ParentID IS NULL, and then recursively build the Child nodes.
Sgt.Sausage
Thursday, September 06, 2007
 
 
Thanks to all for the suggestions . . . some comments on the above replies.

Nested tables are cool, but they solve the tough problem of querying a whole tree from a root node.  Because I index and query by a date, my SELECT statement is already trivial.

Each node has a specific position in its own level, so I could use a position number in its own tree as opposed to a top to bottom (list-like) position.  This does buy me something:  when I drag a node with children to a different place in the tree, I don't have to update the position of the node's children.  I still have to update all of the siblings at the source and target level, because their position may change.

I think that this is just an ugly problem, no way around it . . .
Brian Leach Send private email
Thursday, September 06, 2007
 
 
Dino Send private email
Thursday, September 06, 2007
 
 
Just as a side note. I don't know if it was discussed in the linked posts but try not to get into the habit of using NULL as the root node identifier.

Conceptually, it works great. It's elegant, it's simple, it's immediately obvious from reading the code what the significance of a null "pointer" to a parent node is. There is no parent node, therefore it must be the root node.

The problem is that there are a number of frameworks that will translate NULLs into empty strings when you print them out, and others will literally substitute some sort of "NULL" placeholder string. There are also many databases that automatically fail conditionals involving NULLs, and I vaguely remember there's at least one that will allow for an "foo = NULL" style construct in SQL.

When you jump around between different database vendors and programming languages and platiforms (web or desktop), it can get annoying trying to remember all of the special case treatments involving NULL.

In fact, ASP.NET 2.0 has what's referred to as the optimistic concurrency bug; database columns that allow NULLs will never successfully pass when used as the where clause criteria, unless you specifically check for the NULL value also. I'm not sure if the bug exists in SQL Server itself, or in VB.NET and C#.NET for the desktop as well.

This is completely irrelevant to the original discussion, I admit that, but unless you have a damn good reason for using NULL parent node pointers to identify the root node, it's just safer and easier to port everything if you picked some arbitrary value.
TheDavid
Thursday, September 06, 2007
 
 
> In fact, ASP.NET 2.0 has what's referred to as
> the optimistic concurrency bug; database columns
> that allow NULLs will never successfully pass when
> used as the where clause criteria, unless you
> specifically check for the NULL value also.

The "optimistic concurrency bug" is a bug in the SQL that ADO.NET used to generate.  (It doesn't do this anymore.)  This is very far from a reason not to use NULL.

In this particular case, the use of NULL to indicate that a node has no parent is not merely appropriate.  If there's a foreign-key constraint on the ParentID column (which there should be), it's *necessary*.
Robert Rossney Send private email
Monday, September 17, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz