The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

A view puzzle

Given the following SQL tables, I'm trying to get a view including the highest level categoryID for a given item. So for example the highest level categoryID for ItemID 1 - Villa is USA. Can anyone suggest the best way to do this?


CategoryID  ParentID  Desc 
1            -1        USA
2            1          Florida
3            1          San Francisco

CategoryID    ItemID
2            1
3            2

ItemID      Desc    Price
1          Villa    1000000
2          House    2000000
Thursday, July 27, 2006
In your example, I see how Villa, ItemID 1, relates to CategoryID 2 in theCategoryItems table.  I don't see how CategoryID 2 relates to USA in the Categories table, so I don't see how you make a relationship between Villa and USA - "highest level" or any other relationship.

And ... what do you mean "highest level" here?  There is no reference to the word "level" anywhere in the DB schema you present.
Thursday, July 27, 2006
Yes, please clarify highest and table joins and we'll try to help
Sevenoaks Send private email
Thursday, July 27, 2006
Can categories have a grandparent (parent ID of Parent is not -1)?
Cade Roux Send private email
Thursday, July 27, 2006
I'm assuming this is the classic "recusive ancestor" query?  In standard SQL, there's no way (that I know of) to do it as a single view.  There are some SQL extensions (I believe Oracle has one) that do it.  Another approach is to maintain a denormalized table that contains every ancestor relationship.
Mike S Send private email
Thursday, July 27, 2006
... recursive ...
Mike S Send private email
Thursday, July 27, 2006
I've always solved such problems by structuring the tables to show descendants, not ascendants.  In this scenario, USA would be an attribute of Villa, and then I add additional columns for each level I wanted subdivide the previous level.

So this would be...

Villa : 2,000,000 : USA : Florida : Orlando

How do we make sure that people don't insert Orlando in the fourth column instead of where it belongs in the fifth? Constraints (or pick-lists at the presentation layer).

You will notice that the Domain Name Service works the same way - "com" is parsed first, then "joelonsoftware" and then finally "discuss".

The "recursive ancestor" problem is one of the textbook examples of why it's sometimes more efficient to do things the wrong way. (Another example is introducing redundancy back into your tables in order to eliminate complex joins.)
Thursday, July 27, 2006
The short answer is that you can't write a single standard SQL statement that resolves the relationships in one pass (largely because standard SQL doesn't allow you to build looping constructs beyond SUM, COUNT, AVG, GROUP and so forth.)

It will always be n-number of passes where n is the depth of your tree. In this case, you can construct the view with two nested SQL queries but as soon as you add a third layer, you will have to add a third query.
Thursday, July 27, 2006
Stupid question, why would you use a view in the first place?
D in PHX Send private email
Thursday, July 27, 2006
>>I don't see how CategoryID 2 relates to USA in the Categories table, so I don't see how you make a relationship between Villa and USA

CategoryID 2 relates to Florida, Florida's parentID is 1 USA.
Friday, July 28, 2006
>>Can categories have a grandparent (parent ID of Parent is not -1)

No, USA would be the highest level category.

Friday, July 28, 2006
>>Stupid question, why would you use a view in the first place?

View or not, it would still need to be performed in a SQL statement or controlled by the a recursive function in the code.

Friday, July 28, 2006
K, theDavid's point is that there isn't a single command in SQL to recursively get the ultimate ancestor.  Instead, your code has to instruct the database has to recursively retrieve ancestors until it gets to a record that has no ancestor.  This is a lot of extra work for the server, and there's not a simple way to do it in SQL.

I agree with the David's suggestion to modify the Categories table.  Add a field for UltimateAncestor, and populate it for all current records.  When you add a new category, walk back through the hierarchy and populate the UltimateAncestor field.  Your code will only have to go back until it finds an ancestor that has an UltimateAncestor defined - typically, this will be the immediate parent of your new category.  This limits the extra lookup step to the times you add categories - which is likely much less frequently than you add or look up items.
Saturday, July 29, 2006
Here is a solution based on the assumption you are only looking at three levels deep (reason to follow).  The self left joins return the ParentID or null if there isn't one. 

One note about the sample data.  San Francisco should probably have a parent of California insted of USA.  Adding California required the third left join to the categories table to get GreatGrandParent values which would end up being null for ItemID 1.  So, I used the coalesce function to derive the TopID.

While this may not be a favorite solution for some here, I believe it is a solution to the puzzle you posted.

-- p = parent
-- gp = grandparent
-- ggp = greatgrandparent

select i.ItemID,
    i.[Desc] as ItemDesc,
    p.[Desc] as ParentDesc,
    gp.ParentID as GParentID,
    gp.[Desc] as GParentDesc,
    ggp.ParentID as GGParentID,
    ggp.[Desc] as GGParentDesc,
    coalesce(ggp.ParentID, gp.ParentID, p.ParentID) as TopID
from Categories as p
left join Categories as gp
    on p.ParentID = gp.CategoryID
left join Categories as ggp
    on gp.ParentID = ggp.CategoryID
join CategoryItems as ci
    on P.CategoryID = ci.CategoryID
join Items as i
    on ci.ItemID = i.ItemID
where i.ItemID = 1

1 Villa 1000000 2 1 Florida -1 USA null null -1
FitCoder Send private email
Saturday, July 29, 2006
And what happens if the customer asks for a region to be added, such that you have USA, South, Florida, Naples?  Or USA, West, California, San Francisco?  :)

I actually like your solution - but as I said, it depends on the number of tiers being constant.

I still believe that in the long run, it is simply easier to add columns to a single table and write a separate, simple SQL query for each depth needed.
Monday, July 31, 2006
We've used a single field in the past to concatenate the catgeory hierarchy for each category:

CategoryId    ParentId    CategoryTree
1        -1        :1
2        1        :1:2
3        1        :1:3
4        2        :1:2:4

The Category Tree starts with a separator, then the top-level category for that item, then a separator, then the next category, on down the line.

The first CategoryId in the CategoryTree is the highest parent category.

In addition, to find children of a particular category, just search for categories that begin with the same CategoryTree as the category. In the example above, to find children of CategoryId 2, search the table for CategoryTree LIKE ':1:2%'.

The difficulty arises if you insert or delete new categories into the middle of a hierarchy - you will have to restructure each CategoryTree again. Of course, it's pretty easy to do - a replace will do the trick. And since you are more likely doing many more selects than you are inserts/deletes, it works out well.
Honroy Fobicizer Send private email
Tuesday, August 01, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz