The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Stuck on database design (products and properties)

I think I have a very simple problem, but I am find it difficult to find the solution, so I would really appreciate your help.

I am building a web application with Products and Properties. A product has multiple properties (color, length, long sleeves, material, finish etc.). At the moment we have about 50 products, this will probably double, but not grow large. There are also about 50 properties total, for now. I use a database because I am familiar with them, and I like SQL for querying data, but I wonder if I should perhaps use something different alltogether.

I think/thought the proper way to do this is (simplified, not mentioning categories etc.):

tblProduct:
  productid
  productname

tblProperties:
  propertyid
  propertyname

tblProductProperties
  productid
  propertyid
  propertyvalue

However: if I have to select which products are blue, heavier than 1 pound, have long sleeves and come with gift wrapping, I have to do four inner joins on the product table. This gets cumbersome very quickly.

I read about this before on this forum, and learned that this is called EAV (entity, attribute, value), and that appearantly there is no consensus about professionals about if it is a good or a bad thing. I also found this discussion, where smart people were crital about this exact design: http://groups.google.nl/group/microsoft.public.sqlserver.programming/browse_thread/thread/ea32e6b5530671a1/efc42f88570b8e10

However, Mr Celko gives as an important anti-argument that  the values have to be character strings. I think you could easily remedy that by adding a column

tblPropertyValues:
  propertyvalueid
  propertyid
  valuename

And then instead of tblProductProperties, you make tblProductPropertyValues.

I found a data model on http://www.databaseanswers.org/data_models/products_and_generic_characteristics/products_and_generic_characteristics_alt.htm which says you can put often used categories in its own table. That does make much easier queries, and it still gives you the benefits of data integrity. However: it means an extra column for a lot of properties for me. There will be more products, with new properties, and I will have to alter the database for each new product. That does not sound good to me.

I am very interested in your opinion and would be very grateful for any advise.
Charlotte
Saturday, October 01, 2005
 
 
Is this a transactional database or a reporting database?

If it's primarily intended for reporting, look at star schemas and OLAP cubes.
Chris Tavares Send private email
Saturday, October 01, 2005
 
 
If tblProductProperties purpose is simple to allow a many-to-many relationship between a Product and a Property, I would put the tables together like this:

tblProduct:
  productid
  productname
 
tblProperties:
  propertyid
  propertyname
  propertyvalue
 
tblProductProperties
  productid
  propertyid

Then you can select the property names and values for a product with a simple select statement as follows (assuming you already know the productid):

select
  p.propertyname,
  p.propertyvalue
from
  tblProperties p,
  tblProductProperties z
where
  p.propertyid = z.propertyid and
  z.productid = <your product id>

If you're creating lists of properties for all products, just add a group by clause.
Steve Moyer Send private email
Saturday, October 01, 2005
 
 
<<However: if I have to select which products are blue, heavier than 1 pound, have long sleeves and come with gift wrapping, I have to do four inner joins on the product table. This gets cumbersome very quickly.>>

No joins here.

SELECT *
FROM tblProduct
WHERE tblProduct.ID IN(
  SELECT prop.Product_ID
  FROM  tblProductProperty prop
  WHERE  prop.Name = 'Color'        and prop.Value = 'Blue'
  OR    prop.Name = 'Weight Class' and prop.Value > '1'
  OR    prop.Name = 'Long Sleeves'
  OR    prop.Name = 'Gift Wrapped'
) ;

tblProductProperty (singular to match tblProduct)
  Product_ID
  Name
  Value

tblProperty: (singular to match tblProduct)
  Name
  ValidRegularExpression

and so on..
John Griffiths Send private email
Sunday, October 02, 2005
 
 
You may find your work easier if you use a product that helps manage these data structures.  You didn't mention whether you had relationships or rules among these products, but if you do, you could use something like this:

http://www.resolutionebs.com/industry-solutions/manufacturing.php

Note - I'm biased towards this software (I work for REBS), but there are others out there you could consider, too.

If, on the other hand, there are no relationships among your products, a product like ours would be overkill - stick with the simpler data structures.
REBS
Sunday, October 02, 2005
 
 
Charlotte,

Depending what how your relations are defined you may be doing the right or the wrong thing.

If your implications are concrete like this:

product ->> distributors
product -> suppliers
product ->> clothing
product ->> food
food -> good to
food -> packaging
clothing -> color
clothing -> sleeve type

then your implementation is only a 1NF at best. Hence your complex querying. You are better off having specific tables for each entity with properties as column.

If your implications lean towards the abstract:

product ->> property
property -> type, value

then your implementation is correct and your querying won't get any simpler.
Dino Send private email
Monday, October 03, 2005
 
 
I would suggest that you look into normalization concepts and entity relationship modeling.  Sounds like you have a small system that would be easy to model and worth the effort.

EAV is really the absence of modeling.  Rather than designing or specifying anything, it is meant to be flexible by just dumping everything into one or two tables and calling it a day.

It succeeds in being flexible by not defining the meaning of anything, because a propety can hold any data you throw at it.  It also succeeds in making for complicated, hard to write, hard to maintain, and slow queries, because a property can be anything.
Scot Send private email
Monday, October 03, 2005
 
 
Thanks for your replies, I really appreciate it.

It is a transactional database, so star schemes and OLAP cubes are not useful, but I had never heard of them, the concepts are probably useful at some time in the future.

The query that is the most important for this data is: "give me all the t-shirts for which color=black, long-sleeve=yes, style=aa,fabric=cotton. I do not think the given solutions solve that problem.

I checked the resolutionebs website. I am afraid it would be massively overkill for my application, but I must say the website did not do awfully much to convince me of the usefulness of the product.

I think Dino actually describes my problem: Most properties are concrete, like color and sleeve-type. They probably deserve their own table. But then: what do I do when I add a new product with new features (this is an innovative and often changing market): add a new table?

I am still not sure, but probably this is just a difficult problem. I read a bit more in newsgroups, Celko's point of view is that "If you cannot get to a stable schema, then you need something other than a relational database."
Charlotte
Tuesday, October 04, 2005
 
 
If it's all concrete, then get rid of the meta info - ie as property/value - and design it as relational. Later on, when you have a better understanding of the problem, you'll find the tradeoffs that are right. Refactor the application at that point.

Don't pay the high price upfront, because you don't know what you're getting back for your $.
Dino Send private email
Wednesday, October 05, 2005
 
 
I agree with what Dino just said.  Looks like you have a relatively fixed set of columns anyway, probably more so than you realize, but certainly anything else is unknown right now.  Most products will have color, and price, and size, etc, so just throw those in as columns.

If you need to add a new column later just alter table add column.  If a column doesn't make sense for a product then it would have a null value.

Your queries will be easy and efficient.
Scot Send private email
Wednesday, October 05, 2005
 
 
<<The query that is the most important for this data is:
"give me all the
    t-shirts for which
    color=black,
    long-sleeve=yes,
    style=aa,
    fabric=cotton".
I do not think the given solutions solve that problem.>>

SELECT *
FROM  tblProduct
WHERE  tblProduct.ID IN(

  SELECT prop.Product_ID
  FROM  tblProductProperty prop
  WHERE 
        prop.Name = 'Type'        AND prop.Value = 'T-Shirt'
  OR    prop.Name = 'Color'        AND prop.Value = 'Black'
  OR    prop.Name = 'Long Sleeves'
  OR    prop.Name = 'Style'        AND prop.Value = 'aa'
  OR    prop.Name = 'Fabric'      AND prop.Value = 'Cotton'

) ;
John Griffiths Send private email
Thursday, October 06, 2005
 
 
When constructing tables go back to the facts, if you answer the following questions you might discover its much simpler than it appears.

Can exactly the same product code have different properties of the same kind?

i.e.  Can Product 001 be Red, Blue or Green?

Do particular properties vary with others?

i.e. Does Product 002 of Size Large have a unique price for this product?

Collect properties that do not vary with the uniqueness of the product in the same table as the product.  Collect properties that vary together, Size and price, say, and have them on the same table.

001 Product Size Small Colour Blue Price $10
002 Product Size Small Colour Red Price $10
003 Product Size Medium Colour Blue Price $12
004 Product Size Medium Colour Red Price $12
005 Product Size Large Colour Blue Price $15

Could be mapped to.

Product Table
ID
Description and so on...
Colour

ProductSize Table
ID
Size
Price
Simon Lucy Send private email
Friday, October 07, 2005
 
 
"However: if I have to select which products are blue, heavier than 1 pound, have long sleeves and come with gift wrapping, I have to do four inner joins on the product table. This gets cumbersome very quickly."

To get the list of products that match all of a set of criteria, you can use the approach below.  This uses the tables as you originally defined them in the question.

SELECT productid
  FROM tblProductProperties
 WHERE <test for blue>
    OR <test for heavier than one pound>
    OR <test for long sleeves>
    OR <test for comes with gift wrapping>
 GROUP BY productid
HAVING count(propertyid) = 4

The key is that the number you test against in the HAVING clause must match the number of properties you are testing against.
RH
Friday, October 07, 2005
 
 
Thanks so much for all your help. The suggestions are good, I am now confident about the data model and I see that I was to quick to think that this model needed multiple self joins in my queries. I especially love RH's SQL query, that's really beautiful. Thanks a lot, I wish I could have thought about this myself.
Charlotte
Sunday, October 09, 2005
 
 
Ask Tom, one of Oracle's gurus has this to say on the exact same question.

 http://asktom.oracle.com/pls/ask/f?p=4950:8:13207134634888315920::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:10678084117056

(If this doesn't work, got to http://asktom.oracle.com/pls/ask/f?p=4950:1:

and search for generic scema)
Loz
Monday, October 10, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz