The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Data Modelling - Searching For Public Entities

I have a table in my data model that will store the primary entities in my application. These entities can have a simple security level assigned to them i.e. Private/Friend/Public, so there will be a foreign key to some form of security level table.

There will likely be a high volume of searches conducted against the public primary entities (think of searching everyone's photos in Flickr). Do you think it would be more performant when searching to eliminate the foreign keys by storing all the public entities in a separate table to the private ones?

Or, doesn't it really matter anyway because (assuming correct indexing) you can do something like SELECT * FROM primary_entities WHERE security_level = 1; (where security_level 1 corresponds to Public)?

Friday, August 25, 2006
Ah. Good question.

I was reading last night and came across something very similar.

The question you need to answer is: "Where in the design process are you at?"

Four basic steps:

1. Business requirements: Sitting down with the stakeholders and determining the rough estimate of what they are storing

2. Conceptual Model: Database platform independent schema design

3. Logical Model: Schema designed with a specific platform in mind, constraining types, etc.

4. Physical Model: Once you have it built and start testing, this is where the production side (ie, the DBA's) will start tweaking to get the needed performance.

Any changes to the schema that are driven forward in the name of performance should be really looked into. Not that it can't happen, but there is alot of stuff that can be done to affect the database performance without changing the model.

What you're proposing can be done, but I've only seen it done rarely and it was on very, very, very large tables.

Your query example would work fine.

When you are designing your interface to the tables (stored procs or views (views? yuck.)) you want to make sure that you properly abstract.

What I mean to say is, lets say your flickr clone site uses a stored proc with the code you posted.

If you had to segment and split the table up (one public, one private) then you would have to change the proc.

Which is indefinately better than searching for strings in the website code that select from that table.

To sum it up, properly normalize your tables and make decisions when you are doing the performance testing if they are needed.

No preemptive optimization. =)
D in PHX Send private email
Friday, August 25, 2006
>> Or, doesn't it really matter anyway because (assuming correct indexing) you can do something like SELECT * FROM primary_entities WHERE security_level = 1; (where security_level 1 corresponds to Public)?

Using an index for something like that is going to be rather expensive, if you are looking for a significant proportion of rows in the table. In terms of physical design in Oracle you might consider using a table partitioned on security_level if most of your selects against that table include a security_level predicate. Partitioning is an expensive option though, and for a cut-price solution you might implement what Oracle used to call a "Partion View" -- three tables with appropriate constraints and a single UNION ALL view over the top of them to make them a single logical entity.
David Aldridge Send private email
Friday, August 25, 2006
Actually, there is a general rule of thumb, which is where the design phase comes in.

In this scenario, every time you hit the public entities, you're also going to have to hit the security table to determine if you have access to it. In a fully normalized, "by the book" database, you're going to have to do a join operation every single time. This can be expensive in very high volume sites and it can make things more difficult for the programmers when they start doing nested queries and additional joins (especially outer joins).

Since you have the luxury of designing things up front, you do want to at least see if you can eliminate that initial join. Off the top of my head, there are a couple of options...

1) Cache the foreign keys in the application, or otherwise code the application to look for values of 1, 2 or 3 instead of resolving them and comparing against PRIVATE, FRIEND and PUBLIC.

2) Skip the separate table and put PRIVATE, FRIEND and PUBLIC directly into the main entity table. Yes, this is "unnormalizing" the table, but if the possible values are relatively static, then you'll still pay very little in terms of maintenance costs when correcting or updating the data.

As an example of #2, it's extremely unlikely that there will be more than 50 states or any state will change its name.  So instead of having a foreign key to a state table, it can be more beneficial to just have the application limit the possible values (via dropdown list or validation).

I agree with D in PHX though in that now is the best time to make decisions like this, and be consistent with whatever you decide. I would not however wait til you're ready to do performance tuning to decide if you really want two separate tables.
Friday, August 25, 2006
Thanks everyone. I think TheDavid's option 2 actually makes the most sense in the context of what I'm doing. The available security levels are never going to change.
Friday, August 25, 2006
Ah, yeah if they won't change that is good.

Sometimes you have to focus on generalizations until you have specifics. =)
D in PHX Send private email
Friday, August 25, 2006
I'm thinking about this very issue for my own application right now, so I appreciate the discussion. I'm considering the question more generally from the viewpoint of lookup tables, especially those that contain relatively static values that lend themselves to enumerated code structures.

By default I tend to normalize everything and store lookups in their own tables (usually "SomeType"), but when we're only talking about a few values this seems like overkill.
Friday, August 25, 2006
Only overkill if you are not going to be adding new ones.
D in PHX Send private email
Friday, August 25, 2006
"The available security levels are never going to change."

Except that now you've said that I can guarantee you they will - 'but what happens if.... oh we could solve that by having an extra security level'

Come back here in 5 years and tell me I'm wrong.

Put them in a separate table.
Database purist
Tuesday, August 29, 2006
"Come back here in 5 years and tell me I'm wrong."

You're wrong.

1) Search and Replace will fix existing ones.

2) New values can be added at will.

I know where you are going with this. Let's say I had a million people in my database, and some of those people live in the same household. To make the example more "real world", let's say I'm a health insurance company and I have a separate record for every individual.

Let's say Poppa Bear signs his family up and makes a typo on his form; he lives on Elk Street in Appleville rather than Elm Street in Mapleville. Yes, in this scenario, the separate table will save your ass.

But any time you have a table that consists of two or fewer columns (the primary key id and the value), you are just wasting space, time, and patience. Always combine such tables with their master tables.
Wednesday, August 30, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz