The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database design for scheduling problem

My application will handle scheduling for a particular business segment.  The primary “item” processed by the app needs to record schedule data and actual data.  The schedule data is a slight superset of the actual data, containing 4 or 5 more datetime values. 

Scheduled items may not occur, and actual items may not have been scheduled.  The app will allow comparison of schedule and actual data where a link can be established.  Screen displays will substitute actual data for schedule data when linked and available.

My question is about designing the database to handle these items.  I’m concerned about query performance (data set *could* grow large) and ease of understanding by technical customers (schema will be published so they can write their own reports).

The options that come to mind are:
1)  Store the schedule and actual data in the same table in a single record.
2)  Store the schedule and actual data in the same table in separate records, using a relationship between records.
3)  Store the schedule and actual data in separate tables, using a relationship between tables.

I’m leaning towards option 3, but I’d appreciate any advice.  Thanks.
Brandon K
Tuesday, December 06, 2005
If I read this right, each "event" may be scheduled or not.  If it IS scheduled, then it has an expected date it should have happened on, and an actual date it did happen on, if it happened at all.

In addition, it could have additional 'expected_date' values, if the original one was re-scheduled.

So, this sounds like a classic one-to-many relationship.  An event could have:
1.  Only an actual date, if it occurred without being scheduled.
2.  Only an expected date, if it was scheduled but then canceled.
3.  One or more expected dates, if it was scheduled then rescheduled, and ultimately canceled.
4.  One or more expected dates, and an actual date, if it was scheduled and ultimately held.

So, I'd have an 'event' record which described the event -- title, people present, etc.  Then have a "ScheduleItem" table, which would have a Foreign Key pointing to an Event record, and then one record each for each expected_date item, and then one record for the actual_date item.

Probably the ScheduleItem would also contain the date when the item itself was created.  A simple query against the "ScheduleItem" table, given an "Event" key value, would allow you to get all the Event's expected_date and actual_date items.
Tuesday, December 06, 2005

Thanks for your reply.  Your description of an event is correct, except for the event having more than one expected date.  The event itself consists of multiple schedule items (e.g. A, B, C, D).  The schedule would track all of them, but the actual record may only care about B-D.  Audit history is recorded, but from the end-user’s prospective an event has only one expected date (A’s start time).

If I understand your last two paragraphs, you’re advocating option #2, except storing the “schedule item” data (start time, end time, and length) separate from the event itself:

  EventID - PK
  [Event-specific values]
  ExpectedAScheduleItemID - FK
  ExpectedBScheduleItemID - FK
  ExpectedCScheduleItemID - FK
  ExpectedDScheduleItemID - FK
  ActualBScheduleItemID - FK
  ActualCScheduleItemID - FK
  ActualDScheduleItemID - FK

  ScheduleItemID - PK
  StartTime - datetime
  EndTime - datetime
  Duration - datetime

Did I read your post right?
Brandon K
Tuesday, December 06, 2005
definitely separate data into tables...

something like:

1) Item(PK: Id, .."the data"...)
2) ScheduledItem(PK: Id, FK: ItemId, ScheduledDateTime)
3) ScheduledData(FK: ScheduleId, .."the data"...)
Johncharles Send private email
Tuesday, December 06, 2005
Well, I was using "Event" as a generic "something happens here" descriptor.

So, you'll need to clarify your terminology.

If a "ScheduledItem" has one or more "SubItems", and each "SubItem" must be scheduled, then I would tend to do the following:

  Item_ID : PK
  Item_Description: String
  Item_More_Data : Whatever

  SubItem_ID : PK
  ParentItem : FK  -- Note each "Item" above will have
multiple SubItem records
  SubItem_Description : String

  SchedRecID : PK
  ParentItem : FK
  ParentSubItem : FK  -- Again, multiple records possible
  Expected_Actual_Flag : INTEGER
  RecordDate : DATETIME  ' Time entry was made
  ItemDate : DATETIME

With this, now you can do quieries like:

Select * From SchedRecTable
WHERE ParentItem = DesiredEvent.Item_ID
Tuesday, December 06, 2005
==>I’m concerned about query performance (data set *could* grow large)

If that's a concern, you're using the wrong hardware, software, or a combination of both. I've got databases that join a table with tens of millions of records to a table or three with hundreds of millions of records, and maintains sub-second response times for end user queries.
Your more important issue should be a good, solid database schema that models your reality. Proper database software and hardware (and setup/configuration thereof) should make this a non-issue.

==>and ease of understanding by technical customers (schema will be published so they can write their own reports).

Ain't this *always* the impossible dream. It's requested on *every* database project I've ever worked on. It all comes down to "this stuff is *hard*" and the end-users give up. Always requested, never works out.
Tuesday, December 06, 2005
The secret of scheduling is to be able to work backwards.

Earliest start time.(dependant)
Latest start time (derived).

Earliest end time. (dependant)
Latest end time. (dependant)

Duration.(In nano-fortnights)

Throw in resource levelling and your ready for white hair.
John Griffiths Send private email
Tuesday, December 06, 2005
I think I've got a plan to proceed.  Thanks for your help!
Brandon K
Wednesday, December 07, 2005
The confusion seems to be your use of the word superset.

Could you express the facts in a form something like.

Each Schedule contains a date it starts on.
Each Schedule may contain more than one possible start date
Each Schedule has only one date that it actually starts on.

If the above is true (and I might be completely wrong), then that maps to a table called Schedule which has all the parent constant information for a schedule on one row and a table called Startdates where each row is the Schedule ID and the date it may start on along with a logical field called Started.

If I did misunderstand completely then try and use statements like that above to refine the actual problem
Simon Lucy Send private email
Wednesday, December 07, 2005
Looking back at my original post I definitely oversimplified things; sorry about that.  And on the 4th rewrite of my reply to Allan to better define the problem, the solution became obvious.  Thank you both for reading and replying.
Brandon K
Thursday, December 08, 2005
Create some obvious views for the business-users. I'd be nervous having them query schemas directly.
Matt B Send private email
Thursday, December 08, 2005

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz