The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SQL query puzzle

I need a little help implementing a couple tables and/or queries.  I have searched, and I can't figure out how to do this.

I have a Purchase table, which looks like this (rows not needed for this example skipped):

Purchase
--------
PurchaseID
CustomerID
ItemID
Status
Cost
PurchaseDate

Items can be purchased, cancelled, deleted, waitlisted or in the web shopping cart (not paid for yet).  I also have a PurchaseHistory table, which has one record for each change in the status of a purchase

PurchaseHistory
---------------
PurchaseHistoryID
PurchaseID
HistoryDate
Status
UserID
Cost

So for an example, suppose a customer purchases a $10 item, and then later cancels that purchase.  He would have one purchase record, which points to the item, and two history records.  The first history record has the initial purchase ($10), the second has the cancellation (-$10 back to his account).  This way I can keep track of the kinds of changes, when those changes were made, and who made them.

In the purchase table, I have Cost, Status and PurchaseDate fields, which get updated whenever a new history item is added.  It's not normalized, and fairly brittle.  We had another developer who updated a bunch of purchase tables creating new history tables, and it boogered the financial reports.

The unnormalized fields are there because I can't come up with a clean way to associate only the most recent history item with e purchase.

So for instance, if I want to query for a list of everyone who is currently buying a particular item, I can do something like:

SELECT * FROM Customer
INNER JOIN Purchase ON Purchase.CustomerID=Customer.CustomerID
WHERE Purchase.ItemID = 100 AND Purchase.StatusID=1

That works, as long as Purchase.StatusID matches the most recent History record.  Is there any way to skip the unnormalized fields and join only the most recent history item for each purchase?

I tried to do something like
 
SELECT * FROM Customer
INNER JOIN Purchase ON Purchase.CustomerID=Customer.CustomerID
INNER JOIN (SELECT TOP 1 PurchaseID, Status
  FROM PurchaseHistory ORDER BY HistoryDate) History ON History.PurchaseID = Purchase.PurchaseID
WHERE Purchase.ItemID = 100 AND History.StatusID=1

but that does the top 1 of the whole history table, which is not correct.

Is there any way to make the subquery return the top 1 record for each PurchaseID?  Or is this something I'm going to have to just do in the appliation code?
ISV owner phase one - consulting with former employer Send private email
Thursday, October 12, 2006
 
 
It depends on the database, but typically, nested select statements are done independently of the other, external statements first. So the table...

(SELECT TOP 1 PurchaseID, Status
  FROM PurchaseHistory ORDER BY HistoryDate)

...is generated in its entirely and saved to the history alias, before you link up the purchase id.

I don't know if this is the most efficient way of doing it (from a performance standpoint) but I would create the desired purchase records in their entirely before matching up with the customer data, so it would look something like...

SELECT *
FROM Customer c INNER JOIN (
  SELECT TOP 1 p.PurchaseID, h.Status
  FROM Purchase p, Purchase History h
  WHERE p.PurchaseID = h.PurchaseID
  AND p.CustomerID = c.CustomerID
  AND p.ItemID = 100
  AND h.Status = 1
  ORDER BY HistoryDate
)
TheDavid
Thursday, October 12, 2006
 
 
Erg...

After the INNER JOIN closes, of course add the ON clause and any desired WHERE, GROUP BY, HAVING and ORDER BY clauses.
TheDavid
Thursday, October 12, 2006
 
 
Structurally you could add a column on the Purchase table to the PurchaseHistory table which tracks the most current history record. You could put an insert trigger on the PurchaseHsitory table to update the corresponding Purchase record with the primary key of the new history record.
KL
Thursday, October 12, 2006
 
 
This is a very common scenario, and there are several design patterns for it. You're almost there, but I would suggest a few changes.

Your Purchase table is the table that should be updated directly from your code. Incidentally, the "PurchaseId" column in the table should conventionally be called "Id", and be a surrogate key, for example an identity column.

Then create the PurchaseHistory table as an exact copy of the History table, with just 3 changes:

1) Its own "Id" column, this being the surrogate key for the PurchaseHistory table. You already have this, although with an unconventional name.
2) The Purchase.Id column should be represented as "PurchaseId". You've already done this.
3) A "HistoryDate" column, representing the exact date/time of the insert/update. It looks like you already have this.

Now make a trigger on the Purchase table to insert a new row in the PurchaseHistory table for each insert/update.

Finally, and assuming SQL Server, an example of an SQL query to list all customers currently purchasing any item looks like this:

SELECT CU.Name, IT.Name, PH.HistoryDate
FROM  Customer CU
INNER JOIN PurchaseHistory PH ON PH.CustomerId = CU.Id
                            AND PH.HistoryDate = 
                  (SELECT MAX(PH2.HistoryDate)
                    FROM  PurchaseHistory PH2
                    WHERE  PH2.CustomerId = PH.customerId
                    AND    PH2.ItemId = PH.ItemId)
INNER JOIN Item IT ON IT.Id = PH.ItemId        
INNER JOIN Status ST ON ST.Id = PH.StatusId
                    AND ST.Code = 'PURCHASE_IN_PROGRESS'

The correlated sub-query allows you to extract just the latest action for a specific customer/item. The join to the Status table enables you to control the type of customer actions that you want to see.

NB You'll need square brackets around some of these column names as they're reserved words. And you may want an index on the HistoryDate column to improve performance.
Mark Pearce Send private email
Thursday, October 12, 2006
 
 
Sorry, SQL query corrected and simplified:

SELECT CU.Name, IT.Name, PH.HistoryDate
FROM  Customer CU
INNER JOIN PurchaseHistory PH ON PH.CustomerId = CU.Id
                            AND PH.HistoryDate =
                  (SELECT MAX(PH2.HistoryDate)
                    FROM  PurchaseHistory PH2
                    WHERE  PH2.PurchaseId = PH.PurchaseId)
INNER JOIN Item IT ON IT.Id = PH.ItemId       
INNER JOIN Status ST ON ST.Id = PH.StatusId
                    AND ST.Code = 'PURCHASE_IN_PROGRESS'
Mark Pearce Send private email
Friday, October 13, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz