The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

mySQL field list subqueries

I'm writing a mySQL query for a timetabling type app, where there are resources, and weekly bookings for them.  There can also be more than one booking for a week, where the second booking is only used if the first one is deleted later.

My first thought was to do something like:

SELECT * FROM resources LEFT JOIN bookings ON resource.id = bookings.id

however if a resource has more than one booking,  then that resource turns up twice in the result of course, whereas I only want the first booking.  So I ended up with:

SELECT *, (SELECT id FROM resources WHERE resource.id = bookings.id ORDER BY resource.id LIMIT 0,1)
FROM resources

This works, and returns the resources and the first booking for each in a week.  However, for each field I want from the bookings table I have to add another field list subquery, and I actually need to do this across a number of weeks at once, so I end up with 20+ field list subqueries, and it's SLOW! 

What's a better way to write this query?  Is there some way to return multiple values from a field list subquery?
drowning in subqueries
Tuesday, June 28, 2005
 
 
Rather than approach the problem from the POV of the query, I feel that the schema needs to be changed. There appears to be nothing in the joins which would indicate which booking is actually the first - apart from the ordering that happens to jump out of the subquery, which is not something that should be relied upon. Some data seems to be missing here.

I would add a field for Precedence which would be part of an appropriate primary key on bookings (Resource ID, Week, Precedence). This adds more work to table maintenance (we have to maintain Precedence whenever inserting/updating/deleting bookings) but this solves the complexity of the query in question and ensures you are capturing everything explicitly. Just specify Precedence = 1 in the WHERE clause and this would only return the "first" booking.

SELECT * FROM resources
LEFT JOIN bookings ON
resource.resourceid = bookings.resourceid
AND bookings.precedence = 1

You can also add a clause to restrict which weeks are of interest.

(Precedence could be stored in another table instead, but I don't know if that is really necessary)
Joel Goodwin Send private email
Tuesday, June 28, 2005
 
 
Why not:

SELECT distinct(resource_id) FROM resources LEFT JOIN bookings ON resource.id = bookings.id

Then you're sure to have each resource exactly once.  Then you can use your DAO (Data Access Object) to load them if needed.  It becomes two steps, but should be much better in terms of memory, execution time, etc.
KC Send private email
Tuesday, June 28, 2005
 
 
KC, the OP wanted to get the booking information with the resources not just which resources are booked.
Joel Goodwin Send private email
Tuesday, June 28, 2005
 
 
Joel:

the "ORDER BY resource.id" gives the order to the bookings, where resource.id is an auto_increment field, so the first booking placed comes up first, and the limit 0,1 ensures only one row is returned.  So  the ordering from the subquery does give the right result.

I guess I could store the precidence of each booking with the booking, and recalculate with each insert/update/delete, but I'd rather not.  btw, my example is a greatly simplified form of the actual data.

KC:

You mean I could just get the resources, then do a query for each resource to find it's bookings?  I've tried this though, and as most resources have bookings, you do end up doing a query for each resource, and it worked out slower than all as one query.  Though you do get more cache hits than with one big query.
drowning in subqueries
Tuesday, June 28, 2005
 
 
Drowning, could you clarify the role of resource.id? Does it represent a "booking" or a "resource"? I'm confused as to why a new resource.id would be generated for each new booking, as that would suggest the same "resource in the real world" would have multiple resource.id values.
Joel Goodwin Send private email
Tuesday, June 28, 2005
 
 
Joel:

Sorry, you're right of course. i really messsed those queries up trying to simplify them for discussion, they should be:

SELECT * FROM resources LEFT JOIN bookings ON resources.resourceid = bookings.resourceid

SELECT *, (SELECT bookingid FROM resources
    WHERE resources.resourceid = bookings.resourceid
    ORDER BY bookings.bookingid LIMIT 0,1)
FROM resources

hope that makes more sense now.
drowning in subqueries
Tuesday, June 28, 2005
 
 
Drowning, you need to find the maximum booking ID per resource and per week. I think this would work, if not this then something similar. (the "week" clause is the one that not 100% about)

SELECT * FROM resources R
LEFT JOIN bookings B ON R.resourceid = B.resourceid
AND B.bookingid = (SELECT MAX(BX.bookingid) FROM bookings BX WHERE BX.resourceid = R.resourceid AND BX.week = B.week)

I believe the latest version of MySQL can do this sort of subquery.
Joel Goodwin Send private email
Tuesday, June 28, 2005
 
 
Or, perhaps, I meant the MIN of the bookingid =)
Joel Goodwin Send private email
Tuesday, June 28, 2005
 
 
Joel:

Thanks, that's what I was looking for. I rewrote in that form, with the subquery in the where/join clause.  It's much cleaner, but actually worked out to be quite a bit slower!  :(  Everything's indexed etc, so I guess it's time to work on the system tuning a bit more, or throw more horsepower at it!
drowning in subqueries
Wednesday, June 29, 2005
 
 
Drowning,

I can't help you too much on the performance (this doesn't work too badly on something like SQL Server or Sybase) but there is an alternative way of formulating the query.

You could try something like:

SELECT * FROM resources R
LEFT JOIN bookings B ON R.resourceid = B.resourceid
AND B.bookingid IN (SELECT MAX(BX.bookingid) FROM bookings BX WHERE GROUP BY BX.resourceid, BX.week)

i.e. use a subquery to find all the "max booking ids" instead of having the subquery be dependent on the resourceid.
Joel Goodwin Send private email
Wednesday, June 29, 2005
 
 
Sorry there's an errant WHERE in the subquery which should be removed, or used to filter out weeks/resourceid's you might not be interested in.
Joel Goodwin Send private email
Wednesday, June 29, 2005
 
 
And I said MAX instead of MIN again...
Joel Goodwin Send private email
Wednesday, June 29, 2005
 
 
It is almost always faster to do the subquery yourself than let mysql do it.

Instead of:
SELECT * FROM resources R
LEFT JOIN bookings B ON R.resourceid = B.resourceid
AND B.bookingid = (SELECT MAX(BX.bookingid) FROM bookings BX WHERE BX.resourceid = R.resourceid AND BX.week = B.week)

Do:
1. SELECT MAX(BX.bookingid) FROM bookings BX WHERE BX.resourceid = R.resourceid AND BX.week = B.week

2. SELECT * FROM resources R
LEFT JOIN bookings B ON R.resourceid = B.resourceid
AND B.bookingid = (_the results from part 1_)


When can I get away from this MySQL hell??!!! You might also want to try PostGreSQL.

-Andrew
Team Bob Send private email
Thursday, June 30, 2005
 
 
Joel:
Thanks, you've given me some good ideas for playing around with restructing the query.

Andrew:
I'm thinking spitting it up into a few queries might be the way forward.  From my understanding, mySQL doesn't cache the subqueries in a query seperately, only the whole query is cached, so if one booking is added then the cached copy of my one big query is pruned.  With a few extra queries, you'll hit the cache more often, and then the speed of the raw query isn't such an issue.  Hopefully!

Thursday, June 30, 2005
 
 
Ok, have this >2x faster now, creating a table to get the bookingids for each resouce, and then doing a join against that and using field list subqueries to get the details for each booking.  So it's a bit like this:

CREATE TEMPORARY TABLE temp1 as SELECT resouceid,([find bookingid for the 1st week]) as week1, ([find bookingid for 2nd week]) as week2

SELECT resourceid,
(SELECT bookings.description FROM bookings WHERE bookings.bookingid = temp1.week1) as week1description,
(SELECT bookings.description FROM bookings WHERE bookings.bookingid = temp1.week2) as week2description
FROM resources,temp1
WHERE resoucres.resourceid = temp1.resourceid

I'm sure there's some errors in there but hopefully it's understandable...
drowning in subqueries
Thursday, June 30, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz