The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SQL query question

I'm a relative SQL novice, and I'm having trouble creating the inverse of a query for a many-to-many relationship.

I have three MySQL tables: "scientists", "specialties", and "xref" (the many-to-many table).

I need to select all the scientists NOT working on Biology.

I can't seem to figure out how to ignore John Doe, if he's listed for both Biology and Chemistry.  I want to ignore ALL records for John Doe (including his listing under Chemistry or Physics) if he's listed under Biology.

Thanks in advance.  If you have any suggestions for a better place to ask this question, that would be useful too. 

I've pasted my original query (which would select all scientists who study Biology) below.  Basically, I want the inverse of this query of the "scientists" table.


SELECT
  scientists.ID, first_name, last_name,
  specialties.ID, specialty,
  xref.scientist_ID, xref.specialty_ID, xref.sort_order
FROM
  scientists,
  specialties,
  xref
WHERE
  specialties.ID = xref.specialty_ID AND
  scientist.ID = xref.scientist_ID AND
  specialty = "Biology"
ORDER BY
  xref.sort_order
EAW Send private email
Thursday, January 13, 2005
 
 
One way would be to add " and .... not in (select ... ) to the where clause.
Yack
Thursday, January 13, 2005
 
 
I realize that might not have been entirely clear, so here's a for-instance.

SCIENTISTS
1 John Doe
2 Jane Smith
3 Bill Brown

SPECIALTIES
1 Biology
2 Chemisty
3 Physics

XREF
ID  Scientist_ID  Specialty_ID
1    1    1 (John Doe - Bio)
2    2    1 (Jane Smith - Bio)
3    1    2 (John Doe - Chem)
4    2    2 (Jane Smith - Chem)
5    3    2 (Bill Brown - Chem)
6    3    3 (Bill Brown - Phys)


With the query above, I could get all scientists for a given specialty (e.g., retrieve John Doe for "Biology").  What query would return, if given "Biology", the listing for Bill Brown (the people not listed with "Biology")? 

Thanks.
EAW Send private email
Thursday, January 13, 2005
 
 
Yack,

You mean

AND NOT IN (original query here as a subquery)

?
EAW Send private email
Thursday, January 13, 2005
 
 
I mean

and scientists_id not in (select scientists_id from ...)

... above would be similar to you original query.
Yack
Thursday, January 13, 2005
 
 
Yes, except I think you need to have original query return only ID's in that case, right?  Something like:

SELECT
  scientists.ID, first_name, last_name,
FROM
  scientists
WHERE
  scientist.ID not in (
  SELECT
    scientists.ID
  FROM
    scientists,
    specialties,
    xref
  WHERE
    specialties.ID = xref.specialty_ID AND
    scientist.ID = xref.scientist_ID AND
    specialty = "Biology")
ORDER BY
  last_name, first_name
Herbert Sitz Send private email
Thursday, January 13, 2005
 
 
Whoops, I see Yack already caught that. . .
Herbert Sitz Send private email
Thursday, January 13, 2005
 
 
thanks for the tips - I'm working on it now, and I'll let you know what happens
EAW Send private email
Thursday, January 13, 2005
 
 
possible problem: I believe I'm dealing with a MySQL version < 4.1, which doesn't support subqueries (I'm checking with the host as we speak, but it doesn't look good). 

From my searches on the web, some people say that subqueries can be emulated with clever JOINs. 

Any ideas on how to do this without a subquery?

(P.S.  Thank you very much for the advice before - now it makes perfect sense, and I really wish I could use it)
EAW Send private email
Thursday, January 13, 2005
 
 
another update: 

Here's an alternate version of my first query using a JOIN.  I still don't know how to invert it on the Scientists table (without a subquery), but maybe someone has an idea.

SELECT scientists.first_name, scientists.last_name
FROM scientists, specialties
INNER JOIN xref
ON     (specialties.ID = specialty_ID) AND
    (scientists.ID = scientist_ID) AND
    (specialty = "Biology")
EAW Send private email
Thursday, January 13, 2005
 
 
Something like this might work:

  SELECT
    scientists.ID, first_name, last_name,
    sum(case when speciality = 'biology' then 0 else 1 end),
    sum(case when speciality = 'biology' then 1 else 0 end) 
  FROM
    scientists,
    specialties,
    xref
  WHERE
    specialties.ID = xref.specialty_ID AND
    scientist.ID = xref.scientist_ID AND
    specialty = "Biology")
  group BY scientists.ID, first_name, last_name
  having sum(case when speciality = 'biology' then 0 else 1 end) > 0 and
    sum(case when speciality = 'biology' then 1 else 0 end) = 0

Another possibility is to join the table to itself using an alias.  Would take some thoght to get that one to work..
Yack
Thursday, January 13, 2005
 
 
SELECT
  scientists.ID, first_name, last_name,
  specialties.ID, specialty,
  xref.scientist_ID, xref.specialty_ID, xref.sort_order
FROM
  scientists,
  specialties,
  xref
WHERE
  specialties.ID = xref.specialty_ID AND
  scientist.ID = xref.scientist_ID AND
  scientists.ID NOT IN
    (SELECT
        scientists.ID
    FROM
        scientists,
        specialties,
        xref
    WHERE
        specialties.ID = xref.specialty_ID AND
        scientist.ID = xref.scientist_ID AND
        specialty = "Biology")
ORDER BY
  xref.sort_order
Dino
Thursday, January 13, 2005
 
 
This works on SQL Server (different tables - I have translated it back to your jargon) if there are no duplicates in xref:

SELECT
  scientists.ID, first_name, last_name,
  specialties.ID, specialty,
  xref.scientist_ID, xref.specialty_ID, xref.sort_order
FROM
  scientists,
  specialties,
  xref
left outer join xref x2 on x2.scientist_ID = scientists.ID and x2.speciality_ID = "Biology"
WHERE
  specialties.ID = xref.specialty_ID AND
  scientist.ID = xref.scientist_ID AND
  x2.scientist_ID is null
and 
ORDER BY
  xref.sort_order
DifferentPoster
Thursday, January 13, 2005
 
 
Sorry, misplaced the last 'and' in translation.

(I am currently researching the subtle differences between MACALLAN 12 year and REDBREAST 12 year.:-)
DifferentPoster
Thursday, January 13, 2005
 
 
Thanks to everyone for the help.  I have it working the "lowtech" way of (1) selecting the people who ARE working on Biology, (2) adding all their IDs to an array, (3) using a second SELECT query to pick out the people not in the array. 

Since I have to make the positive selection anyway, filling an array to make a second query for the negative selection is trivial. 

I still want to figure out Differnt Poster's solution for the  logical elegance of it.

Thanks again.
EAW Send private email
Thursday, January 13, 2005
 
 
DifferentPoster nailed it...
Yack
Thursday, January 13, 2005
 
 
I love the Macallan -- I've never heard of Redbreast -- tell us what it's like.
Boofus McGoofus Send private email
Thursday, January 13, 2005
 
 
RedBreast is Irish.

The differences aren't really very subtle.  They are both good, and I may like Macallan better.

(Need more research :-)
DifferentPoster
Friday, January 14, 2005
 
 
How about ...

SELECT
    scientists.ID,
    scientists.first_name,
    scientists.last_name
  FROM
    scientists,
    specialties,
    xref
  WHERE
    specialties.ID = xref.specialty_ID AND
    scientist.ID = xref.scientist_ID AND
    specialty = "Biology"
MINUS
SELECT
    scientists.ID,
    scientists.first_name,
    scientists.last_name
  FROM
    scientists,
    specialties,
    xref
  WHERE
    specialties.ID = xref.specialty_ID AND
    scientist.ID = xref.scientist_ID AND
    specialty != "Biology"

... or some variation on it.
David Aldridge Send private email
Friday, January 14, 2005
 
 
Check out temporary tables. That's how I'd do it.
bof.
Sunday, January 23, 2005
 
 
A smaller variation of Herbert's:

SELECT
  scientists.ID, first_name, last_name,
FROM
  scientists
WHERE
  scientist.ID not in (
  SELECT
    xref.scientist_ID
  FROM
    xref,
    specialties
  WHERE
    specialties.ID = xref.specialty_ID AND
    specialties.specialty = "Biology")
ORDER BY
  last_name, first_name

Just eliminated a table from the subquery. It seemed redundant. Probably wouldn't have affected performance, though.
BradC Send private email
Monday, January 24, 2005
 
 
I read in a very good Book on SQL many years ago (by Mark Gurry I think) always replace a NOT IN clause by the equivalent NOT EXISTS.

Imagine doing the work behind the previous statement, For every scientist you are retrieving the whole list of scientist ID's who are doing biology. Then searching the entire list to make sure the scientist doesn't occur. This can be very slow on large tables.

Better to use the keys on the table and search for any occurrence of biology for a particular scientist. This will work much faster for larger sets of data becausen it looks up throught the ref table and various keys to see if this scientist specialises in biology.

i.e.

SELECT
  scientists.ID, first_name, last_name,
FROM
  scientists
WHERE NOT EXISTS (
  SELECT 'x'   
  FROM
    xref,
    specialties
  WHERE
    xref.scientist_ID = scientist.ID AND
    specialties.ID = xref.specialty_ID AND
    specialties.specialty = "Biology")
ORDER BY
  last_name, first_name
James Briggs Send private email
Monday, January 31, 2005
 
 
Thx James.
Dino
Monday, January 31, 2005
 
 
I read in a very good Book on SQL many years ago (by Mark Gurry I think) always replace a NOT IN clause by the equivalent NOT EXISTS.

Can't have been that good a book if it recommends doing that. You need to understand how the 2 work. It depends on how many records you have in each side of the join on which is a better option. Here's why...

With NOT EXISTS it evaluates the query for each row of the table your using.

With NOT IN it builds a list of the records from the comparison table and then keeps checking against that.

So if your query was checking 10,000 rows in a table against a record set of just 40 with NOT EXISTS its running the sub query 10,000 times.

Using NOT IN it runs the sub query just once then references the result set each time.

Take the reverse 40 against 10,000 its likely to be quicker to use NOT EXISTS.

Here's a great article with evidence

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:953229842074
gilf Send private email
Friday, February 04, 2005
 
 
Hi Joel,

Thanks for the article on user interfaces, really useful!

If you're concerned with speed, you might find this runs a bit faster:

SELECT
  scientists.ID, first_name, last_name,
FROM
  scientists, xref, specialities
WHERE
  xref.scientist_ID = scientist.ID AND
  specialties.ID = xref.specialty_ID
GROUP BY
  scientists.ID
HAVING
  sum(
    CASE WHEN speciality = "Biology" THEN 1 ELSE 0 END
  ) = 0
James Send private email
Saturday, February 05, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz