The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

SQL howto question (grouping and summarising)

Trying to do a simple query to summarise some data for use in a graph/report but my SQL skills seem to be failing me this week.

Given a table, lets call it 'stuff':

id, grp, qty
------------
0, a, 0
1, b, 8
2, b, 0
3, a, 3
4, a, 2
5, a, 0
6, c, 0
7, d, 5
8, c, 0
9, d, 4

Im trying to get a summary showing number of rows in each grp and for each of the grp how many had zero qty, and how many had some. Ie:

grp, grpCt, hasQ, noQ
---------------------
a, 4, 2, 2
b, 2, 1, 1
c, 2, 0, 2
d, 2, 2, 0

Getting the first two columns is trivial:

SELECT grp, COUNT(grp) AS grpCt FROM stuff GROUP BY grp

Counting the number where qty > 0 or not and including that in the results is proving less easy than I thought it would be. I still feel like it should be a simple statement given that only one table is involved. Alas the closest I came to getting a result was this nightmarish monstrosity (looking like something from the daily wtf):

SELECT grp, grpCt, hasQ, noQ FROM
(
  (SELECT grp, (hasQ + noQ) AS grpCt, hasQ, noQ FROM
    (
      SELECT grp, COUNT(grp) AS hasQ from(SELECT * FROM stuff WHERE qty > 0) GROUP BY grp
    ) AS foo
    LEFT JOIN
    (
      SELECT grp, COUNT(grp) AS noQ from(SELECT * FROM stuff WHERE qty = 0) GROUP BY grp
    ) AS bar
    ON foo.grp = bar.grp)
  UNION
  (SELECT grp, (hasQ + noQ) AS grpCt, hasQ, noQ FROM
    (
      SELECT grp, COUNT(grp) AS hasQ from(SELECT * FROM stuff WHERE qty > 0) GROUP BY grp
    ) AS foo
    RIGHT JOIN
    (
      SELECT grp, COUNT(grp) AS noQ from(SELECT * FROM stuff WHERE qty = 0) GROUP BY grp
    ) AS bar
    ON foo.grp = bar.grp)
)

Yipes. And it doesn't even work either, returning:

grp, grpCt, hasQ, noQ
---------------------
a, 4, 2, 2
b, 2, 1, 1
c, null, null, 2
d, null, 2, null

Obviously a wild goose chase...

But what is the proper way to do it?
AH
Thursday, January 11, 2007
 
 
There are probably several different ways.  If your db supports embedded subqueries you could use something like this:

SELECT s.grp, COUNT(s.grp) AS grpCt,
(select count(*) from stuff s2 where s2.grp = s.grp and s2.qty = 0) as zerocount,
(select count(*) from stuff s2 where s2.grp = s.grp and s2.qty > 0) as gtr_zero_count
FROM stuff s GROUP BY s.grp
Herbert Sitz Send private email
Thursday, January 11, 2007
 
 
This version avoids subqueries and is probably preferable:

SELECT s.grp, COUNT(s.grp) AS grpCt,
sum(case when s.qty=0 then 1 else 0 end) as zerocount,
sum(case when s.qty>0 then 1 else 0 end) as gtr_zero_count
FROM stuff s GROUP BY s.grp
Herbert Sitz Send private email
Thursday, January 11, 2007
 
 
Many thanks Herbert! Both worked like a charm :-)
(Running on HSQLDB 1.8.0_2)

What are the performance implications of subqueries (speaking in general rather then just in HSQLDB) in a query like the first example? Would the db engine have to do an extra query for every group in the main query?
AH
Friday, January 12, 2007
 
 
Yes, that's it, in the subquery version there are actually two extra queries run for every row in the result set.  The second version without subqueries may not slow things down perceptibly at all, since the original query was already scanning through the grouped set to count group totals.  It will just do its "count" (i.e., Sum(1)) of records with 0 and records greater than 0 in same pass.
Herbert Sitz Send private email
Friday, January 12, 2007
 
 
With large datasets subqueries are extremely inefficient. Herbert's is the correct solution.
Miranda Send private email
Friday, January 12, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz