The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

need some help with SQL and sorting...

Hi all,  need some help here if anyone is able.

I have a CMS which has to generate a page tree structure from a database.  Each record in the database has an id as primary key, a parent (also an index) which indicates which page its a child of, and a navlevel which indicates how far down a branch its located.

At the moment its using a recursive loop which pulls a record and then loops back on itself to get any children of that record and so on.  Its been working well until we've got a site with about 100 or so pages, and we've moved the site to the live server (a shared box) and its pretty much ground to a halt.  It literally takes about 1 minute to generate this list.  I cannot believe we've had such a major performance hit, but I have to fix it very quickly.

This is the code its running at the moment, I've removed the CSS stuff to make it more readable but the rest is there.  Its classic ASP in vbscript.


Sub parentpage(id)

    Set RstObjTop=ConnObj.Execute("SELECT DISTINCT linktext, url, id, cms, parent, editable, navlevel, displayorder FROM sitepage WHERE parent=" & id & " ORDER BY id,displayorder")

    If Not RstObjTop.EOF then
        Do Until RstObjTop.EOF

                <option value="<%=RstObjTop("id")%>"><%=RstObjTop("linktext")%></option>

'-----    call the function again
            parentpage RstObjTop("id")
    End If
End Sub


My other question is obviously, is there a way I could just grab all the pages from the table in one go, then use a sort routine to put them in order?  Can someone suggest a way to do that?  The data looks something like this:

1    parent0
2    parent0
3    parent0
4    parent0
5    parent2
6    parennt2
7    parent3
8    parent4

and it needs to be output like this:

- 5
- 6
- 7
- 8

so it ends up like a navigation tree...

Any help much appreciated
Friday, April 21, 2006
Generally, a DISTINCT statement indicates redundancy that couldn't be filtered a better way.  I know what you are doing... you have multiple edits of a particular page in your database and you want to find the latest one... you ought to think about marking the previous pages deleted when you add a new version... and then remove the DISTINCT statement and 'ORDER BY id'...

It may not be a huge gain... but generally allowing the database to do a 'WHERE isDeleted = 0' is going to be much faster than it doing it's own recursive magic to discard the rows that are redundant.

Above that, you might want to consider an elaborate sub query to determine if a particular page has any sub-pages in the first query of the page... so that an unnecessary query is avoided.
Phillip Zedalis Send private email
Friday, April 21, 2006
I actually don't have multiple copies of the pages in the database, I just have this table with one copy of the pages in, there's 100 pages so I have 100 records.  It just needs to get them out in the right order so I have to so a recursion and call the SELECT multiple times.

I'm putting the sql into a stored procedure at the moment to see if I get a performance boost, but I'm not convinced it will make taht much difference.
Friday, April 21, 2006
I've removed the DISTINCT and it doesn't seem to have increased the speed anyway
Friday, April 21, 2006
Stored Procedure doesn't seem to have helped at all, not really surpising I guess as the SELECT is running so many times rather than returning the same big recordset every time... 

Any ideas how I could best go about grabbing the whole table then sorting the offline recordset?  Using arrays or something?  Would this necessarily be quicker?
Friday, April 21, 2006
What database are you using, some noddy one no doubt that can't handle such a simple task ;) get rid of it an get a proper one like Oracle which can handle it no problem...

SQL> select * from order_test;

        ID    PARENT
---------- ----------
        1          0
        2          0
        3          0
        4          0
        5          2
        6          2
        7          3
        8          4
        9          6

9 rows selected.

SQL> select id, parent, level  from order_test start with parent = 0 connect by prior id = parent;

        ID    PARENT      LEVEL
---------- ---------- ----------
        1          0          1
        2          0          1
        5          2          2
        6          2          2
        9          6          3
        3          0          1
        7          3          2
        4          0          1
        8          4          2

9 rows selected.

I added 9 in to show it works to any depth you fancy. Level is a psuedo columm which tells you which level your at, usefull if you need to do anything graphically.
Andy.. Send private email
Friday, April 21, 2006
There are a couple of methods if you know how many levels you will be using but that won't be flexible.
Andy.. Send private email
Friday, April 21, 2006
Oracle does this out of the box. That doesn't help you, I guess. Are you using SQL Server?

I'm assuming the amount of data will never be more than a few hundred rows. Otherwise it is too much to display in a meaningful way.

Forget about doing it in a Stored Procedure or Function. That's just messy.

I recommend getting all the data with one select statement, then in ASP copy it all to an array. Then use a recursive sub to iterate over the array repeatedly, outputting the info in the hierarchy.

It's not great, but it will perform quickly for one user, and will even work okay with multiple users _unless_ you have tens of thousands of users per day.

If you do have really high traffic, try caching the results in someway.

If the data changes rarely - say once per day - consider updating a bunch of static HTML pages when the data does change and just deliver the HTML pages with the already-built hierarchy.
Imminent Send private email
Friday, April 21, 2006
It is sql server, yes.

I'm going to try the disconnected sort with an array (I'll use GetRows) but I'm not sure of the best way to proceed from there...
Friday, April 21, 2006
I had to do a similar thing in SQL server but with organization levels, a little more complicated because a particular entity could be on different trees and at different levels.

I used stored procs and the performance was fine with thousands of records.
Andy.. Send private email
Friday, April 21, 2006
Thanks all for being a sounding-board... I've converted the code to pull all the data into an array and sorted the array,and the speed difference is incredible!

Was a bit of a panic as I'm the only dev and I'm off on a training course next week, have to get this site live!
Friday, April 21, 2006
A bit off topic but don't always assume that doing something in SQL is quicker than just getting the data out and doing something else with it. Most of the time it is by an order of magnitude but sometimes it's not, sorting is definately an area where I have been able to speed things up by pulling the data out on mass and manipluating it.

For example recently I had a SQL query which was taking 1 1/2 to run, I removed the sorting and moved it into a perl script, the SQL went down to 15-20 minutes and the perl literally took less than a minute to run.
Andy.. Send private email
Friday, April 21, 2006
>I've converted the code to pull all the data into an array and sorted the array,and the speed difference is incredible!

You CAN NOT create a reocrdset in a loop as  your original code was and expect any performance.

Think of creating a reocrdset like using a helicopter to cross the street. Which is faster to cross the street?

a) a helicopter
b) walking on foot to cross the street

Well, if the helicoptor is in mid flight..then it is MUCH faster (in flash as you look up and see the helicoptor fly over!!). However, if the chopper is sitting on the ground..then it is MUCH faster to walk across the street.

For a recordset…an HUGE amount of things happen…

The query engine is loaded, syntax is checked, security and permissions is checked....this list goes on and on for about 1000+ pages of information as to what happens BEFORE THE DATA starts to flow. Once the data starts to flow…it is like the chopper in flight..

To walk across the street, the walk light turns on…and you walk. In the helicopter, you got a HUGE check list.

Master Power on.  – check
Fuel pump on  - check
Navigation system (gyros) on – check

Eventually the engine start button can be pressed. After that,  you still have to wait a CONSIDERABLE amount of time until the helicopter blades get up to speed. You then become airborne…and THEN CROSS THE STREET.

So, creating a whole query and a recordset is going to take MORE TIME then it takes to transfer about 100,000 records of data…often even more data can be transferred in this time!!

So, keep in mind how expensive it is to open up a connection and process a sql query that sends results to the reocrdset. (at least  you did not re-create the connection each time..but you still execute a query….).

You choices to fix this is to open the recordset once at the start of the loop. (there is a zillion ways to do this…depending on the type of environment and “scope” you have available. So, you might just have global var to the set of routines (oh..gee pascal was so nice with this type of concept – local vars to a “set” of routines).

You might just simply pass the reocrdset to the routine as a parameter, and then use a find first in the code. (caution, this would not work with a LARGE database…but for anything under 5000 records…it will fly).

And, of course, using array as you finally did will also work. It is not that the array is fast, but hitting the server for information each time in that loop will kill you (just like starting up the helicopter).

For some strange reason…I don’t like they harp back to the old days of FORTRAN. But, the lesson learned here is don’t create a reocrdset over and over..they are expensive…VERY expensive to create…

Albert D. Kallal
Edmonton, Alberta Canada
Albert D. Kallal Send private email
Friday, April 21, 2006
>> For example recently I had a SQL query which was taking 1 1/2 to run, I removed the sorting and moved it into a perl script, the SQL went down to 15-20 minutes and the perl literally took less than a minute to run.

If this spawns a generation of programmers suddenly deciding to do all their sorting outside of the database then you know whose fault it'll be, right? :D

Seriously though, what you probably had here was a change in the query optimisation brought on by not requiring a data sort -- often this means (in Oracle) that the optimizer was making insufficient use of indexes in the sorted query.
David Aldridge Send private email
Tuesday, April 25, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz