The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Optimizing Access Queries

I'm finally discovering that there is more to writing queries than getting results.

I currently have an Access db (hoping not to switch; different discussion) that is pretty big.  A few tables have millions of rows.

The queries that I have written are pretty intense - many, many tables.  I've been learning more about indexes, relations, etc. in an attempt to speed things up (any good links on optimizing Access queries appreciated!).

The pattern that I keep running into - and thinking can be fixed:

I build a complicated query ("q1") that takes a long time to compute (tq1 seconds).  Then I build another query ("q2") from this query that takes about the same amount of time (tq1 + negligible seconds), implying all the work is in q1.  Then I build another query ("q3") from these two queries that takes twice as long (2 * tq1), implying that q1 is built twice.

My particular example: come up with a whole bunch of interest rates from a complicated query.  Then find the maximum date.  Then find the rate on that maximum date.

I'm thinking about building a temporary table to hold the results of q1.  Surely, there's a more logical way?
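The temp-table approach can be sketched as follows (Python with SQLite standing in for Jet; the `rates` table and every name here is invented for illustration):

```python
import sqlite3

# Minimal sketch of the temp-table idea, using SQLite in place of Jet.
# The "rates" table and all names are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE rates (rate_date TEXT, rate REAL);
INSERT INTO rates VALUES ('2005-01-01', 2.50), ('2005-02-01', 2.75);
""")

# Pretend this SELECT is the expensive query q1: run it ONCE into a temp table.
con.execute("CREATE TEMP TABLE q1_cache AS SELECT rate_date, rate FROM rates")
con.execute("CREATE INDEX idx_q1_date ON q1_cache(rate_date)")

# q2 (max date) and q3 (rate on that date) now read the cache, not q1.
max_date = con.execute("SELECT MAX(rate_date) FROM q1_cache").fetchone()[0]
rate = con.execute(
    "SELECT rate FROM q1_cache WHERE rate_date = ?", (max_date,)
).fetchone()[0]
print(max_date, rate)  # → 2005-02-01 2.75
```

The expensive query runs once; everything downstream hits the indexed cache.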

Mike
Bankstrong
Monday, February 21, 2005
 
 
==>Surely, there's a more logical way?


Ayup.

Don't use MS Access for this. Seriously.

Access is fine for what it is.

For queries against multiple tables with multiple millions of rows, you're likely not going to see the performance you're looking for with Access. Don't get me wrong, I love MS Access as a product and use it quite a lot. Just not for databases of this size with complex queries. You'll be nothing but disappointed if you stick with Access for this.
Sgt.Sausage
Monday, February 21, 2005
 
 
Sarge -

<<Don't use MS Access for this. Seriously.>>

I think this may be good advice.  I've been wary of anti-Jet bigots, but I'm starting to see some valid reasons for SQL Server.  For one thing, the universe of Access experts may be limited to you and Albert Kallal.  I google for tough answers in access groups and he's the only one with the right answer.

There are a few other good reasons (query optimization, data size), but I'm still hoping to eke out a few more months with Jet.  I have customers breathing down my neck, even for sub-optimal solutions.

So, I'm asking what to do with the current tools. 

But, while you bring it up, is this particular problem one that SQL Server handles correctly?

Thanks,
Mike
Bankstrong
Monday, February 21, 2005
 
 
Your problem sounds like a query composition problem, and you'd encounter the same issue with SQL Server (although it may provide better performance overall in some scenarios -- though that most certainly isn't guaranteed -- masking the issue).

To see if I'm understanding what you're saying, you have a large, complex query. In one query you pull one value out of that resultset, and then in another query you pull out a different value. In this case it has no choice but to execute the query twice, and it would do the same thing in SQL Server.
Dennis Forbes
Monday, February 21, 2005
 
 
Dennis -

<<In this case it has no choice but to execute the query twice, and it would do the same thing in SQL Server. >>

I think you are understanding the question, but I'm surprised at the answer.  It seems like if I (a lowly human) am smart enough to see that the same query is being calculated twice, the machine can as well.  I would have thought that unless I goofed in some way (or used a cheap db), the db would create a temporary table (with the correct indexes) or something more subtle.

That said, I've read your two articles and it's clear that you know vastly more than me about this stuff.  BTW, thanks again, they have come in very handy.

Mike
Bankstrong
Monday, February 21, 2005
 
 
There are desktop OLAP tools (some free) that can source data from a Jet DB.  You might try looking at these for your particular problem if your problem maps into that domain.

For lightweight stuff, I know some people here have used JPivot and Mondrian.
Art
Monday, February 21, 2005
 
 
Does the database reside on a local drive or on a file server? Since Access has no server component*, all data needs to be brought locally for Access to work on, so it gets hit hard by network performance.

*There is MSDE which is basically SQL Server.


Ensure that your foreign keys have indexes on them, as they will help speed up joins between tables.
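That advice applies to any engine; a quick sketch in Python with SQLite (schema and names invented for illustration):

```python
import sqlite3

# Illustration of indexing a foreign key; SQLite standing in for Jet,
# schema and names invented. The index lets the join probe the child
# table's FK column instead of scanning it for every parent row.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER, val REAL);
CREATE INDEX idx_child_parent ON child (parent_id);  -- index the FK used in the join
INSERT INTO parent VALUES (1, 'a'), (2, 'b');
INSERT INTO child  VALUES (10, 1, 1.5), (11, 1, 2.5), (12, 2, 3.5);
""")

rows = con.execute("""
    SELECT p.name, c.val
    FROM parent AS p JOIN child AS c ON c.parent_id = p.id
    ORDER BY c.id
""").fetchall()
print(rows)  # → [('a', 1.5), ('a', 2.5), ('b', 3.5)]
```

With large tables, the presence or absence of that one `CREATE INDEX` is often the difference between an index probe and a full scan per parent row.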
TommyA
Monday, February 21, 2005
 
 
When I build complex query sets like the ones you've described, I often find it helpful to copy the SQL statements from each to a text editor.  Then I can analyze the statements to see where they can be combined into fewer queries. There is no "trick" using this method, though.
Cowboy coder
Monday, February 21, 2005
 
 
"It seems like if I (a lowly human) am smart enough to see that the same query is being calculated twice, the machine can as well."

Indeed, and perhaps at some point they might figure out a "whole procedure optimization", but at this point optimizations occur pretty much at the individual statement level. If you do the exact query two times in a row, pulling a different row, it will generate each of them from scratch, though if it has a cross-statement optimizer it would know that it could merge them and pull them at the same time. Alas.
Dennis Forbes
Monday, February 21, 2005
 
 
Mike, I feel your pain. I used to do telecoms billing in Access, which was great when there were 200 customers, but very slow with 30,000.

Access scales well up to about 100,000 rows, but beyond that doesn't seem to cope.

The monthly bill run went from an hour to several days once the number of calls to process got up to one million. We had to ditch Access and move to SQL Server, which yielded amazing speed gains, taking the monthly bill-run back down to a couple of hours.

In the end, I re-wrote it to do the billing in real-time, as the call records came off the switch, but that's another story.

In conclusion, I love Access, but with anything over 100,000 rows, just say no.
54783AD8-B093-4EE7-B452-2C28AC3E5FA3
Tuesday, February 22, 2005
 
 
I have to take issue with the "Access can't handle big tables" school of thought.  I used to build telecom billing DB's on Access, too, and I had DB's over 1m records.  Simple queries on this size DB, using proper indexes, would run in 1-2 secs on a 486 PC.

I had moderate success using temp tables for complex queries.  In order for this to work well, you need to ensure that the temp table is a small fraction of the size of the original, and the selection needs to happen off an indexed field (for best results, the index should have a lot of possible values -- i.e., "true" / "false" is a bad index).  If you can't make both of those things true, then I'd expect your results to be poor.

Time the query to get a sense for speed, and understand that you'll lose some speed writing results to a temp table (again, the temp table should be small if this is going to work).  Also, if you're not already doing so, you'll probably want to monitor the size of your DB so you can do compacts if needed.  If doing a compact on your main DB is too time-consuming, you could also write the results into a temp DB, but this will also be time-consuming, so make sure it's worth it.

Having said all this, yes, I'd agree that all things being equal, you should move to SQL Server (or MSDE) when you can.
D. Lambert
Tuesday, February 22, 2005
 
 
==> all data needs to be bought locally for Access to work on

Let's get this right. In most cases -- all *indexes* need to be brought local. Most of the time, with proper indexing, Access does not drag the actual data across the wire. It pulls the indexes, does the processing locally with indexes, and then goes back and fetches the actual data it needs for the results. Still a lot of back and forth between the file server and the local Jet instance, but nowhere near as bad as you're making it out to be.

I'm not on the anti-Jet bandwagon, but I do agree that there are some tasks where you simply outgrow the capabilities of a simple, file-server based database. This *may* (or may not) be one of them. We don't have enough info. to make that decision.

My informal "rule of thumb" is Access for small stuff. Tables in the hundreds of thousands of records and less. Occasional million or multi-million rows, maybe. Lots of tables with millions, tens or hundreds of millions, not a chance. This seems to work as a "rule of thumb" and has served me well over the last 10 years or so (been working heavily w/ Access since v 2.0, 1994-ish).

BTW -- this is in reference to *Jet*, not Access itself.

We still prefer to do a lot of application programming in Access, even when the data resides in a big bad server database somewhere. Access will do just fine with the larger datasets that live on a giant server somewhere, as long as you use the server itself to do the processing and don't use Jet to pull the data. In this model, Access is "display only" (screens, reports etc.) and the heavy lifting of processing the data is done server side.

You may have simply outgrown Jet/Access with this application, or alternatively, you may just have a brain-dead query. We really don't know. How 'bout a repro-script: table defs, w/indexes, and the actual text of the SQL. That would really help us take a look into what's going on here. Without that, we can only guess.
Sgt.Sausage
Tuesday, February 22, 2005
 
 
Sarge -

<<You may have simply outgrown Jet/Access with this application, or alternatively, you may just have a brain-dead query. We really don't know. How 'bout a repro-script: table defs, w/indexes, and the actual text of the SQL. That would really help us take a look into what's going on here. Without that, we can only guess. >>

It would be pretty difficult to publicly post all of this info. 

I think either (or both) possibilities are possible - dumb query writing or inadequate db.  I figure I have to get rid of the dumb queries no matter which db I use.  So, I've been educating myself on everything beyond writing complex queries (good indexes, fixing up normalization problems, more efficient queries).  I figure I can upgrade to sql server / msde either as a last resort on speed or when db size forces me.

I think that Albert Kallal may have solved this particular gap in my knowledge (how to combine query 1 and query 1').  I have not implemented it yet.  I can't even do it justice with a summary (he's the only guy that I know who can overload my mailserver with plaintext non-spam :) ).

I will post his response next ( I don't think it is private, just a response to my email).

Mike
Bankstrong
Tuesday, February 22, 2005
 
 
With humble appreciation,

>1) Does the Access query optimizer work anywhere near as well as SQL Server?
 
Hum, I would have to say that SQL Server is better. However, do note, as I mentioned: if you do NOT have a network involved, then JET is usually FASTER than SQL Server (on the same machine).
 
2) I'm not knowledgeable in forming subqueries, so I've always created "sub" queries that are saved and then built from.  This is nice, because it makes debugging a little easier.  But now that I'm timing everything to find speed improvements, I'm a little more aware of the mechanics.
 
Yes, also it is "easier" to build a single query that gets you one piece of information, and then string together several queries. I also often do this, as it just saves human brain power! However, you can often make *very* impressive gains if you drop the use of multiple queries and use a sub-query. However, it is not so much that you have strung queries together, but the fact that using sub-queries *forces* YOU to make things work better.
 
Let's take a VERY simple example (that will apply to you). Let's say we have a customer, and we want the last invoice. However, not only do we want the last invoice date, but we ALSO want other details of the last invoice (amount, and location). Of course, we all know that returning the last invoice date is a piece of cake (you can use DMax). However, to join in the other data you DO NOT WANT to do a join on the date field. Let's assume a customers table and an invoices table here. So, we want to show the customer (from the customer table), and just show the last invoice date (from table invoices).
 
To find the last invoice date we would fire up the query builder, drop in the customer table, and then drop in the invoice table. We would draw a join line from the main table to the child table. We would hit the "Σ" (summation) button, and then drop in the FirstName, LastName, and, from the child table, the date. Easy, and the query builder will give something like:
 
 
SELECT ID, FirstName, LastName, Max(tblInvoices.InvoiceDate) AS LastInvoiceDate
FROM tblCustomers INNER JOIN tblInvoices ON tblCustomers.ID = tblInvoices.Main_id
GROUP BY ID, FirstName, LastName

 
Of course, the problem with the above is that we mentioned that we wanted to bring in several additional fields from the invoice table, and ONLY the "last" invoice record (you can't use DMax on the other fields to get the values that belong to the last invoice date). In the past, I might have created two separate queries here, and then joined them together. However, the real trick here is to actually build a STANDARD join, and then RESTRICT the results to the ONE last record.
 
I'm also going to mention that we have all kinds of extra fields in the group by, and we DO NOT WANT that!!
 
Here is how we use a sub-query to do this.
 
First, just build the query with the main table and the child table joined in (no grouping here... just make the join). We get:
 
SELECT ID, FirstName, LastName, tblInvoices.InvoiceDate
FROM tblCustomers INNER JOIN tblInvoices ON tblCustomers.ID = tblInvoices.Main_id

 
As you can see in the above, I also wanted other fields from the child table. (location, and amount). So, we get:
 
SELECT ID, FirstName, LastName, tblInvoices.ID, tblInvoices.InvoiceDate,
            tblInvoices.Amount, tblInvoices.Location
FROM tblCustomers
INNER JOIN
    tblInvoices ON tblCustomers.ID = tblInvoices.Main_id

 
Up to this point, we simply used the query builder. So far, the above is very easy to make, and we are still 100% in the query builder. Of course the above will return several results for each customer, but as mentioned we ONLY want the last one (invoice). So, we do NOT resort to using DMax, as that is VERY clumsy *when* you want additional fields. Furthermore, using DMax never gave us a clean way of returning the other fields from the invoice table associated with the last invoice date.
 
The solution here is to put a condition that finds the last invoice, and thus restricts the whole thing to the ONE record. The query to find the "last" invoice would be:
 
select top 1 id from tblInvoices where main_id = tblCustomers.ID
Order by InvoiceDate DESC, ID DESC
 
It is important that the above query only return ONE record, and thus I have an order by on both the InvoiceDate and the "id" field of table invoices (I assume you still have a unique primary key even for all child tables... as you need them!!). If there were TWO invoice dates for the customer on the same day, the top 1 could return TWO rows, but by putting the extra "id" in the order by clause after the invoice date, the query will only return ONE id. This is important, since if the subquery returns more than one ID, the SQL will complain (choke), as it can't return two values for the test.
 
So, now, the simple trick is to take the above query, and use it as a condition. We get:
 
SELECT ID, FirstName, LastName, tblInvoices.ID, tblInvoices.InvoiceDate,
            tblInvoices.Amount, tblInvoices.Location
FROM tblCustomers
INNER JOIN
    tblInvoices ON tblCustomers.ID = tblInvoices.Main_id
where tblInvoices.ID =
      (select top 1 id from tblInvoices
        where main_id = tblCustomers.ID
        Order by InvoiceDate DESC, ID DESC)
 
So, we built the query as we needed, and then threw in a condition to restrict the query to the one record. If you think about this, the query optimizer can now crunch each query, process it, and then start running and returning results.
 
The above was/is a solution for when you need to return several values from the child table.
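This "restrict to the one last record" pattern translates directly to other engines; a runnable sketch in Python with SQLite standing in for Jet, where Jet's `TOP 1 ... ORDER BY` becomes `ORDER BY ... LIMIT 1` (sample data invented):

```python
import sqlite3

# The "last invoice per customer" pattern, with SQLite standing in for Jet:
# Jet's "SELECT TOP 1 ... ORDER BY" becomes "ORDER BY ... LIMIT 1".
# Schema mirrors the thread's example; sample data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCustomers (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE tblInvoices  (ID INTEGER PRIMARY KEY, Main_id INTEGER,
                           InvoiceDate TEXT, Amount REAL, Location TEXT);
INSERT INTO tblCustomers VALUES (1, 'Jo', 'Smith'), (2, 'Al', 'Jones');
INSERT INTO tblInvoices VALUES
    (10, 1, '2005-01-05', 100.0, 'North'),
    (11, 1, '2005-02-01', 250.0, 'South'),
    (12, 2, '2005-01-20',  75.0, 'East');
""")

# Correlated subquery restricts the join to each customer's single last invoice.
rows = con.execute("""
    SELECT c.ID, c.LastName, i.InvoiceDate, i.Amount, i.Location
    FROM tblCustomers AS c
    INNER JOIN tblInvoices AS i ON c.ID = i.Main_id
    WHERE i.ID = (SELECT ID FROM tblInvoices
                  WHERE Main_id = c.ID
                  ORDER BY InvoiceDate DESC, ID DESC
                  LIMIT 1)
    ORDER BY c.ID
""").fetchall()
print(rows)
# → [(1, 'Smith', '2005-02-01', 250.0, 'South'), (2, 'Jones', '2005-01-20', 75.0, 'East')]
```

Note the `ID DESC` tiebreak serving the same purpose described above: it guarantees the subquery yields exactly one row even when two invoices share a date.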
 
However, what about the case where you need several last values from several tables? In this case, you can use the DMax approach, but MAKE SURE you restrict the results. Let's go all the way back to our first example, and assume we want the max (last invoice date), but we DO NOT need additional fields from that table. We could use:
 
 
SELECT ID, FirstName, LastName,
 
          (select Max(InvoiceDate) from tblInvoices where main_id = tblCustomers.ID)
            as LastInvoice
FROM tblCustomers
 
Notice how we are not using ANY joins here. You can make a sub-query return ONE value and use that as an expression.
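The scalar-subquery variant also runs as-is on other engines; here it is in Python with SQLite standing in for Jet, using SQL `Max()` in place of the VBA `DMax()` (sample data invented):

```python
import sqlite3

# Scalar subquery: one value per customer, no join at all. SQLite standing
# in for Jet, SQL Max() in place of VBA DMax(). Sample data invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblCustomers (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE tblInvoices  (ID INTEGER PRIMARY KEY, Main_id INTEGER, InvoiceDate TEXT);
INSERT INTO tblCustomers VALUES (1, 'Jo', 'Smith'), (2, 'Al', 'Jones');
INSERT INTO tblInvoices  VALUES (10, 1, '2005-01-05'), (11, 1, '2005-02-01'),
                                (12, 2, '2005-01-20');
""")

rows = con.execute("""
    SELECT c.ID, c.LastName,
           (SELECT MAX(InvoiceDate) FROM tblInvoices
            WHERE Main_id = c.ID) AS LastInvoice
    FROM tblCustomers AS c
    ORDER BY c.ID
""").fetchall()
print(rows)  # → [(1, 'Smith', '2005-02-01'), (2, 'Jones', '2005-01-20')]
```

An index on `tblInvoices.Main_id` (and ideally `InvoiceDate`) keeps each scalar lookup to an index probe.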
 
Anyway, I am going to suggest that you try a standard join, and dump all of the DMax() expressions you have; dump all of the group by. Simply use a nice clean query, and join in the 2nd table. The trick here is to always include a "key" id restriction, as key lookups run very fast.
 
And, if you are not trying to return other fields from the "last whatever", then try the sub-query expression as above; again, we don't use a join... and this often runs a lot faster also.
 
I am kind of busy right now, but if I get a chance, I will try to take a closer look at your query to see what else we can improve on. (The above two suggestions are what I would try first... and then you can dump the queries built on queries.)
 
 
Albert D. Kallal
Edmonton, Alberta Canada
Kallal@msn.com
http://www.members.shaw.ca/AlbertKallal
Bankstrong
Tuesday, February 22, 2005
 
 
'I think this may be good advice.  I've been wary of anti-Jet bigots, but I'm starting to see some valid reasons for SQL Server.  For one thing, the universe of Access experts may be limited to you and Albert Kallal.  I google for tough answers in access groups and he's the only one with the right answer.'

Albert is one of the strongest, most reliable posters in comp.databases.ms-access. And full kudos to him for that. His output is very high quality.

But he's not the only one. You may be using the wrong newsgroups if you have found a paucity of Access information on the internet.

I would start at:

comp.databases.ms-access
http://www.mvps.org/access/
http://www.trigeminal.com/
http://www.lebans.com/

I don't know what sort of questions you have asked, but you'll find many answers at the above.
JO Davis
Thursday, February 24, 2005
 
 
I haven't read the complete thread because I'm in a hurry, but I'd like to drop a few comments:

First, there's MSDE (already mentioned above), which is basically SQL Server with minor restrictions, and free (after registration). You can download it from Microsoft's web pages.

Second, there's an analysis tool for the Microsoft Jet engine (unfortunately not very well documented). You have to set the JetShowPlan registry key, and Microsoft Access will write a log containing information about optimization for each executed query. Just search Microsoft's web pages for JetShowPlan.
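For Jet 4.0 the switch is commonly described as a string value under the engine's Debug key, roughly as below (a sketch from memory -- verify the exact path against Microsoft's pages; the Debug key usually has to be created by hand):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"
```

With this set, Jet reportedly appends each executed query's plan to a SHOWPLAN.OUT file in the current working directory.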

HTH

Sunday, March 20, 2005
 
 

This topic is archived. No further replies will be accepted.
