The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Cache strategies for a CMS...

I'm designing a custom CMS in ASP.NET 2.0 and am at the point where I'd like to implement some caching to improve performance.  Basically, what I have is an HttpModule that, on every page request:

1)  Looks up the page in the db
2)  Creates a context object for the call with the current page and other items (folder, site, template) - each item is one call to the db to load an object and attach it to the context object for use later in the call.
3)  Grabs the page's contents from the database.

My basic goal is to cache the page if it has not already been cached and on subsequent requests build all the objects and page from the cache rather than the database.  If the page's contents change (e.g. someone edited some text) then I'll invalidate the cache and rebuild it on the next request.

My questions:

1)  To do this, should I be using server-side caching only?  By this I mean the cache API (e.g. HttpContext.Current.Cache.Add(), etc.)

2)  If I do start caching all these pages/objects, I'm afraid I'll use up all the machine's memory.  Is there a pattern or best practice for checking how much memory the cache uses and removing less-used cache items to make room for new ones?
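One common answer to the memory question is a bounded cache that evicts the least recently used entry when it is full. ASP.NET's own Cache already removes items by itself under memory pressure (the priority you pass to Cache.Add influences the order), so that behaviour is worth checking before rolling your own. Below is a minimal Python sketch of the idea, chosen only for illustration; it bounds the cache by entry count rather than bytes, which is a simplifying assumption:

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry.

    Sketch only: capacity counts entries, not bytes of memory.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use at the front, newest at the end

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        # Evict from the front (least recently used) until back within capacity.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def remove(self, key):
        # Called when a page is edited and its cached copy becomes stale.
        self._items.pop(key, None)

    def __len__(self):
        return len(self._items)
```

With a capacity of 2, storing "/a" and "/b", reading "/a", then storing "/c" evicts "/b", because it is the entry that was used least recently.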

Any tips or ideas you can share would be much appreciated.

Thanks.
Craig
Thursday, May 25, 2006
 
 
Exhibit A:
http://www.danga.com/memcached/

Exhibit B:
http://www.danga.com/memcached/apis.bml

In theory, shouldn't you be able to use the C# API?
Cory R. King
Thursday, May 25, 2006
 
 
The most effective cache is the one closest to the user.

If-Modified-Since is your friend. Do it right and you won't need any server side apis.
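Honoring If-Modified-Since comes down to sending a Last-Modified header and, on the next request, comparing the date the browser sends back against it. A minimal Python sketch, assuming the page's last-edit time is already known without extra work (for example, stored when the page is saved); the function name is hypothetical:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since):
    """Return (status, headers) for a GET request, honoring If-Modified-Since.

    last_modified: timezone-aware datetime of the page's last edit
    (assumed to be stored when the page is saved, so no extra query).
    if_modified_since: raw request header value, or None if absent.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Malformed header: ignore it and send the full page.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # Browser's copy is current: send no body.
    return 200, headers
```

For example, `conditional_get(datetime(2006, 5, 25, 12, 0, tzinfo=timezone.utc), "Thu, 25 May 2006 12:00:00 GMT")` yields a 304, while an older header date yields a 200 with the full page.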
Matt.
Thursday, May 25, 2006
 
 
How do you avoid performing the db query used to determine whether a modification has been made?  Or, does this solution just reduce the total number of db operations?
Chris Marshall
Thursday, May 25, 2006
 
 
I'll admit, at least on my website the system gets cumbersome.  I'm storing things like unrendered comment hashes, story text, and a few other things that my CMS frequently asks for.

Since the CMS I use is ancient and has SQL, HTML and business logic scattered everywhere, there is no central place for data objects.  Thus, any parts of the code that touch the DB must remember to touch the cache too.  Akk!

I'm curious as to how any kind of browser-side caching would work on anything but a static website?  There is so much that might change, it would seem to me that you'd pretty much have to make all the regular calls just to figure out if something expired.
Cory R. King
Thursday, May 25, 2006
 
 
"How do you avoid performing the db query used to determine whether a modification has been made?"

I'm hoping that by using server side caching I can remove an item from the cache whenever it is updated by someone so it will be re-cached on the next request.  This will avoid any querying of the db to check for modification date/time.

"Or, does this solution just reduce the total number of db operations?"

Optimally I'd like to reduce the number of operations to none, but I'm doing some analysis to see if that's actually feasible.

WRT client side caching it seems like it'd be heavy lifting to do the checks and then generate the appropriate header every time.  I'll look into it more as I don't understand fully what needs to happen to make this work.

Thanks all for your comments.
Craig
Thursday, May 25, 2006
 
 
I'm curious how you're going to look up each page.  Is it by URL?  I'm imagining a scenario where you may have /splashpage.aspx where the content could really have five or six different versions of dynamic content, according to some cookie or session variable.  Will you still cache pages *that* dynamic?
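If one URL can produce several variants, the cache key has to include whatever selects the variant. A minimal Python sketch, where `vary_cookies` and the cookie names are hypothetical; keying on a session ID instead would give every visitor a private cache entry, which mostly defeats a shared cache:

```python
from urllib.parse import urlsplit


def page_cache_key(url, cookies, vary_cookies=()):
    """Build a cache key from the URL plus only the cookies that change the output.

    cookies: dict of cookie name -> value from the request.
    vary_cookies: names of the (hypothetical) cookies that select a content
    variant; all other cookies are ignored so they don't fragment the cache.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    # Sort the names so the key doesn't depend on the order they were listed in.
    variant = "&".join(f"{name}={cookies.get(name, '')}" for name in sorted(vary_cookies))
    return f"{path}?{parts.query}|{variant}"
```

Two requests for /splashpage.aspx with the same `region` cookie but different tracking cookies share one entry; a different `region` gets its own.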
Vince
Thursday, May 25, 2006
 
 
I've played with this thing a bit.

First off, the poster who suggested honoring If-Modified-Since is dead on. The biggest perceived gains will come from eliminating unnecessary downloads, not just unnecessary server-side rendering. Be sure to do that no matter what you do with server-side caching.

Second, unless your database server is a major bottleneck, you may not see the gains you expect from server-side caching. In my case, I found that checking cache validity on each request cost about the same as just regenerating the page. Consider load-testing a simple version of your caching scheme to see how it performs before you commit to it.

If the nature of the site produced by the CMS allows it, consider having the CMS generate static HTML pages whenever an edit occurs. That way, the web server takes care of caching for you. If that doesn't work, you might look at having all access go through some sort of caching proxy server. Once again, If-Modified-Since is your friend.
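Publishing static files on edit might look like the minimal Python sketch below. The function name and directory layout are hypothetical; writing to a temporary file and then renaming it keeps the web server from ever serving a half-written page:

```python
import os
import tempfile


def publish_static_page(output_dir, url_path, html):
    """Write a rendered page to disk so the web server can serve it directly.

    Intended to be called from the CMS's (hypothetical) save handler after an edit.
    """
    relative = url_path.lstrip("/") or "index.html"
    target = os.path.join(output_dir, relative)
    # Refuse paths such as "/../x.html" that would escape the output directory.
    root = os.path.realpath(output_dir)
    if os.path.commonpath([root, os.path.realpath(target)]) != root:
        raise ValueError(f"refusing to write outside {output_dir}: {url_path}")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp_path, target)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target
```

Because every edit rewrites the file, the web server's normal static-file handling (including Last-Modified and If-Modified-Since) does the rest.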
clcr
Thursday, May 25, 2006
 
 
"I'm curious as to how any kind of browser-side caching would work on anything but a static website?"

I'm currently adding a lot of browser-side caching to my web application.  I've already got one "layer" in place using ETags: I hash the entire content of the page and set the ETag.  I compare the ETag the browser sends back (in the If-None-Match header) with the hash of the content I'm about to send, and if there's a match, I just return a 304 Not Modified.  This saves no processing on the server, but it does save bandwidth and download time.
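A minimal Python sketch of a content-hash ETag layer like this one. The function name is hypothetical and SHA-1 is just one reasonable choice of hash. Browsers send the ETag back in If-None-Match, which may hold a comma-separated list or `*`:

```python
import hashlib


def etag_response(body, if_none_match):
    """Return (status, headers, body) using an ETag derived from the page content.

    body: the fully rendered page as bytes (so the server still does all the work).
    if_none_match: raw If-None-Match request header, or None if absent.
    """
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {"ETag": etag}
    if if_none_match:
        candidates = [t.strip() for t in if_none_match.split(",")]
        # Weak comparison: ignore a W/ prefix on the browser's tags.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # Not Modified: headers only, no body.
    return 200, headers, body
```

Since the body is built before it is hashed, this saves bandwidth and download time but no server work.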

The second "layer" will use the If-Modified-Since header, and I'm working on that now.  The server will do a small query to determine if anything on the page has changed, and if it hasn't, then nothing else has to be done.  Each page has its own query.  This should improve things a lot, as there are several long lists in the application that change infrequently.
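The per-page freshness query could be as small as a MAX over a timestamp column. A minimal Python/sqlite3 sketch; the table `list_items`, its columns, and both function names are hypothetical, and `updated_at` is assumed to be set on every insert and update. An index on `(list_id, updated_at)` should keep the query cheap:

```python
import sqlite3


def list_last_modified(conn, list_id):
    """Newest change to any item on one list page (Unix timestamp), or None.

    Deleted rows leave no trace here, so deletions would also need to bump
    a timestamp somewhere (e.g. on the parent list) to be detected.
    """
    row = conn.execute(
        "SELECT MAX(updated_at) FROM list_items WHERE list_id = ?", (list_id,)
    ).fetchone()
    return row[0]


def needs_render(conn, list_id, client_timestamp):
    """True if the page must be regenerated; False means the server can answer 304."""
    last = list_last_modified(conn, list_id)
    return client_timestamp is None or last is None or last > client_timestamp
```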
Almost H. Anonymous
Thursday, May 25, 2006
 
 
"In my case, I found that checking cache validity on each request cost about the same as just regenerating the page."

You shouldn't check cache validity on each request. Always assume that the cache is valid.

Then, all read operations retrieve data directly from the cache. If the data doesn't exist in the cache, then the cache fetches it.

All editing operations also have to go through the cache, so that the previous cached copy can be discarded upon edit.

But if you have to go all the way back to the database to check cache validity, there's very little point in having a cache.
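A minimal Python sketch of this arrangement, where reads and edits both go through one object so the cache can simply trust its contents. The `db` object and its `load`/`save` methods are hypothetical stand-ins for the CMS's data layer. This only holds within a single process; with several web servers, every server's copy would need to be invalidated on an edit:

```python
class PageStore:
    """Read-through cache that discards entries on edit instead of checking validity."""

    def __init__(self, db):
        self._db = db      # hypothetical object with load(page_id) and save(page_id, data)
        self._cache = {}

    def get(self, page_id):
        try:
            return self._cache[page_id]  # Trust the cache: no validity check.
        except KeyError:
            data = self._db.load(page_id)  # Miss: fetch once, then keep it.
            self._cache[page_id] = data
            return data

    def save(self, page_id, data):
        self._db.save(page_id, data)
        # Drop the stale copy; the next get() reloads it from the database.
        self._cache.pop(page_id, None)
```

Routing every edit through `save` is what removes the need to ask the database whether a cached page is still current.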
BenjiSmith
Thursday, May 25, 2006
 
 
In my case, checking the cache validity for a page involved nothing more than checking for the existence of a particular file. The scheme you describe would still carry that exact overhead. That alone was comparable to the cost of fetching the necessary info from the database and rendering the page in my case. Granted, the rendering process was *very* simple, but it goes to show that filesystem operations can be more expensive than expected.
clcr
Thursday, May 25, 2006
 
 
You'd think that connecting to the database, fetching rows, and rendering a page would take a lot more time than checking if a file exists.  Perhaps one has to assume a non-trivial amount of data and rendering.  I do some caching just to avoid having to open a database connection.
Almost H. Anonymous
Thursday, May 25, 2006
 
 
How much traffic are you getting? You may not even need a cache.
Joe
Thursday, May 25, 2006
 
 
"In my case, checking the cache validity for a page involved nothing more than checking for the existence of a particular file."

Aha. I thought we were talking about an in-memory cache.

"That alone was comparable to the cost of fetching the necessary info from the database and rendering the page in my case."

Just out of curiosity, why did you bother with a caching system if page rendering was so simple?
BenjiSmith
Thursday, May 25, 2006
 
 
"Just out of curiosity, why did you bother with a caching system if page rendering was so simple?"

It's one of those things that seemed like a good idea at the time.
clcr
Friday, May 26, 2006
 
 
If you have to run a query to figure out if anything has changed, how much is saved by not rendering the page?

The way I have worked this is using a sequence number to verify if anything has changed. This requires that changes update the sequence number, but it is very efficient.
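A minimal Python sketch of sequence-number validation, assuming a single site-wide counter kept in process memory (class and method names are hypothetical). A site-wide counter means any edit invalidates every page; per-page counters trade memory for precision. With several servers the counter would have to live somewhere shared, such as a one-row table, making the check one tiny query:

```python
import threading


class SequencedCache:
    """Cache whose entries are valid only while the change sequence is unchanged."""

    def __init__(self):
        self._seq = 0
        self._lock = threading.Lock()
        self._pages = {}  # key -> (sequence at render time, rendered html)

    def record_change(self):
        # Every edit anywhere on the site must call this.
        with self._lock:
            self._seq += 1

    def get(self, key, render):
        # Read the sequence before rendering: if an edit lands mid-render, the
        # entry is stored with the older number and re-rendered next time.
        seq = self._seq
        entry = self._pages.get(key)
        if entry is not None and entry[0] == seq:
            return entry[1]  # Validity check is a single integer comparison.
        html = render()
        self._pages[key] = (seq, html)
        return html
```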
son of parnas
Friday, May 26, 2006
 
 

This topic is archived. No further replies will be accepted.
