The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

I've crippled the server.

Alright, so at my primary contracting site, they wanted an application that would poll a series of sites (about 80), pull particular data out and put it in the database for future usage.  This data comes incrementally throughout the day, so I set it up to poll half the sites every 30 minutes.

Up until yesterday, we only had about 25 of the sources active and I had a self-imposed throttle of 100 items/minute.  Yesterday, I activated the other 55 sites and let it rip at about 2pm... after being instructed to adjust the throttle to 1000 items/minute.

At 2pm, the data was coming in at about 5-10 items/minute.
By 6pm, it was spiking at 30-40/minute.
By 8pm, it was spiking at 60-80/minute.
By 10pm, it was spiking at 120/minute.
By midnight, it was STEADILY at 180/minute.

Then the admin shut it down.  It works exactly as requested and has the exact result that I warned about.

KC
Thursday, June 16, 2005
At 180 items/minute, that's about 333 ms per item. Why would that be a problem?
son of parnas
Thursday, June 16, 2005
(sorry for being so vague)

When an item goes into the database, there is a full text indexing and another series of processes that happens.
KC
Friday, June 17, 2005
KC, there is something you are not telling us. Clearly there is a need for those sites to be pulled in and dealt with. How does throttling help with that? What is wrong with just letting it rip? Does thrashing occur? Is some shared resource (the network?) being overconsumed?
Just me (Sir to you)
Friday, June 17, 2005
Reading between the lines, you're getting thrashing because the text indexing is too slow to keep up with the number of entries, and it's creating some kind of race condition.

If that's so then I'd semaphore the process depending upon which one is the bottleneck.
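
A minimal sketch of that idea, assuming the indexing step is the bottleneck and that items arrive from several worker threads. The names BoundedIndexer, fake_index and ingest are hypothetical and are not taken from KC's application; the semaphore simply caps how many indexing calls run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedIndexer:
    """Lets at most `max_concurrent` callers run `index_fn` at once."""

    def __init__(self, index_fn, max_concurrent):
        self._index_fn = index_fn
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest concurrency observed, for checking

    def submit(self, item):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            try:
                return self._index_fn(item)
            finally:
                with self._lock:
                    self._active -= 1


def fake_index(item):
    """Hypothetical stand-in for the real full-text indexing step."""
    time.sleep(0.01)
    return item * 2


def ingest(items, indexer, workers=8):
    """Feed items from several worker threads through the bounded indexer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(indexer.submit, items))
```

With max_concurrent=2, eight worker threads can keep fetching while only two of them are ever inside the indexing step at the same time.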

I've got what looks like a similar problem with I Just Heard: redirects come in from casino and contact sites trying to bump their indexing, sucking down an entire day's worth of entries at the same time as new entries are coming in, and at some point the ZCatalog seems to go into thrash mode.  I'm considering what my long-term solution is, other than pattern bombing casino and contact sites, that is.
Simon Lucy
Friday, June 17, 2005
The problem with not having a throttle is that the system will then download ALL the data from each of the selected sites.  I initially put it in place for testing and had it set at 10 to ensure the data was coming in and being parsed correctly.
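
For readers who want something concrete, here is a rough sketch of an items-per-minute throttle of the kind described above. It is not KC's code; the fixed-window approach and the names Throttle and try_acquire are assumptions made for illustration.

```python
import time


class Throttle:
    """Allow at most `limit` items per `window` seconds (fixed window)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        """Return True if one more item may be processed right now."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

A poller would call try_acquire() before pulling each item and stop (or sleep) for the rest of the window once it returns False, so Throttle(10) would match the testing setup and Throttle(1000) the later instruction.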

One of the big problems... one of the sites can take upwards of 4 minutes to respond after a successful request for data.  I've timed this myself, and it is one of the bottlenecks of the data parsing, but I'm not sure how much I can do about that one...
KC
Friday, June 17, 2005
The obvious solution would seem to me to be not to index as each item comes in, but make that a separate operation to run after the polling has concluded.  You'll have usable data in about the same amount of time, since you were going to have to wait forever for any query results while the db was thrashing.
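
Roughly what I have in mind, as a toy sketch: the real system uses a database with full text indexing, whereas this uses an in-memory dictionary and a naive word index, and the names ItemStore, store and index_pending are made up for the example.

```python
from collections import defaultdict


class ItemStore:
    """Stores items cheaply during polling; indexes them in a later batch."""

    def __init__(self):
        self.items = {}                # item_id -> text
        self.index = defaultdict(set)  # word -> set of item_ids
        self.pending = []              # ids stored but not yet indexed

    def store(self, item_id, text):
        """Cheap write used while polling; does no indexing work."""
        self.items[item_id] = text
        self.pending.append(item_id)

    def index_pending(self):
        """Batch pass to run once polling has finished; returns items indexed."""
        count = 0
        for item_id in self.pending:
            for word in self.items[item_id].lower().split():
                self.index[word].add(item_id)
            count += 1
        self.pending.clear()
        return count

    def search(self, word):
        """Return ids of indexed items containing `word`."""
        return sorted(self.index.get(word.lower(), ()))
```

The trade-off is that items stored since the last index_pending() call are not searchable yet, which is the "usable data in about the same amount of time" bargain.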

Just my two pence.
Boofus McGoofus
Saturday, June 18, 2005

It's not clear, but you seem to be implying that the database is running on the same machine as the web server. If so, do not do this. Always keep the web server and database server apart, using a gigabit Ethernet connection between them if that's what you need.

My other advice would be to profile the app using something like VTune to find exactly where the bottlenecks are. You might be surprised at the result - I usually am when I do this exercise.
Mark Pearce
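
VTune is one option; purely as an illustration of the exercise, here is a sketch using Python's built-in cProfile instead. The workload functions parse_items, index_items and poll_cycle are invented stand-ins, not anything from KC's application.

```python
import cProfile
import io
import pstats


def parse_items(n):
    """Hypothetical stand-in for parsing downloaded data."""
    return ["item %d from site %d" % (i, i % 80) for i in range(n)]


def index_items(items):
    """Hypothetical stand-in for the indexing step."""
    index = {}
    for pos, text in enumerate(items):
        for word in text.split():
            index.setdefault(word, []).append(pos)
    return index


def poll_cycle():
    return index_items(parse_items(2000))


def profile_report(fn, top=10):
    """Run fn under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Printing profile_report(poll_cycle) lists each function with its call count and cumulative time, which is usually enough to see whether parsing or indexing dominates.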
Saturday, June 18, 2005

This topic is archived. No further replies will be accepted.