A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
Has anyone else tried specifically making their own forums Google-able? Will this take a crippling amount of bandwidth to pull off?
In my head I'd like all forum searching to work as well as groups.google.com, but unfortunately I don't own a multi-million (billion?) dollar server farm. Google also cheats with their USENET search because they only reindex the 'new posts' that come in every so often. It's not so easy for me.
Of course, every major discussion group/forum product outside Google implements its own (typically awful) search, but I'd really like to design something in which each post is Google-searchable.
We can do this with "fuzzy directory structures", where the server essentially fakes a static-looking URL so that Google sees each page as static. We can also do Slashdot-type HTML generation entirely on the server side (in other words: build actual HTML files for each page, and rebuild a file whenever it needs updating). This part is doable. I can at least envision this.
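To make the "fake URL" idea concrete, here is a minimal sketch in Python. The URL layout (/forum/<forum>/<thread>/<post>.html) and the function names are purely hypothetical, invented for illustration; a real forum would hand the parsed ids to whatever code renders the post.

```python
import re

# Hypothetical static-looking URL scheme (an assumption, not any product's layout):
#   /forum/<forum-name>/<thread-id>/<post-id>.html
_POST_URL = re.compile(
    r"^/forum/(?P<forum>[a-z0-9-]+)/(?P<thread>\d+)/(?P<post>\d+)\.html$"
)


def parse_post_url(path):
    """Map a static-looking path back to the ids a dynamic page would need.

    Returns a dict of ids, or None if the path does not match the scheme.
    """
    match = _POST_URL.match(path)
    if match is None:
        return None
    return {
        "forum": match.group("forum"),
        "thread": int(match.group("thread")),
        "post": int(match.group("post")),
    }


def build_post_url(forum, thread, post):
    """Build the crawlable URL to put in links, so no query string is exposed."""
    return f"/forum/{forum}/{thread}/{post}.html"
```

A spider only ever sees paths such as build_post_url("design", 1234, 5678), while the server still generates the page on demand from the parsed ids.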
But I'm worried about every major search engine coming in and attempting to index every single page on the forum, every evening or however often they drop by. For medium or large forums, that means tens or hundreds of thousands of pages, indexed by every search engine, every day or so. Am I correct in assuming that this is a BIG chunk of bandwidth?
Or is there some sort of trick I can use to minimize bandwidth usage by search engine spiders?
Either you want your forums spidered or you don't, right?
Thursday, November 03, 2005
Spiders aren't interested in wasting bandwidth either. If you create static files and serve them as such, robots should respect the file modification date and not re-crawl what has not changed. The same can be achieved with dynamic content if you send a Last-Modified header and check the If-Modified-Since header on incoming requests.
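For dynamic pages, the header check could look roughly like this Python sketch. The function name and the idea of passing in the post's last-edit time as a Unix timestamp are assumptions for illustration; how the result is wired into a web server depends on the framework in use.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(last_modified_ts, if_modified_since):
    """Decide between 200 and 304 for a page last changed at last_modified_ts.

    last_modified_ts: Unix timestamp of the page's last change (assumed to be
        available from the forum's database).
    if_modified_since: raw If-Modified-Since request header value, or None.
    Returns (status_code, headers); a 304 means no body needs to be sent.
    """
    headers = {"Last-Modified": formatdate(last_modified_ts, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a zone as UTC, as HTTP dates are GMT.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so compare whole seconds.
            if int(last_modified_ts) <= since.timestamp():
                return 304, headers
    return 200, headers
```

A spider that re-sends the Last-Modified value it received earlier gets a 304 with no body until the page actually changes, which is where the bandwidth saving comes from.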
As for unwanted spiders, you can block them in robots.txt or in an .htaccess file.
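As a rough illustration of the robots.txt approach, the Python sketch below checks a sample rule set with the standard library's urllib.robotparser. The bot name "BadBot" and the /search path are made up for the example. Keep in mind that robots.txt is only advisory: well-behaved spiders honor it, while a misbehaving one has to be blocked at the server (for example via .htaccess).

```python
from urllib.robotparser import RobotFileParser

# Sample rules (hypothetical names): shut out one unwanted spider entirely,
# and keep every other spider away from the bandwidth-heavy search pages.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /search
"""


def make_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# True if the named spider is allowed to fetch the given path.
badbot_allowed = parser.can_fetch("BadBot", "/forum/design/1234/5678.html")
google_post_allowed = parser.can_fetch("Googlebot", "/forum/design/1234/5678.html")
google_search_allowed = parser.can_fetch("Googlebot", "/search?q=spiders")
```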
Friday, November 04, 2005
This topic is archived. No further replies will be accepted.