The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Scraping search results from a non-web application

The big players offer search APIs for web apps, and I believe their terms of service forbid using any of them from non-web applications. So what is a non-web (i.e. desktop) app to do?

My guess is that scraping would be OK for a desktop app, à la a web browser, as long as it doesn't generate queries faster than a human could. I also suspect the search engines wouldn't be able to distinguish such a non-browser desktop application from a browser.
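To make the scraping idea concrete, here is a minimal sketch that uses Python's standard-library `html.parser` to pull links and their anchor text out of a results page. It assumes the page HTML has already been fetched; the class and function names are hypothetical, and real result pages would need site-specific filtering on top of this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []    # collected (href, text) tuples
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without
            # an href leave _href as None and are skipped below.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/b">Second</a>')` returns `[('/b', 'Second')]`.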

Any thoughts?
Tuesday, June 10, 2008
> I am guessing scraping would be ok

The information which you're scraping may be copyrighted, and/or the web site's Terms of Use may prohibit screen scraping.
Christopher Wells
Tuesday, June 10, 2008
It looks like I'll have to pay attention to robots.txt and what it means to be a web bot...

Google's robots.txt disallows /search, yet a web browser is allowed to access it, right? How else are people supposed to use Google search?! If I'm not mistaken, as long as I don't do "deep searches" (following links within links, etc.), it should be OK to run web searches from within a desktop app...
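Checking robots.txt can be automated with Python's `urllib.robotparser`. This is a minimal sketch, assuming the robots.txt text has already been downloaded; the sample rules and the `MyDesktopApp` user-agent string are hypothetical, chosen to mirror the `Disallow: /search` rule discussed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content modeled on the "Disallow: /search" rule;
# a real client would fetch the site's actual file (RobotFileParser also
# offers set_url() and read() for that).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `allowed(SAMPLE_ROBOTS_TXT, "MyDesktopApp", "https://www.example.com/search?q=test")` returns `False`, while the same check for `https://www.example.com/about` returns `True`.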
Tuesday, June 10, 2008
Google may forbid using their API other than in web apps, but AFAICT you can use the Yahoo! API for any kind of application you like.
Wednesday, June 11, 2008
I've used Feedity ( ) in the past to create RSS feeds from web pages and search results. It's an awesome tool! Hope you find it useful.
Monday, June 16, 2008
I don't want to rely on an external party; especially not a free, web 2.0 one!
Tuesday, June 17, 2008
You can emulate a web browser, either by using the Microsoft web browser control, or by using a browser component such as the iMacros Scripting Edition.

Websites can NOT distinguish such an automated web browser from a "human controlled" web browser. Unless, of course, you "overdo" it and click a link every few seconds :-)
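Not "overdoing" it mostly comes down to pacing requests. Below is a minimal sketch of a throttle that enforces a minimum gap between consecutive requests; the class name and whatever interval you pass in are hypothetical choices, not limits published by any search engine. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # minimum seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each page load keeps the automated browser from clicking faster than the chosen interval.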
Frank
Tuesday, June 24, 2008

This topic is archived. No further replies will be accepted.
