The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Using URL manipulation to access search engines via desktop app

I know screen scraping is a no-no, but here is my case:

I want to search multiple search engines simultaneously from a single search box. I type my keywords there and let a script submit them to each individual search engine, then aggregate the results and display them back.
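To make the idea concrete, here is a minimal Python sketch of the fan-out step. The engine names and URL templates are made up for illustration, and `fetch` is a hypothetical callable supplied by the caller (for example, a thin wrapper around an HTTP client), so nothing here reflects any real search engine's URL scheme.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus

# Hypothetical engines and URL templates, for illustration only.
ENGINES = {
    "engine-a": "https://search-a.example/search?q={query}",
    "engine-b": "https://search-b.example/find?query={query}",
}


def build_urls(keywords, engines=ENGINES):
    """Return one query URL per engine with the keywords URL-encoded."""
    encoded = quote_plus(keywords)
    return {name: template.format(query=encoded)
            for name, template in engines.items()}


def search_all(keywords, fetch, engines=ENGINES):
    """Call fetch(url) for every engine in parallel and collect the results.

    fetch is supplied by the caller; it takes a URL and returns whatever
    the caller wants to aggregate (raw HTML, parsed results, ...).
    """
    urls = build_urls(keywords, engines)
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in urls.items()}
        return {name: future.result() for name, future in futures.items()}
```

The script only runs once per human query, which is the point of my question: each call to `search_all` sends exactly one request per engine.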

The script isn't a bot. It runs when a human submits a query. Does this still constitute a violation?

I would imagine not since that's basically what a browser does (except one at a time), but I am not sure...
Wednesday, October 22, 2008
In short, does accessing a search engine from a browser-like component constitute screen scraping?
Wednesday, October 22, 2008
By definition, screen scraping is a technique in which a computer program extracts data from the display output of another program. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing. Screen scraping often involves ignoring binary data (usually images or multimedia data) and formatting elements that obscure the essential, desired text data.
There are a number of synonyms for screen scraping, including data scraping, data extraction, web scraping, page scraping, web page wrapping and HTML scraping (the last four being specific to scraping web pages).
In your case, you query several engines, and their result pages, which were meant for display to a human, are passed to your program for processing before anything is shown. That is screen scraping.
Dan Oyuga Anne
Thursday, October 23, 2008
"Screen scraping is a no no"

Very few things are an absolute "no no" - incest and Morris Dancing are the only examples I can think of :-)

You need to understand the trade-offs you are making when screen scraping. But if it's the only way to get the job done then that's what you do.

The major downside of screen scraping is that you are not programming to a "contract". Changes to the UI, even small cosmetic ones, may force changes to your code, and you will probably get no warning of them.
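As a rough illustration of that fragility, here is a small Python sketch using the standard library's `html.parser`. The markup and the `result` and `hit` class names are invented for the example; real result pages will look different.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect href values from <a> tags carrying a given CSS class."""

    def __init__(self, css_class="result"):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def extract_links(html, css_class="result"):
    """Return the result links found in an HTML page."""
    parser = ResultLinkParser(css_class)
    parser.feed(html)
    return parser.links


# Hypothetical markup before and after a purely cosmetic redesign.
old_page = '<a class="result" href="http://example.com/1">One</a>'
new_page = '<a class="hit" href="http://example.com/1">One</a>'
```

With these inputs, `extract_links(old_page)` finds the link, while `extract_links(new_page)` returns an empty list without raising any error. The page still looks the same to a human, but the scraper silently stops working.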

So, do these search engines offer a formal programming API? If they do, that is likely to be preferable to screen scraping. For example Google:
Dave Artus
Thursday, October 23, 2008
Take a look at OpenSearch.
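An OpenSearch description document publishes a URL template with placeholders such as `{searchTerms}`, and a trailing `?` marks a placeholder as optional. Below is a minimal Python sketch of filling in such a template. The `fill_template` helper and the example URL are hypothetical, and the sketch skips parts of the spec such as fetching and parsing the description document itself.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, **params):
    """Substitute {name} and {name?} placeholders in an OpenSearch-style URL.

    Required placeholders raise KeyError when no value is given;
    optional ones (marked with '?') are replaced by an empty string.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


# Hypothetical template, as it might appear in a description document.
TEMPLATE = "https://search.example/?q={searchTerms}&start={startIndex?}&format=rss"
```

For example, `fill_template(TEMPLATE, searchTerms="screen scraping")` yields `https://search.example/?q=screen+scraping&start=&format=rss`. Engines that support this also return results in a structured feed, so there is a documented format to parse.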
Ed
Thursday, October 23, 2008

This topic is archived. No further replies will be accepted.