The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

Your hosts:
Albert D. Kallal
Li-Fan Chen
Stephen Jones

Using low-paid humans to beat CAPTCHAs

"Chenette said organized attackers are using automated tools to sign up for Gmail and other Web-mail accounts. When the CAPTCHA image appears, it's automatically sent off to a large and low-paid workforce, typically in another country, where a worker enters the code and sends it back so the account can be created."

How do you stop spammers from using low-paid humans to beat CAPTCHAs? Are the CAPTCHA's days numbered?
Anon Ranter
Monday, March 03, 2008
It still takes time for someone to type in the captcha. Captchas are not made to defeat all scams. Just fully automated ones. After all, any scammer can manually type in his own captcha codes. What you're talking about is only marginally better for the scammers.
Monday, March 03, 2008
Scamming and spamming can't be beat unless there are laws in place forcing ISPs and companies to react to this stuff faster and better.  But that won't happen because it would be too expensive to implement.
Lord Xeunu
Monday, March 03, 2008
guns, lots and lots of guns.
Monday, March 03, 2008
Some sites use a short timeout, on the order of 90 seconds. Not perfect, but it raises the bar a little. It also raises the frustration level for legitimate interactions.
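
A minimal sketch of such a timeout, assuming the server signs the issue time together with the expected answer and puts both values in the form. The names SECRET_KEY, issue_challenge and check_response are hypothetical, and the 90 seconds is just the figure mentioned above.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
TIMEOUT_SECONDS = 90  # the "order of 90 seconds" figure from the post


def issue_challenge(answer, now=None):
    """Return (issued_at, signature) to embed in the form next to the CAPTCHA image."""
    issued_at = int(time.time() if now is None else now)
    message = f"{issued_at}:{answer.lower()}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return issued_at, signature


def check_response(typed, issued_at, signature, now=None):
    """Accept the typed answer only if it matches and arrives before the timeout."""
    current = time.time() if now is None else now
    if current - issued_at > TIMEOUT_SECONDS:
        return False  # too slow: expired, even if the answer is right
    message = f"{issued_at}:{typed.lower()}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Because the signature covers the issue time, a client cannot simply move the timestamp forward. This sketch does not stop the same answer being replayed inside the window; that would need server-side state.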
Monday, March 03, 2008
No, the captcha is here to stay.  In spite of the cheap labor, this still has to be a very expensive operation compared to a fully automated crack.  That means it only works because the perpetrators are somehow making a *lot* of money on the deal, and that's only possible because Google is so big. For the vast majority of smaller sites, the cost/benefit just isn't there.

For big sites, there are things operations like Google can do to stop the attack.  It'll just take them a few more days to roll it out. 

For example, you can include a number of hidden inputs in the html that must be filled in or left blank in a particular way.  A human never sees them, but automated software has a hard time knowing what to do with them.

Then you have a system to automatically generate the page on the fly, so the combination of hidden fields is different for each page.  Maybe combine that with encoding part of the captcha as a check bit, though I'm not sure if that really adds anything.

Also, you can use the hidden fields as a sort of code telling the page where to submit to.  The attacker needs to crack that code or they'll submit to the wrong server and won't authenticate.

Finally, you set up the system on the back end so that you can very easily and automatically change up the hidden input validation algorithm every week or so.  Now the attackers are having to spend money on programmers to keep their attack current.  It's still possible to break it, but the cost/benefit starts to dwindle.
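
One way the per-page hidden fields might look, as a rough sketch only. It assumes the field names and their required values are derived from a per-page nonce and a server secret, so that swapping the secret changes the whole scheme. SECRET_KEY, NUM_TRAPS and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"rotate-me-weekly"  # hypothetical; changing it changes the whole scheme
NUM_TRAPS = 4  # hypothetical number of hidden inputs per page


def _derive(nonce, index):
    """Deterministically derive one hidden field's name and expected value."""
    digest = hmac.new(SECRET_KEY, f"{nonce}:{index}".encode(), hashlib.sha256).hexdigest()
    name = "f_" + digest[:10]
    # One hex digit decides whether the field must stay blank or carry a token.
    must_fill = int(digest[10], 16) % 2 == 1
    expected = digest[11:19] if must_fill else ""
    return name, expected


def build_hidden_fields(nonce=None):
    """Return (nonce, {field_name: value_the_page_prefills}) for one page view."""
    nonce = nonce or secrets.token_hex(8)
    fields = dict(_derive(nonce, i) for i in range(NUM_TRAPS))
    return nonce, fields


def validate_submission(nonce, submitted):
    """True only if every hidden field came back exactly as the page rendered it."""
    for i in range(NUM_TRAPS):
        name, expected = _derive(nonce, i)
        if submitted.get(name) != expected:
            return False
    return True
```

This catches bots that post a hardcoded field list or stuff every input with text. A bot that echoes the form back exactly as rendered would still pass, so it only raises the cost of an attack rather than preventing one.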
Joel Coehoorn
Monday, March 03, 2008
"Are the CAPTCHA's days numbered?"

I think they are numbered only in the sense that I see us eventually moving back towards web sites that offer real value beyond social networking. For example, I don't think Netflix uses a CAPTCHA and Netflix doesn't have to because they use your credit card number to establish your account.

There is a social network called LinkedIn (I think) that requires you to be vouched for by an existing member. Their mission statement is to create an Old Boys' Network of people who actually know each other.

CAPTCHAs will only be used when you want to give away something for free (more or less) and you just don't want automated machines consuming it just because they can.

But to answer the spirit of the question, no, I don't think there is a business incentive to abandon the CAPTCHA system entirely, I think they will just be more selective in when to use it.
Monday, March 03, 2008
I don't understand why they'd pay anybody

If I was a scammer trying to defeat a captcha, I'd do this:

1. Set up a web site offering free pr0n/movies/MP3s

2. Require people to fill in a captcha to get the stuff, maybe every couple of page views.

3. Get the captcha images from whichever site I'm trying to break
Sunil Tanna
Monday, March 03, 2008
Joel Coehoorn:  The main problem I see is that you need to enable javascript to have an impact on what you do.  But that limits users and makes the page inaccessible.

Eventually, a motivated attacker will create a javascript interpreter and now the problem is reduced to requiring the spammer to spend more CPU time.

Anti-spam methods based on methods without a human in the loop are merely an attempt at rate-limiting the spam and increasing its cost.  If there are no tasks which the average human can complete consistently and faster than a machine, there is no prevention.

Monday, March 03, 2008
Sunil Tanna:
Yeah, that already exists... I read somewhere they had a video of a woman stripping, and the visitors had to keep filling in a captcha every couple of minutes to continue watching it...
Monday, March 03, 2008
PBS did a report about a great idea to use captcha to identify what is written in texts so they can be scanned online.

A lot of books are being scanned onto the web; see Google's program with colleges. Some of the words are hard to read and can't be converted to text automatically, so a professor had the idea of putting these words into CAPTCHAs. Each word is transcribed by many people to make sure the result is correct.

You are actually given two words: one that is the real CAPTCHA and one that is for getting books online. They won't tell you which is which.
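
The two-word idea could be sketched roughly like this. It is only an illustration of the scheme as described here, not how any real service implements it; the in-memory vote store, MIN_VOTES and the function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: scanned-word id -> tally of human transcriptions.
votes = defaultdict(Counter)
MIN_VOTES = 3  # hypothetical: agreement needed before a transcription is trusted


def grade(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass or fail on the known word; record the other answer as a vote."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the real test, so ignore the unknown-word guess
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True


def accepted_transcription(unknown_id):
    """Return the agreed reading of a scanned word, or None if undecided."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= MIN_VOTES else None
```

Only people who get the known word right contribute a vote, which is why it matters that they can't tell which word is which.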

I thought this was a great idea...
Monday, March 03, 2008
5 years ago the scam was free porn - one of the free registration hurdles was for a CAPTCHA. Who cares if the CAPTCHA was actually from hotmail?
Allen David
Monday, March 03, 2008
Actually I've often wondered how this forum gets by without captchas. Is it simply a combination of alert moderators and the fact that homebrew forums get under the radar? (Forum spam naturally tends to target phpBBs and UBBs and so on.)

Or is there more behind the scenes, like some domain blocking  or something Bayesian?
aph
Monday, March 03, 2008
aph:  This forum has a fairly decent system to prevent spam.  First, as you guessed, are active moderators.  What little spam there is is usually deleted by them.

The key to the system is the lack of feedback to spammers.  Anything you post stays visible forever from the computer you posted it from, but it may not be visible to others.  Thus spammers and trolls are often unaware that their content is hidden from the other users of the site.

There is spam filtering that goes on, and those posts are marked for moderator approval.  Long posts seem to trigger this approval check, as do other things.  If your post isn't getting any response, check to see if it is visible.

Overall, it works pretty darn well.
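
The "no feedback" rule boils down to a visibility filter. A minimal sketch, assuming each post carries some key identifying the computer it came from; the Post class, author_key and visible_posts are made-up names, and how this forum really identifies the poster is not known to me.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author_key: str   # hypothetical identifier for the posting computer (cookie, IP, ...)
    text: str
    approved: bool = True  # False while held for moderation or after deletion


def visible_posts(posts, viewer_key):
    """Everyone sees approved posts; held posts are shown only to their own author."""
    return [p for p in posts if p.approved or p.author_key == viewer_key]
```

For example, with one approved post and one held post from a spammer, the spammer's view contains both, while everyone else's view contains only the approved one.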

Monday, March 03, 2008
Captchas change all the time. Presumably even if your porn scam gave you hundreds of Captchas you would still have to devise a system to work out which one you are being presented with. Not trivial.

With regard to this forum there is a Bayesian filter that catches nearly all spam and holds up some dubious posts. Not much spam, maybe five or ten a day.

If you don't want your posts held up by the filter don't post anonymously, or better still register.
Stephen Jones
Monday, March 03, 2008
I've actually developed a proof of concept around an idea for internet/winapp security that is much stronger than CAPTCHAs but it's another app I'm not releasing to the public.

Go figure.....I could probably be filthy rich if I weren't so Pollyanna!!!
Brice Richard
Tuesday, March 04, 2008
I have to admire the ingenuity of the people who come up with the ways around these systems. In a decent world they would earn more in ethical ways with that kind of thinking than with their "blackhat" activities.

Tuesday, March 04, 2008
I run a website for someone on which there's a straightforward guestbook that uses HTTP POST, and from my experience by far and away the best way to stop spam dead is to give your HTTP POST variables gibberish names. So instead of:

<textarea name="message"></textarea>

I've got

<textarea name="asdfviudvfwve"></textarea>

Then in the script that picks up the HTTP POST, it chucks anything that doesn't have the variable "asdfviudvfwve" set straight in the bin.

I read them out of a config file, and change them every now and again.
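
Roughly, the whole trick fits in a few lines. This is a sketch under assumptions: a dict stands in for the config file, the function names are invented, and the field is emitted with a name attribute because that is what a browser sends in the POST body.

```python
import secrets

# Hypothetical stand-in for the config file the post mentions.
config = {"message_field": "asdfviudvfwve"}


def rotate_field_name(cfg):
    """Pick a fresh random field name, as one might do 'every now and again'."""
    cfg["message_field"] = "f" + secrets.token_hex(6)
    return cfg["message_field"]


def render_textarea(cfg):
    """Emit the form control using whatever name is currently configured."""
    return f'<textarea name="{cfg["message_field"]}"></textarea>'


def extract_message(form, cfg):
    """Return the message, or None if the expected field is absent (bin it)."""
    return form.get(cfg["message_field"])
```

A bot posting to a generic field such as "message" gets None back and is dropped, and after a rotation so does a bot that memorised the old name.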

I also moderate the guestbook - any post that's not "approved" can only be seen by the same IP address that posts it (which is similar to "deleted" posts on this forum). That fools spammers into thinking their crap worked without the rest of the world knowing. But, to be honest, setting random POST variables works so well I can't think of a post in the last six months I haven't had to reject.

"I've actually developed a proof of concept around an idea for internet/winapp security that is much stronger than CAPTCHAs but it's another app I'm not releasing to the public."

Yeah, and I've developed a proof of concept for a hovercar that runs on chamomile tea for 5,000 miles a gallon. What, don't you believe me?
Ritchie Swann
Tuesday, March 04, 2008
s/haven't had to reject/have had to reject/
Ritchie Swann
Tuesday, March 04, 2008
If the captcha doesn't time out, then what's the point of having it at all?

IMHO, don't make the text hard to read, use real words (via a dictionary) and time it out.

If your captcha lives forever, you fail at programming.
TravisO
Tuesday, March 04, 2008
Don't use dictionary words if you plan on using your CAPTCHA on even a remotely high profile site.

That means you have to trim your dictionary to remove "offensive" words; because the last thing you want is to serve up, say, the word "faggot" to a soccer mom.

Another downside to the dictionary thing is that you have a finite list of words; this is disadvantageous because a CAPTCHA cracker can use his custom program to 'screen' whether it read a word correctly by checking if it's in the dictionary.
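
To make the screening point concrete, here is a small sketch. The five-word DICTIONARY is a made-up stand-in for a real word list, and both function names are hypothetical.

```python
import string

# Hypothetical tiny word list standing in for a trimmed CAPTCHA dictionary.
DICTIONARY = {"apple", "bridge", "candle", "garden", "window"}


def passes_screen(ocr_guess, dictionary=DICTIONARY):
    """An attacker's cheap self-check: is the OCR output a real list word?"""
    return ocr_guess.lower() in dictionary


def chance_random_string_is_word(dictionary, length):
    """Fraction of all lowercase strings of this length that are list words."""
    words = sum(1 for w in dictionary if len(w) == length)
    return words / len(string.ascii_lowercase) ** length
```

A misread such as "gardcn" fails the screen, so the cracker knows to retry instead of submitting it. With random strings every read looks equally plausible and no such check is available.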
The Luggage
Tuesday, March 04, 2008
"...CAPTCHAs days numbered?"

Sure.  Eventually machine OCR will be better than human OCR.
AFTO
Tuesday, March 04, 2008
I read a detailed report on how a guy used OCR with 75% success at cracking CAPTCHAs in automated submissions.

I just use a Bayesian classifier for some blogs I'm admin on.  About 300 comments daily, 99% of them porn and drug spams.  In several months, four false spam traps and two false passes.  I'm going to stop eyeballing them--the spams are served a request to contact me for correction.
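
For anyone who hasn't seen one, a Bayesian comment filter can be very small. This is a toy sketch of the general technique (word counts with add-one smoothing), not the classifier actually running on those blogs; the class and method names are invented.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Word-count naive Bayes with add-one smoothing; labels are 'spam' and 'ham'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label):
        # Assumes at least one training example of each label has been seen.
        counts = self.word_counts[label]
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total = sum(counts.values())
        # Log prior plus smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            score += math.log((counts[w] + 1) / (total + vocab))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

It needs hand-labelled comments of both kinds to start with, and keeps improving as the false traps and false passes are corrected and fed back in as training examples.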

Pity the spammer stupid enough to do so.  :-)
Wes Groleau
Wednesday, March 05, 2008
An article on this same issue from Jeff Atwood
Knight Who Says Ni
Wednesday, March 05, 2008

This topic is archived. No further replies will be accepted.
