A former community discussing the business of software, from the smallest shareware operation to Microsoft. A part of Joel on Software.
We're closed, folks!
Doug Nebeker ("Doug")
I don't know if anyone will get to see this post because recently a thread I started on this forum was hellbanned and subsequently my last post on another thread was similarly dealt with, but here goes.
My experience leads me to believe that moderation on this forum is heavy handed.
I don't know if anyone else is experiencing this; in fact, wanting to find that out is what prompted this thread.
Perhaps asking for recommendations for a suitable OCR library is beyond the Pale on this forum?
Perhaps a response on a thread to the effect that "this forum is not over busy" is worthy of hellbanning?
Personally I don't think so.
> Perhaps asking for recommendations for a suitable OCR library is
> beyond the Pale on this forum?
As I understand it, a moderator replied to you that yes, it was. It seems arguably reasonable for a BOS forum.
> Perhaps a response on a thread to the effect that "this forum is not
> over busy" is worthy of hellbanning?
I think if you were "hellbanned" (I had to look that up. And I'm glad I had to look that up.) this post wouldn't have shown up.
More on selling software would be welcome. :D
I'm not a moderator, but yes, this forum is not the right place to ask what is the best OCR library (or the best web hosting service or talking about robots stealing our jobs).
This is a forum for topics about the business of software.
As someone who asks, you're doing yourself a disservice by not asking such a question in a forum where you can expect a good answer, presumably one populated by OCR experts.
As someone who makes a fuss about it, you demonstrate the need for moderation.
You've made up your mind. You won't be convinced that OCR is not the right topic for this forum. You will keep making a fuss about it.
Therefore you can either be banned by a moderator or you'll keep further derailing this forum with off-topic discussions on moderation (and I get the irony of writing this but personally I would like more moderation, not less. Say no to discussions about robots).
Friday, August 09, 2013
As moderators, the primary goals are to keep the discussions on topic (i.e., the business of running a software company), remove spammy posts, and try to keep things civil. We're not robots - just people - so there is a natural variation in how the forum gets moderated.
There is no hellbanning, but there is a Bayesian filter. So the more times that your comments get deleted for any of the above reasons, the more likely it is that your future posts will get flagged as suspicious. I haven't done any analysis of this, but my gut feel is that the people who have higher trigger rates have earned it by getting into petty arguments in various discussions and having their comments removed.
Ultimately, though, we didn't volunteer to be moderators in order to be babysitters or impose narrow views and restrictions on others. We like this forum because it's not a Q&A forum. It's a discussion forum.
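For readers curious how a Bayesian filter could produce the effect described above, here is a minimal sketch. It is not the forum software's actual implementation, which isn't described in this thread. It assumes a plain naive Bayes classifier over words, and it assumes (hypothetically) that the poster's name is fed in as just another token, which is one way past deletions could raise the score of a poster's future posts. The class name, labels, and sample posts are all invented for illustration.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes filter: words plus a hypothetical author token."""

    def __init__(self):
        self.counts = {"deleted": Counter(), "kept": Counter()}
        self.posts = {"deleted": 0, "kept": 0}

    @staticmethod
    def features(author, text):
        # Assumption: the author is treated as just another token.
        return ["author:" + author.lower()] + text.lower().split()

    def train(self, author, text, label):
        """Record one moderated post; label is 'deleted' or 'kept'."""
        self.posts[label] += 1
        self.counts[label].update(self.features(author, text))

    def p_deleted(self, author, text):
        """Estimated probability that a new post belongs to the 'deleted' class."""
        vocab = set(self.counts["deleted"]) | set(self.counts["kept"])
        total_posts = sum(self.posts.values())
        log_score = {}
        for label in ("deleted", "kept"):
            # Laplace-smoothed prior and likelihoods, summed in log space.
            score = math.log((self.posts[label] + 1) / (total_posts + 2))
            denom = sum(self.counts[label].values()) + len(vocab) + 1
            for tok in self.features(author, text):
                score += math.log((self.counts[label][tok] + 1) / denom)
            log_score[label] = score
        # Turn the two log scores into a probability of "deleted".
        top = max(log_score.values())
        exp = {k: math.exp(v - top) for k, v in log_score.items()}
        return exp["deleted"] / (exp["deleted"] + exp["kept"])
```

In this sketch, every deletion recorded with `train(author, text, "deleted")` nudges up the score that `p_deleted` gives the same author's later posts, even innocuous ones, which would look a lot like a ban from the poster's side.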
This kind of anal retentive salami slicing of "permitted" posts is exactly the reason that this forum has a poor reputation as an overmoderated nanny state.
A question from a forumer here regarding the choice of an OCR or other library could potentially be examined from the perspective of typical mISV business issues, such as cost per seat, maintainability, responsiveness of the vendor, etc.
By simply making a judgement that a post that has plenty of precedent in existing forum topics should be nuked, you make it undesirable for newcomers to bother posting. People don't like to be bitch slapped for arbitrary transgressions, and it's not exactly like you have a high scroll rate here, either.
I guess they want OCR questions posted at StackOverflow instead. That sort of makes sense I guess.
The robot thread on the other hand was a discussion and was fun. Glad it wasn't deleted.
If we are banning and censoring, let's ban and censor the stuffed shirts that don't like robot threads!
"Does anyone know a forum where it might be allowed to ask for recommendations/advice on choosing a component/library for a windows application? "
That's pretty open-ended, is much better suited for StackOverflow (and if you formulate a more specific question, you will get answers on SO), and gave no hint that you were interested in the business aspects of either purchasing or selling OCR libraries. Sorry, but no sympathy from me. And your post is still listed in this forum, so it is a long way from being wiped out.
Now, here's my two cents as to the viability of OCR software.
First, it is technically difficult to get good results, and high quality libraries aren't being given away for nothing. New entrants face very good competitors in a stagnant (at best) application market.
Second, if you try to use OCR in a business setting, it's a huge drain on human resources. The type of documents that are not derived from digital data (so you have no choice but to decode graphic representations of data) do not lend themselves to highly accurate OCR. On the other hand, documents that are formatted to produce good OCR results should be parsed in their original digital form, instead of introducing noise through printing, scanning and automated pattern recognition.
Important documents that aren't available in their original digital form (or were never created in a digital form) can't be OCR'd accurately enough to eliminate the need for manual error checking. Manual checking is extremely tedious, time consuming and also prone to mistakes. At least the original document has formatting designed to give human eyes context, and is much easier to read and recall information from.
Third, the best (and simplest) document management system for data that isn't stored in character form is often a database of images related to attributes like author, source and date, without any attempt to produce characters from the image. The human eye is light years better at pattern recognition.
Only if you have a strictly limited subset of possible character results, such as postal codes on an envelope, is OCR viable from a business standpoint. Postal mail isn't exactly a growth industry, though: there are some very good in-house applications being used by postal services, and various types of bar coding and sorting schemes are used for bulk mailers. I would suggest putting your efforts into something other than OCR.
It's more the way it was done that I found annoying.
On the first thread the moderator responded to my post about OCR
"This is a bit too much of a technical question for this forum. "
He didn't lock the thread. It's still there on my computer.
If I hadn't looked at the forum from elsewhere later, I'd still think the thread was open.
It leaves you free to post a follow up without knowing that nobody but you and probably the moderator can see it.
It's like it's designed to make a fool of you.
So admittedly feeling a bit aggravated, I started the second thread.
That thread wasn't banned but again the moderator couldn't resist playing another 'trick' on me by hiding my last post from everyone but me!
If you're interested I uploaded a picture file that shows
(A) what the moderator wants me to see
(B) what everyone else sees.
"As someone who makes a fuss about it, you demonstrate the need for moderation."
Look I know it's small beer, but if I had been moderated 'differently' both times then I wouldn't have started this thread or the previous one for that matter.
The best word I can find to describe the current system is 'sneaky'
and I think it worth discussing.
"As moderators, the primary goals are to keep the discussions on topic (i.e., the business of running a software company)"
Well I have a commercial product and I want to add OCR capabilities to it.
In my post I mentioned what I'd looked at so far and asked for recommendations/advice.
"There is no hellbanning, but there is a Bayesian filter. So the more times that your comments get deleted for any of the above reasons, the more likely it is that your future posts will get triggered as suspicious."
OK thanks for that explanation. It looks a lot like hellbanning from this end though!
Last week I'd have thought what you say to be an exaggeration but now I agree with every word! Nothing to add to that.
I actually posted it here because I thought it wouldn't fit the SO guidelines. On SO they might have closed it, but they at least would have done it in an open, transparent manner.
Actually the thread about the OCR IS invisible to you.
Thanks for your detailed views on OCR, all of which I agree with in general, particularly the bit about printed documents not being a growth market.
I have a specific use case for it though.
My clients still receive payment listings in paper form.
The paying body print off a file and send it out. For years there has been talk of electronic listings.
As an interim measure they've even been asked, without success, if it is possible to print to pdf and email that out instead.
Various other factors suggest that change is further away now than a decade ago.
So every month man hours are spent manually inputting listings.
The documents they send out follow a definite structure, and some of the OCR libraries I tried do a really good job on them (almost perfect with an added user dictionary).
The listings should match claims in the clients' databases, and there's redundancy in each listing such that several cells can be used to identify a matching claim.
So I think it's possible, with some post-OCR analysis, to get to virtually 100% accuracy with this.
The advice I was most interested in, in my original post, was recommendations for alternative libraries as the one I've pretty much settled on is highly priced compared to controls I'd have purchased in the past.
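The post-OCR analysis described above could be sketched roughly as follows: compare each OCR'd row against the claims already in the database, field by field, and accept a match only when enough of the redundant fields agree. This is an illustration under assumptions, not the poster's actual design. The field names (`claim_no`, `name`, `amount`), the similarity threshold, and the "at least two fields agree" rule are all hypothetical, and the fuzzy comparison uses Python's standard `difflib` rather than any particular OCR library.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Fuzzy similarity in [0, 1] between two cell values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_claim(ocr_row, claims, fields=("claim_no", "name", "amount"),
                threshold=0.85, min_agreeing=2):
    """Return the one claim that agrees with the OCR row on enough fields, else None."""
    candidates = []
    for claim in claims:
        # Count how many redundant fields are close enough to the OCR text.
        agreeing = sum(
            1 for f in fields
            if similarity(ocr_row.get(f, ""), str(claim[f])) >= threshold
        )
        if agreeing >= min_agreeing:
            candidates.append((agreeing, claim))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0], reverse=True)
    # An ambiguous tie is left for a human to resolve rather than guessed.
    if len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
        return None
    return candidates[0][1]
```

With this rule, a row whose claim number was misread (say a letter O where a zero was printed) can still be matched through the name and amount cells, while rows that match nothing, or match two claims equally well, fall through to manual checking.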
> It's like it's designed to make a fool of you.
Joel implemented that about 10 years ago and wrote openly about the design at the time. I can't remember the exact reasoning, but the basic idea is that if messages are visibly deleted, invariably someone will get incensed and repeatedly re-post. But if they don't realize the post was deleted and think that it's just being ignored, they'll stop. The old Joel on Software forum had a lot of traffic back in those days, so it was done to minimize the time spent moderating the forum.
Regarding the OCR post, personally I don't think it's a big deal that you posted the question here, but really, why would you want to? You're much more likely to get good advice if you posted the question at StackOverflow.
Drummer, Nick's right, Joel actually invented the technique. It makes it more tricky for spammers to tell that their spams aren't working by showing them their deleted posts.
I think they also later set it up so a deleted thread just isn't shown on the index but can be seen to participants who posted in the thread already.
Hellbanning/shadowbanning (not sure if these are different at all) is a derived technique where all the person's posts disappear, not just particular ones that are deleted. That's not used here.
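The visibility rules described in the last few posts can be sketched as below. This is only one reading of those descriptions, not the forum's real code: a deleted post stays visible to its own author, and a deleted thread is listed only for people who had already posted in it. The `Post` and `Thread` types and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    text: str
    deleted: bool = False


@dataclass
class Thread:
    title: str
    posts: list = field(default_factory=list)
    deleted: bool = False


def visible_posts(thread, viewer):
    """Posts the viewer sees: a deleted post stays visible to its own author only."""
    return [p for p in thread.posts if not p.deleted or p.author == viewer]


def index_for(threads, viewer):
    """Thread index: a deleted thread is listed only for those who already posted in it."""
    return [
        t for t in threads
        if not t.deleted or any(p.author == viewer for p in t.posts)
    ]
```

Hellbanning, by contrast, would hide every post by a given author from everyone else, regardless of the per-post `deleted` flag.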
- meta threads like this are much less interesting than discussing the 'business of software'
- this is Joel's forum, he provides the server and the software and (initially at least) the traffic that keeps it alive, so he gets to set the policies and appoint the moderators
- the policies haven't changed on this forum in the 8 years I have been here, so they aren't likely to change any time soon
- Joel is a busy guy - it is very unlikely that he is reading this
- there are plenty of other forums with different policies, if you don't like this one
Saturday, August 10, 2013
@Nicholas and Scott
Thanks for taking the time to explain the moderating techniques used on this forum.
I have read and taken note of your comments.
I've read the forum guidelines also and they seem perfectly reasonable.
I am shutting up about this topic now.
I have one last thing to add about OCR for your application "payment listings in paper form."
This produces a delay in processing, regardless of whether it is partially automated with OCR or not, which is to the advantage of the payer (even if the printed listing is coming from an intermediary). Basically, because of the delay, it takes at least one business day to verify the transfer of funds, and if the vendor is offering cash or prompt pay bonuses, it allows the payer to hang onto their money for an extra day without penalty. The way financial institutions arbitrarily date transactions well after the fact also facilitates this kind of time warp.
With EFT, the onus for accuracy is on the payer. Without EFT, the cost of redoing financial transactions (because of errors) in terms of fees, discounts and labour is much higher than the cost of manually processing those transactions, and manual processing is more accurate than OCR. If your client can't make EFT work, don't push OCR as a solution; it will prove to be more expensive in the long run. Besides, collecting payment is the most important part of a business, and shouldn't be trusted to machines.
I was planning on presenting the user with an onscreen listing that they could compare against the paper listing and edit before hitting the 'ENGAGE' (TM) button that would write it all to the database.
However, you've raised deeper issues there that I must admit had never occurred to me.
So I'm going to take a step back and consult with some of my users before going any further with this.
Thanks very much for your insightful thoughts.
Questions like "Please recommend X" and "what is the best Y" are _closed as non-constructive_ on StackOverflow, though not always:
When I am researching applications, I usually head to alternativeto.net, but I am not sure where to look for libraries/frameworks/etc.
Dmitry Leskov @Home
Sunday, August 11, 2013
I guess that is why there is always room for one more forum/discussion board. I think that is a good thing.
Monday, August 12, 2013
This topic is archived. No further replies will be accepted.