The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Unstructured address parsing

I want to be able to pass a bunch of text to a component that will find name/email/address/phone/website from this text and return back a data structure.

Aren't there any parser components available to buy?

I googled for "unstructured address parsing" and got back lame results.
OP again
Tuesday, February 26, 2008
Sorry moderators, I meant to post this in JOS forum, where I get more replies than this one. :)
OP again
Tuesday, February 26, 2008
I think you are looking for text-mining applications. This is not an easy problem, but there are some free tools for it. Take a look at the links below; they are pretty cool, yet really advanced.
Both are Java-based but can give you some feedback on parsing unstructured data.
Lucas P Krause
Tuesday, February 26, 2008
Addresses are typed differently in every country!
Totally Agreeing
Wednesday, February 27, 2008
And differently in the same country.  The style in western Canada tends to be in this order (suite, street address) as in
    123 456 The Street
and in eastern Canada
    456 The Street, #123
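
To make the two orderings concrete, here is a minimal Python sketch. It assumes these are the only two layouts in play, which they are not; the function name `split_suite` and both patterns are made up for illustration.

```python
import re

# Hypothetical patterns for the two layouts shown above. Real addresses
# will break both of them; this only illustrates the ordering problem.
WEST_RE = re.compile(r"^(?P<suite>\d+)\s+(?P<number>\d+)\s+(?P<street>.+)$")
EAST_RE = re.compile(r"^(?P<number>\d+)\s+(?P<street>.+?),\s*#(?P<suite>\d+)$")

def split_suite(line):
    """Return a dict with suite, number and street, or None if neither layout fits."""
    line = line.strip()
    # Try the "#suite" suffix form first, since it is the less ambiguous one.
    for pattern in (EAST_RE, WEST_RE):
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None

print(split_suite("123 456 The Street"))
print(split_suite("456 The Street, #123"))
```

Both calls produce the same suite, number and street. Anything outside these two shapes, such as 1313 1/2 Railroad Ave, comes back as None and still needs a human to look at it.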

Have fun with North Road in Coquitlam, BC and Yew Street Road in Bellingham, WA.  Bellingham also has house numbers with 1/2 in them; I used to live at 1313 1/2 Railroad Ave.  Vancouver, BC has both South Foot of Main Street and North Foot of Main Street as a valid mailing address.  Toronto, ON has Avenue Road.  There are others I could mention.


Gene Wirchenko
Gene Wirchenko
Wednesday, February 27, 2008
Why do you care?

If it's for searching, full text searching is the way of the future, so a search for "main street" doesn't need "main" classified as a street field. If it's for postage, just print it as entered and assume people know how to enter their own address. If they don't, you're in real trouble anyway.

The only bug-free code is the code that you don't even have - so unless you really need it for an amazingly good reason, just let the address be a big free-form field.

Wednesday, February 27, 2008
As with boys wreck ignition, you will never get past a first approximation needing a human editor.
Wednesday, February 27, 2008
Some British addresses don't actually have addresses, only a post code. To make matters more confusing, some of these places have names in place of numbers, or indeed only a house name!
The Luggage
Thursday, February 28, 2008
It really is different for every country. I used to work for a company that built address matching software, and the algorithms that worked for the UK would not work for the US, or Canada - completely different products for each country.

Simple things like the structure of a zip code or post code vary country by country. The only universal element to the information you're looking at would be email or URL data.
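
As a rough illustration of that "universal" part, here is a minimal Python sketch that pulls email addresses and URLs out of free text. The function name `extract_contacts` and both regular expressions are assumptions for the example; they are deliberately loose and are not validators.

```python
import re

# Loose, illustrative patterns. They will miss unusual addresses and
# accept some invalid ones; treat the output as candidates only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)

def extract_contacts(text):
    """Return candidate emails and URLs found anywhere in the text."""
    emails = EMAIL_RE.findall(text)
    # Trim sentence punctuation that tends to cling to the end of a URL.
    urls = [url.rstrip(".,;:)") for url in URL_RE.findall(text)]
    return {"emails": emails, "urls": urls}

sample = ("Contact Jane Doe, jane.doe@example.com, 456 The Street, #123. "
          "See http://www.example.com/about.")
print(extract_contacts(sample))
```

Names, street addresses and phone numbers are left alone here on purpose, since those are the parts that differ country by country.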

You're unlikely to find a simple component to do this for you. Even if you do hook into a country-specific application, you would really need the corresponding databases to accompany it: zip code, post code, address database, valid telephone numbers, etc.

I live in Ireland. My address reads Name, Locality, Nearest large town, Ireland. No street, no building number, no zip code. There's a huge difference between 'official' addresses, which all the paperwork says are 'valid', and actual addresses that people type into edit boxes or write on letters. Addresses that your local mail company delivers to often bear little or no relationship to the addresses the documentation says should be printed on envelopes.

Bottom line, this is a hugely complex field, with no absolute answers. No plug-in component could possibly address this with any great degree of accuracy.
B.K.
Thursday, February 28, 2008
Do you need these to just grab addresses from email or do you need to do mailings?
If you are doing bulk mailshots you will pay a lot more if the addresses aren't in the EXACT format the post office wants.
In the UK you have to match the address against the Postal Address File (PAF), the official list of every address. There are similar lists in other countries. It costs thousands to rent this list.
Martin
Friday, February 29, 2008
I'm also looking for this type of software. It needs to be Java-based.

I've only found this one so far:
Commercial product, probably with a hefty price.
Richard
Tuesday, March 04, 2008
No phone numbers or URLs, but this might be a start, in Perl.


Sunday, March 23, 2008

This topic is archived. No further replies will be accepted.
