The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

How to implement: "You typed X, did you mean Y?" on result page

Was asked in interview for data warehouse programmer role something like the following (I'm paraphrasing):

How would you provide the user with a choice to research based on Y, not their entry of X?  Google says, "Did you mean ....?" above the results table.

My answer (a bit ad libbed):

Design option:
1. Identify and map core sounds to letters
2. Determine the phonetics of the user entry
3. Scan dictionary for phonetic match
4. Use regular expressions somehow

Anyone care to tear this apart.  The interviewers were too polite to attack, but I hope you won't be.  Thanks very much.
Peter Hutchinson Send private email
Wednesday, March 30, 2005
Hmm. Doesn't SQL provide a "like"? Could you use that somehow?

Wednesday, March 30, 2005
Interesting question. I would have grilled them out for the answer just for the sake of it.

MS Word has a 'sounds like' option in Search. Fuzzy logic is probably involved.

I'm not sure where regular expressions come in, though.
Alex Send private email
Wednesday, March 30, 2005
Use a proper spell checker library!

Trying to make your own poor Soundex is asking for trouble.

What happens to Chinese searches etc?

It might be nice to make a dictionary (for that proper spell checker library you use) based upon recent searches, or other domain knowledge (such as an index of content on the site).  Obviously this dictionary needs maintaining, and old search terms expired periodically etc.
i like i
Wednesday, March 30, 2005
I wonder if Markov chains could be applied against the problem.  Create the initial chain using a dictionary to determine the odds of one phonetic token following another.  Then do a breadth-first walk and present the results in order of likelihood.
Tim Clemons Send private email
Wednesday, March 30, 2005
Why does everyone keep suggesting phonetics?  How is this superior to a spell checker?

Also, note that Google will offer the alternative even if you spelled the search terms correctly.  Search for MSSQL:
Brian Send private email
Wednesday, March 30, 2005
If I were asking the question, I'd be focusing both on the back-end process (how you're going to do this, are you focusing on common typing mistakes or phonetics, are you keeping track of previous searches by other people) and the front-end (how do you present the information, do you give an easy way to override or implement the suggested change, how much power do you give to the user versus what do you handle behind the scenes).

You seem to have focused solely on the back-end issues.
Wednesday, March 30, 2005

Maybe keep track of previous searches, then whether the user immediately entered a different search without examining the results.  This could be combined with the soundex to only track searches within a certain degree of similarity (or that contain a certain threshold of common letters).

At any rate, I think your answer was OK for an interview, they probably don't know the answer, but have this problem.
Dave C Send private email
Wednesday, March 30, 2005
Well, maybe you could use the EDIT DISTANCE problem as a starter. I'm not entirely sure how to quickly compute (given an entire dictionary) the minimum edit distance to all valid words, but maybe a trie ( ) data structure would help. I'm not even going to begin to think about this, however, this problem seems somewhat similar so I'll just throw this out there and maybe it will or won't help.

On second thought, I was going to do a write up of the problem, but a quick Google search returned a nice link, so I'll save myself the hassle:

This would still leave out the problem where Google returns "Did you mean MYSQL?" for a query "MSSQL" (i.e. did they mean something else even though they had a valid dictionary word query) -- though personally I think that's a bug. Maybe combining this technique along with something like mentioned above dealing with past searches you could display the "Did you mean MYSQL?" only if MYSQL is searched for significantly more times than MSSQL.

Does that sound like a beginning to something that could work?

Ian Sefferman Send private email
Wednesday, March 30, 2005
I agree with Dave C .
J Send private email
Wednesday, March 30, 2005
Google uses its 'all user's searches history' and provides the closest match to the search string you actually used that has results.

That's probably an affirmation too strong but its known that all search terms are stored and the number of hits with them.  Once you have that information its trivial to present the neighbouring search term and make it seem like magic.
Simon Lucy Send private email
Thursday, March 31, 2005
Moron: thanks for the point of view.  I expect the hiring company wants an answer that operates like the "paperclip and notepad" auto assistant that is (used to be?) in MS Word.  It's use is not controled by the user, it's a device to keep the user pointed in the right direction and offer helpful hints based on the expectation the user has a less than ideal mastery of the (in this case) corporate data warehouse.  These users are trained in advance of first system use, but the training shouldn't stop there.  Anyway, that was my understanding of what was said to me.
Peter Hutchinson Send private email
Thursday, March 31, 2005
You typed "It's use". Did you mean "Its use?"
JOS automatic spell checker
Thursday, March 31, 2005
Why yes, exactly.  :)
Peter Hutchinson Send private email
Thursday, March 31, 2005
my idea is best :)

use a process that keeps the content of all the possible search fields indexed, then find the closest match of the search term in the indexed content using some pretty basic algorithm to find matching/similar words.

This way you (a) make sure that the alternative you offer is actually relevant to the contents of the database they are searching on (b) have all the content indexed so searching is way quick and (c) dont need to mess about with complex algorithms.

I wrote a spell checker module once that is still in use by various clients with largish numbers of employees...if uses a very basic matching algorithm to offer up alternatives by simply removing spaces, switching letters like a and e that people usually mess up and then comparing the results with word and making up new ways to improve them.

After a few fixes for glaringly obvious holes found by the clients at the beginning, it does a remarkably good job of offering up reasonable and good alternatives....yet took me no more than 2 days work (16 hours) total to actually write.
Thursday, March 31, 2005
There are some nice suggestions here, but I still think that the Google method does more than simple spell checking/soundex comparisons.

After listening to a couple of talks from Google engineers I came to the assumption that one of the core components for their functionality is their clustering algorithm. It searches their large database for phrases with certain words ("such as", "like", ...) to build clusters of related words. This is fully automated and, depending on the subject matter, works surprisingly well.

Connect this with spell checking/soundex functionality and it is quite easy to come up with a suggestion of an alternative query that either corrects a common spelling error ("brittney spears") or replaces a correct term with one that is slightly different but still related (i.e., "mssql" vs "mysql").
Martin Dittus
Monday, April 04, 2005
Or to rephrase: to build a working "did you mean X" feature, you need a measure of context. Word clustering can be a means of providing such context.
Martin Dittus
Monday, April 04, 2005
At the risk of it being already answered:

1) Count the number of results for this search.
2) Doing an approximate search using likelihood rules: soundex, replacing common misplaced letters (eh/he), like()
3) If any of these searches have more (percentage based) hits, ask the user if that was what she means.

Of course, having blazingly fast results (ala Google) is a plus.

Friday, April 08, 2005
It has come to our attention that members of this forum discussed in detail technology pertaining to the "You typed X, did you mean Y?" message patented by our client Google Inc. ...
Your Friendly Neighbourhood Lawfirm
Sunday, April 10, 2005
Not all misspellings are phonetical or letter order errors, some, for example, are misused suffixes or prefixes (atheistical vs atheistic, defunctional vs dysfunctional). Good start is to lookt at to gain an idea where people usually err.

Also I would keep list of most common misspelled queries the users make and ideally their respective second queries as well. Usually in case of simple typos the get it right second time. That'd allow me to build better suggest list.
Sunday, April 10, 2005
A slashdot user just linked to this: -- "some of the misspellings detected by our spelling correction system for the query [ britney spears ]"
Martin Dittus
Sunday, April 10, 2005

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz