The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

The MapReduce Hammer

A person with a hammer sees every problem as a nail.

But when it comes to MapReduce, I can find only one nail: the word count problem. Despite my best efforts, I haven't been able to find other problems for which MapReduce could be used, apart from text mining.

Has anyone used it for anything else? What kinds of problems are best solved by MapReduce? Please shed some light.

Btw, I came across Disco (http://discoproject.org), a MapReduce framework from Nokia.
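
For reference, the one nail itself: word count in MapReduce terms, as a minimal single-process Python sketch (the toy driver here stands in for the framework's shuffle and distribution; it is not any particular framework's API):

from collections import defaultdict

def mapper(document):
    # Emit (word, 1) for every word in one input split.
    for word in document.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Sum every count emitted for one word.
    yield word, sum(counts)

def run_mapreduce(documents):
    # Toy single-process driver: a real framework does this
    # shuffle/group step for you and spreads the mappers and
    # reducers across nodes.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return dict(kv for key, values in groups.items()
                for kv in reducer(key, values))

print(run_mapreduce(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}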
Sathya
Thursday, September 04, 2008
 
 
We used n-queens to test our cloud computing stuff at the tech incubator.  The more practical use case, which we never got to, was running earthquake simulations for architectural firms.

Suppose your building needs to be rated for earthquakes up to magnitude X.  You have a simulator which, given a CAD model and four parameters describing an earthquake, can tell you what damage the building sustains.  And, up to magnitude X, you can come up with 800,000 plausible combinations of parameters to sample the possible earthquake space.

The thinking was that, since all of those simulations can be run in parallel, you can chop them among 1,000 nodes and get the answer in a day instead of using a supercomputer and needing two weeks.  (This would let you do Agile earthquake testing instead of only testing the final design.)
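
Roughly, the map stage is the parameter sweep and the reduce stage folds the damage scores down to whatever number you report.  A minimal Python sketch, where simulate() and its damage formula are hypothetical placeholders, not the real simulator:

from concurrent.futures import ProcessPoolExecutor
from itertools import product

def simulate(params):
    # Hypothetical stand-in for the real CAD-driven simulator:
    # one four-parameter earthquake in, one damage score out.
    magnitude, depth, distance, duration = params
    return magnitude * duration / (depth + distance)  # placeholder formula

def worst_damage(parameter_space, workers=8):
    # Map: run every combination independently -- the embarrassingly
    # parallel part you would chop across nodes.
    # Reduce: fold the damage scores down to the single worst case.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(simulate, parameter_space))

if __name__ == "__main__":
    space = list(product([5.0, 6.0, 7.0], [5, 10], [1, 20], [10, 30]))
    print(worst_damage(space))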
Patrick McKenzie (Bingo Card Creator)
Thursday, September 04, 2008
 
 
Patrick:

How would you use MapReduce for those examples?  They sound like regular distributed computations.
Brian McKeever
Thursday, September 04, 2008
 
 
n-queens is essentially word count, except instead of partitioning the document and counting words, you're partitioning the space of possible boards and counting legal placements.  While it is a standard distributed computing problem, it is quite amenable to Map/Reduce as well.
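
Concretely, you might partition by the column of the queen in row 0: each map task counts the solutions in its slice, and the reduce step sums the counts, exactly like summing word counts.  A single-process Python sketch of that framing:

from functools import reduce

def safe(queens, col):
    # The next queen (at row len(queens), column col) is safe if no
    # earlier queen shares its column or either diagonal.
    row = len(queens)
    return all(c != col and abs(c - col) != row - r
               for r, c in enumerate(queens))

def count_from(queens, n):
    # Depth-first count of all completions of a partial placement.
    if len(queens) == n:
        return 1
    return sum(count_from(queens + [c], n) for c in range(n) if safe(queens, c))

def mapper(first_col, n):
    # One map task per slice of the board space: all boards whose
    # row-0 queen sits in first_col.
    return count_from([first_col], n)

def n_queens(n):
    # Reduce: sum the per-slice counts, just like summing word counts.
    return reduce(lambda a, b: a + b, (mapper(col, n) for col in range(n)))

print(n_queens(8))  # 92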

The other one I'm less sure of the specifics on, since I never actually had to implement it.
Patrick McKenzie (Bingo Card Creator)
Friday, September 05, 2008
 
 
N-Queens is not a good example, since it is still a master-worker solution. The dancing links algorithm was designed for MPI and later implemented as map-reduce for a Hadoop example. However, it uses an identity reducer stage, which makes it a normal master-worker task.
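
(An identity reducer just passes every mapped value through untouched, so all the real work happens in the map stage. In Python terms:

def identity_reducer(key, values):
    # No aggregation: every mapped value is emitted as-is, so the reduce
    # stage contributes nothing and the job degenerates to master-worker.
    for value in values:
        yield key, value
)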
Benjamin Manes
Friday, September 05, 2008
 
 
Pardon my ignorance, but isn't Map/Reduce a way to solve distributed problems? I.e., isn't the Map/Reduce -> distributed computing relationship the same one as quicksort -> sorting?
Joske Vermeulen
Friday, September 05, 2008
 
 
Joske:
Yes and no. It's not a new distributed processing approach, as it's implemented in either the traditional master-worker or supervisor-mesh model. I'd describe it more as a programming idiom that promotes writing highly scalable tasks. But the extra restrictiveness that simplifies the programming model doesn't work for a wide range of tasks, which is why identity stages are so common: they force it back into a master-worker programming model.

The best approach, imho, is a standard implementation with different facades that allow programming in any of the common models (master-worker, fork-join, map-reduce) and support chaining. This allows the cleanest representation of the task, rather than coercing one model to act like another. This is how I went about it, so I'm fairly biased.
Benjamin Manes
Friday, September 05, 2008
 
 
I use the Map-Reduce approach in Poker Copilot to allow me to utilise multiple cores or processors. I only use the "Map" part of it.

The problem I am solving: there are thousands of files containing poker hand histories that I need to parse.

It goes like this:

1. A quick scan of the relevant parts of the file system produces the fileList, a list of files that need to be parsed.

2. The fileList and the file parsing algorithm are sent to Map-Reduce.

3. Map-Reduce decides how to divide the fileList between threads, based on the characteristics of the machine.

4. Map-Reduce returns a list of trees, one for each parsed file.

It works well for me. It seems to be a reliable and maintainable approach to dividing independent tasks amongst processors.
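
In rough Python terms, the shape of it looks like this (a minimal map-only sketch; parse_hand_history, the *.txt layout, and the "hand_histories" directory are stand-ins, not the real Poker Copilot code):

from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
import os

def parse_hand_history(path):
    # Hypothetical parser: the real thing builds a parse tree per file;
    # a line count stands in for that result here.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return path, sum(1 for _ in f)

def parse_all(root):
    # Step 1: scan the file system for the files that need parsing.
    file_list = list(Path(root).rglob("*.txt"))
    # Steps 2-4: hand the list and the parsing function to the map stage,
    # which divides the work among workers sized to the machine and
    # returns one result per file.  No reduce stage is needed.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(parse_hand_history, file_list))

if __name__ == "__main__":
    for path, line_count in parse_all("hand_histories"):
        print(path, line_count)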
Steve McLeod
Saturday, September 06, 2008
 
 

This topic is archived. No further replies will be accepted.

 