## The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

In other words, I'd like to be able to Google for something that solves the following problem.
nobody Tuesday, May 29, 2007
Okay that got submitted a little too soon (anyone ever had the Enter key stop working in Firefox???)
Say I have a set of Data objects I'd like to update (e.g. Da, Db, and Dc), and a set of Operations that can satisfy those updates (e.g. O1, O2, and O3). Let's say that each Operation can satisfy one or more Data objects (e.g. O1 returns Da and Db, O2 returns Db and Dc). In addition, let's assume that we know the cost/weight of each operation. How can I determine the set of operations with the smallest total cost/weight that satisfies a given set of Data objects? Any thoughts or pointers to existing examples would be helpful! Thanks
If you have a small, finite number of possibilities then you can just test them all.
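For a small number of operations, "test them all" is just a loop over subsets. A minimal brute-force sketch in Python, with hypothetical operation names and costs (the thread doesn't fix any concrete data):

```python
from itertools import combinations

def cheapest_cover_bruteforce(ops, required):
    """ops maps operation name -> (cost, set of data points it returns).
    Try every subset of operations and return (cost, subset) for the
    cheapest one whose combined results cover `required`, or None."""
    best = None
    names = list(ops)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            covered = set().union(*(ops[n][1] for n in subset))
            if required <= covered:
                cost = sum(ops[n][0] for n in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best

# Hypothetical instance in the spirit of the original post
ops = {"O1": (3, {"Da", "Db"}),
       "O2": (2, {"Db", "Dc"}),
       "O3": (2, {"Da"})}
print(cheapest_cover_bruteforce(ops, {"Da", "Db", "Dc"}))  # -> (4, ('O2', 'O3'))
```

This is exponential in the number of operations, which is why the thread turns to LP solvers and approximations below roughly a few dozen operations is the practical limit.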
Otherwise, perhaps (I'm not certain that I understood your problem description) the problem domain sounds something like http://en.wikipedia.org/wiki/Combinatorial_optimization
I'm not sure I understand your requirements completely. The way it is described, it sounds like any combination of operations and data objects will result in the same total cost. I'm sure it's more complicated than that, though.
This sounds like a typical linear optimization problem. Problems of this sort are typically handled using Linear Programming [1] to find an optimal value for what's called the objective function. There are a number of algorithms for solving these problems, including the venerable Simplex Method [3] and the Branch and Bound algorithm [2]; Karmarkar's method [5] is a more recent interior-point alternative that also finds the optimum. If you are willing to accept results that are good, but not necessarily optimal, there are heuristics that might apply, such as Simulated Annealing [4]. If your answers have to come out as discrete integers, then you'll want to look at the Integer Programming variation of Linear Programming.

[1] http://en.wikipedia.org/wiki/Linear_programming
[2] http://en.wikipedia.org/wiki/Branch_and_bound
[3] http://en.wikipedia.org/wiki/Simplex_method
[4] http://en.wikipedia.org/wiki/Simulated_annealing
[5] http://en.wikipedia.org/wiki/Karmarkar_algorithm
Thanks for the responses.
It is okay to bring back more data. Some operations will return more pieces of data than others (some might be big "bulk" operations, while others may be fine-grained and return exactly one data point).

Let me give a realistic example. In one case, one big, expensive operation may return all of the required data more efficiently than several small operations (possibly because it has been optimized for that exact case). In another case, there may be several small operations that are cheaper than one big one (maybe the big one brings back lots of data when only a few pieces are needed). In yet another case, maybe the answer is one big operation plus one small operation.

Again, to begin with, for each operation you know its cost and the set of data points it satisfies. The input to the algorithm is a set of data points. The output should be a set of operations that satisfies each of the input data points.
The people who suggest optimization are correct. There are approximations: you could think about how you would solve the problem by hand... maybe pick a big operation first? Maybe start small and build up, but start over with a big operation if you accumulate too many? That gives you an ad hoc algorithm. But if you have a lot of inputs and outputs and want the 'correct' solution, optimization is the way to go.
There was a nice series on IBM developerWorks once about GLPK, the GNU solver: http://www-128.ibm.com/developerworks/linux/library/l-glpk1/

Your particular problem requires an array of costs for the operations, a set of target outputs, and a 2D array mapping operations to targets. You tell the solver that, for each output, the sum of that output over the operations it selects must meet your target. You also ask it to minimize cost. Say go... and it solves it, giving you a decision variable array telling you how many of each operation to execute.

Install GLPK and run the model below through `glpsol -m`. The specific encoding will differ according to what solver you use, but this is the idea.

```
set OPERATIONS;
set OUTPUTS;

/* Parameters */
param ProductionTable {i in OPERATIONS, j in OUTPUTS};
param Targets {i in OUTPUTS};
param Cost_operations {i in OPERATIONS};

/* Decision variables */
var x {i in OPERATIONS} >= 0;

/* Objective function */
minimize z: sum{i in OPERATIONS} Cost_operations[i]*x[i];

/* Constraints */
s.t. const{j in OUTPUTS} : sum{i in OPERATIONS} ProductionTable[i,j]*x[i] >= Targets[j];

data;

/* data section */
set OPERATIONS := op1 op2 op3 op4;
set OUTPUTS := t1 t2 t3 t4 t5;

param Cost_operations := op1 1 op2 5 op3 1 op4 1;

param Targets := t1 1 t2 1 t3 0 t4 1 t5 1;

param ProductionTable: t1 t2 t3 t4 t5 :=
  op1  1 1 0 0 0
  op2  0 1 1 1 1
  op3  0 0 0 0 1
  op4  1 0 1 0 0;

end;
```
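Reading each operation as run-or-not (a 0/1 assumption on my part; the model's `x >= 0` also permits fractional or repeated runs), the small data section above can be sanity-checked with a brute-force loop in Python:

```python
from itertools import combinations

# Same instance as the MathProg data section
cost = {"op1": 1, "op2": 5, "op3": 1, "op4": 1}
covers = {"op1": {"t1", "t2"},
          "op2": {"t2", "t3", "t4", "t5"},
          "op3": {"t5"},
          "op4": {"t1", "t3"}}
targets = {"t1", "t2", "t4", "t5"}  # t3 has target 0, so it isn't required

# Cheapest subset of operations whose combined outputs meet every target
best = min((sum(cost[o] for o in s), s)
           for r in range(1, len(cost) + 1)
           for s in combinations(cost, r)
           if targets <= set().union(*(covers[o] for o in s)))
print(best)  # -> (6, ('op1', 'op2'))
```

Only op2 produces t4, so it is forced in at cost 5, and then t1 needs op1 or op4 for one more unit of cost, hence the optimum of 6, which is what the solver should report.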
I could be crazy, but it sounds to me like what you're describing is almost an exact cover problem - except that instead of an exact cover you want a minimum cost cover.
http://en.wikipedia.org/wiki/Exact_cover (Begins with deep math, but read down further for application to common problems like Sudoku.) If this is correct, then I think the previous post has what you need. But if you want/need to roll your own solution, then you could use a slightly adapted version of Knuth's "Dancing Links" - it's a blazing fast solution to exact cover problems, and very elegant. I implemented it once and had a lot of fun with it. http://en.wikipedia.org/wiki/Dancing_Links
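For a feel of how Algorithm X searches, here is a plain-dict sketch in Python (the dict-of-sets stands in for the linked node matrix of real Dancing Links, and the instance data is hypothetical):

```python
def exact_covers(X, Y, solution=None):
    """Knuth's Algorithm X, dict-of-sets version.
    X: data point -> set of operations covering it (the "columns")
    Y: operation -> list of data points it covers (the "rows")
    Yields every exact cover: each data point covered exactly once."""
    if solution is None:
        solution = []
    if not X:
        yield list(solution)
        return
    # Knuth's heuristic: branch on the data point with fewest candidates
    c = min(X, key=lambda col: len(X[col]))
    for r in list(X[c]):
        solution.append(r)
        removed = _select(X, Y, r)
        yield from exact_covers(X, Y, solution)
        _deselect(X, Y, r, removed)  # backtrack
        solution.pop()

def _select(X, Y, r):
    """Cover row r: remove its columns and every conflicting row."""
    cols = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        cols.append(X.pop(j))
    return cols

def _deselect(X, Y, r, cols):
    """Undo _select, restoring columns in reverse order."""
    for j in reversed(Y[r]):
        X[j] = cols.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

# Hypothetical instance: which operation sets cover Da, Db, Dc exactly once?
Y = {"O1": ["Da", "Db"], "O2": ["Dc"], "O3": ["Db", "Dc"]}
X = {d: {op for op, ds in Y.items() if d in ds} for d in ["Da", "Db", "Dc"]}
print(list(exact_covers(X, Y)))  # -> [['O1', 'O2']]
```

The "slightly adapted version" mentioned above would enumerate covers like this and keep the one with minimum total cost; real Dancing Links gets its speed from O(1) unlinking/relinking of list nodes, which the dict version only approximates.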
andy Tuesday, May 29, 2007
I'm not certain, but that sounds an awful lot like either the knapsack problem or the travelling salesman problem, neither of which has a good (polynomial-time) solution.
We have costs for operations (lengths of routes) that return multiple results (visit multiple locations), and we want to find the cheapest set of operations (shortest set of routes) that returns all results (visits all locations). That looks like a close match for travelling salesman.

If you can find a solution to that problem that completes in polynomial time, you can probably get an honorary doctorate from an Ivy League university, or maybe a Fields medal, which is to say: no polynomial-time algorithm is known, and in the worst case you're stuck with something close to exhaustive search of all possible solutions. Now, you can write a program that tries solutions randomly and returns the shortest one found (not necessarily the shortest possible), but that's not what you said you were looking for. See the following terms: "Monte Carlo", "Simulated Annealing" and "NP-Complete".
I don't think the Traveling Salesman problem applies well here, as (I assume) the cost of one operation does not depend on which operation came before it. That is, the order of operations doesn't matter so long as they collectively retrieve all the required data points.
Rather, we appear to have a set of operations and a mapping of that set to a set of data points. Then, for a given subset of the data points, we want to find a subset of operations that completely "covers" (covers the mappings to) the subset of data points. Hence I think this is a covering problem (and NP-complete). I suspect you could find an optimal solution with a backtracking search like Knuth's "Algorithm X", but I'm not sure if you'd run afoul of the fact that an exact cover may not exist (in which case we settle for a complete cover of minimal cost). But it seems like it should be doable without statistical methods.
andy Wednesday, May 30, 2007
This is a weighted set cover problem (the very last problem in Cormen et al.). It is NP-hard, so you want an approximation algorithm. The greedy algorithm (take the next unused op with minimum cost/(number of not-yet-covered results)) can be shown to be no worse than ln(largest set size) times the cost of the optimal solution.
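The greedy rule above is a few lines of Python. A sketch, again with made-up operation names and costs:

```python
def greedy_set_cover(ops, required):
    """Greedy weighted set cover: repeatedly take the unused operation
    with the smallest cost per not-yet-covered data point.
    ops maps op name -> (cost, set of data points it returns).
    Returns (total_cost, chosen op names), or None if uncoverable."""
    uncovered = set(required)
    chosen, total = [], 0
    while uncovered:
        best = None
        for name, (cost, data) in ops.items():
            if name in chosen:
                continue
            gain = len(data & uncovered)  # newly covered points
            if gain and (best is None or cost / gain < best[0]):
                best = (cost / gain, name)
        if best is None:
            return None  # some required point is unreachable
        name = best[1]
        chosen.append(name)
        total += ops[name][0]
        uncovered -= ops[name][1]
    return total, chosen

# Hypothetical data: one bulk operation vs. two fine-grained ones
ops = {"bulk": (4, {"Da", "Db", "Dc"}),
       "a": (1, {"Da"}),
       "b": (1, {"Db"})}
print(greedy_set_cover(ops, {"Da", "Db"}))  # -> (2, ['a', 'b'])
```

The ln-factor bound is tight in spirit: asked for {Da, Db, Dc}, this greedy picks a, b, and then bulk for a total of 6, even though bulk alone would have cost 4.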
Google for other options.
expr Wednesday, May 30, 2007
