The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Displaying Random Content Based on Relative Frequency

I am developing a system to display "random" snippets of information from a variety of sources (back-end databases, RSS feeds, simple CMS'd news articles, etc.) and I want to "weight" the content so that, for example, company performance figures (tonnages produced, tonnages despatched, etc.) are weighted higher and appear more often than "fillers" like items from an RSS news feed.

The content items are held in a "playlist" SQL Server table and have a numeric "weight" column holding a value from 0 to 32767. I planned to make 0-weighted content override all other content: while there are 0-weighted records in the playlist table, only they will be displayed and all other records ignored, until such time as the 0-weighted records are removed from the table.

I have a recordset (I'm using classic ASP - don't let that put you off!) containing all the appropriate content records in the playlist that I want to "randomly" select from.

Each record in the recordset has the aforementioned "weight" column.

Apart from the exception for 0-weighted records, my "plan" was to first iterate through all the records, adding up their weight column values to produce a total "weighting" value; then produce a random number from 0 to this total value; and finally re-iterate the recordset keeping a "running sum" of weight values until the random number is less than or equal to the running sum, at which point I've found my "weighted random" record, which I can then display.
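
For concreteness, a minimal sketch of this two-pass approach in Java (Java purely for illustration rather than classic ASP; the parallel arrays are hypothetical stand-ins for the recordset columns):

  import java.util.Random;

  class WeightedPick {
      // Pass 1: total the weights. Pass 2: walk a running sum until it
      // reaches a random target drawn from 1..total (assumes total > 0).
      static String pick(String[] contents, int[] weights, Random rng) {
          int total = 0;
          for (int w : weights) total += w;
          int target = rng.nextInt(total) + 1;
          int runningSum = 0;
          for (int i = 0; i < weights.length; i++) {
              runningSum += weights[i];
              if (target <= runningSum) return contents[i];
          }
          throw new IllegalStateException("unreachable when total > 0");
      }
  }

Drawing the target from 1 to the total (rather than from 0) also means a 0-weighted row can never be chosen by this pass.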

Whilst this works, it just doesn't seem particularly elegant to me.

So, finally, my question...

Is there a more elegant way to pick a random record from a list of weighted records so that a record with a weight of 100 (for example) will generally be picked twice as often as a record with a weight of 50? I'd guesstimate there'll be about 200 records to pick from if that makes a difference on how best to approach this problem.

Thanks for your time.

Charley
Charley Farley
Thursday, October 26, 2006
 
 
Some pseudo code for a possible solution:

answer = first item
total_weight = weight(first item)
for each remaining item
{
    total_weight += weight(current item)
    if (random() * total_weight < weight(current item))
    {
        answer = current item
    }
}
return answer

Where 0 <= random() < 1

Note that this will run through the whole list, even if it ultimately chooses the first item.
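
Rendered as runnable Java, the same single-pass idea might look like this (a sketch; the parallel arrays are hypothetical stand-ins for the playlist records):

  import java.util.Random;

  class SinglePassPick {
      static String pick(String[] contents, int[] weights, Random rng) {
          String answer = contents[0];
          long totalWeight = weights[0];
          for (int i = 1; i < contents.length; i++) {
              totalWeight += weights[i];
              // Replace the running answer with probability
              // weight(current) / total_weight_so_far.
              if (rng.nextDouble() * totalWeight < weights[i]) {
                  answer = contents[i];
              }
          }
          return answer;
      }
  }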
DAH
Thursday, October 26, 2006
 
 
Sorry, I missed the special behaviour for when the weight is 0.
DAH
Thursday, October 26, 2006
 
 
The following pseudocode seems more 'elegant' to me.

maxWeight = 32767
justTakeItCounter = 32
snippet = ""
if ( I don't have any 0-weight items ):
  N = number of choices
  loop:
      selectIdx = random( 0, N-1 )
      justTakeItCounter -= 1
      selectedWeight = item[selectIdx].weight
      if ( justTakeItCounter == 0 ) OR ( random( 0, maxWeight ) <= selectedWeight ):
        snippet = item[selectIdx].snippet
        break
print snippet

The solution you describe (which I've used in the past) runs in O(N) time. The solution proposed here, however, has randomized running time; it might return a value on the first test, or on the 10-millionth. Adding the justTakeItCounter termination test is a pragmatic hack to limit the execution time, at the cost of biasing the result away from the low-weight items (i.e., they will appear in less than the proportion specified by their weight); the lower the initial justTakeItCounter value, the greater the bias but the shorter the maximum execution time.
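
A runnable Java rendering of this rejection-sampling idea (a sketch; names mirror the pseudocode, and the weights array is a hypothetical stand-in for the items):

  import java.util.Random;

  class RejectionPick {
      static int pick(int[] weights, int maxWeight, Random rng) {
          int justTakeItCounter = 32;
          while (true) {
              int selectIdx = rng.nextInt(weights.length); // uniform candidate
              justTakeItCounter--;
              // Accept roughly in proportion to the weight, or force the
              // current candidate once the counter runs out.
              if (justTakeItCounter == 0
                      || rng.nextInt(maxWeight + 1) <= weights[selectIdx]) {
                  return selectIdx;
              }
          }
      }
  }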
Samuel Reynolds
Thursday, October 26, 2006
 
 
With only about 200 records, simply go with your plan; it will work just fine. Except that you should produce the random number from 1 to the total sum, inclusive at both ends, not from 0. Then you get a correct distribution according to the weightings, automagically ignoring any 0-weighted item, even in the first position.
Secure
Thursday, October 26, 2006
 
 
Except that the 0-items were meant the other way around: not to be ignored, but to be taken exclusively... ;)
Secure
Thursday, October 26, 2006
 
 
Thanks very much for your comments.

The 0 override is a non-issue because it is checked before the randomiser is used, by filtering the recordset for weight=0 rows. If there are any, they are passed to an "equal-opportunities" randomiser that just picks a record from 1 to numrecords (because they all have the same weight=0); otherwise the unfiltered records are passed to the "weighted-randomiser".
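
A Java sketch of that dispatch (hypothetical names; the weights array stands in for the recordset):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Random;

  class PlaylistPick {
      static int pick(int[] weights, Random rng) {
          // Collect the weight-0 override rows, if any.
          List<Integer> zeroRows = new ArrayList<>();
          for (int i = 0; i < weights.length; i++) {
              if (weights[i] == 0) zeroRows.add(i);
          }
          if (!zeroRows.isEmpty()) {
              // "Equal-opportunities" pick among the weight-0 rows only.
              return zeroRows.get(rng.nextInt(zeroRows.size()));
          }
          // Otherwise, the usual two-pass weighted pick over all rows.
          int total = 0;
          for (int w : weights) total += w;
          int target = rng.nextInt(total) + 1;
          int runningSum = 0;
          for (int i = 0; i < weights.length; i++) {
              runningSum += weights[i];
              if (target <= runningSum) return i;
          }
          throw new IllegalStateException("unreachable when total > 0");
      }
  }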

My original method appears to be working well. I'm lucky that I'm only picking from a small set of records.

Anyway, thanks to all of you for taking the time to offer your suggestions. It's much appreciated.

Charley
Charley Farley
Friday, October 27, 2006
 
 
I've used your method before with success.  If the content and weightings aren't changing often, you could build and cache a table in which each element appears once per unit of weight, and then randomly pick an index into that table.

In rough pseudo-code:
lookup = A, B, B, B, C, D, D, ...
current = lookup[GetRandomInt(0, lookup.Length)]
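
In runnable Java, the cached-table idea might look like this (a sketch with hypothetical names). Note that with weights up to 32767 the expanded table can grow large, so it only pays off when content and weightings change rarely:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Random;

  class ExpandedLookup {
      // Each item appears in the table once per unit of weight, so a
      // uniform index into the table yields a weighted pick.
      static List<String> buildLookup(String[] items, int[] weights) {
          List<String> lookup = new ArrayList<>();
          for (int i = 0; i < items.length; i++) {
              for (int n = 0; n < weights[i]; n++) {
                  lookup.add(items[i]); // e.g. A, B, B, B, C, D, D, ...
              }
          }
          return lookup;
      }

      static String pick(List<String> lookup, Random rng) {
          return lookup.get(rng.nextInt(lookup.size()));
      }
  }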
SomeBody
Friday, October 27, 2006
 
 
If you're going to do a cached lookup strategy, this works better (in Java, but you get the idea):

  import java.util.Arrays;
  import java.util.Map;
  import java.util.Random;

  // This Map is pre-populated from somewhere.

  Map<Object, Double> frequencyMap;
  int itemCount = frequencyMap.size();

  // Create a pair of arrays to use as a lookup table.

  Object[] lookupItems = new Object[itemCount];
  double[] lookupFreqs = new double[itemCount];

  // Iterate through the items in the Map, populating
  // the lookup arrays with Objects, as well as with
  // a running-frequency-total for each item.

  double runningTotal = 0;
  int insertCount = 0;
  // Order of iteration is irrelevant
  for (Object item : frequencyMap.keySet()) {
    double frequency = frequencyMap.get(item);
    runningTotal += frequency;
    lookupFreqs[insertCount] = runningTotal;
    lookupItems[insertCount] = item;
    insertCount++;
  }

  // Now get a random number between zero and the
  // aggregate of all probabilities.

  Random r = new Random(System.nanoTime());
  double random = r.nextDouble();
  double lookup = random * runningTotal;

  // Perform a binary search into the lookup array
  // to find the item with the given aggregate
  // frequency.

  int index = Arrays.binarySearch(lookupFreqs, lookup);
  // binarySearch returns (-(insertionPoint) - 1) when the value is
  // not found exactly; convert that back into the insertion point.
  index = (index >= 0 ? index : -index - 1);

  // Choose the item at the selected index
  Object probabilisticallyRandomItem = lookupItems[index];

I've used a variant of this code before to choose, with probabilistically accurate frequencies, among tens of thousands of different objects. It's very handy. And the binary search makes it very, very fast.

Enjoy!!
BenjiSmith
Saturday, October 28, 2006
 
 

This topic is archived. No further replies will be accepted.
