Monday, February 20, 2012

Weighted Random Selection

I recently came across an interesting problem. Given an array of items where each item has a name and a weight (integer value), select a random item from the array based on the weight. So items with a larger weight value are more likely to be returned.

One solution I thought of was to create a new array where the weight determines how many times an element appears in the new array. You then just pick an item from that array at random. However this is limited due to the alarming size of the new array when the weights and the number of items in the original increase.

So here is my solution to the problem - there are many ways of solving this problem but I think this one is easy to understand.

Essentially the algorithm is
1. Add up all the weights.
2. Pick a number at random between 1 and the sum of the weights.
3. Iterate over the items, decrementing the random number by the weight of the current selection.
4. Compare the result to zero, if less than or equal to break otherwise keep iterating.

The key is that the larger the weight the more likely to be less than zero when compared to the random selection between zero and the sum of weights. Remember - that random decrements each pass through the array of items, so you will always hit zero i.e. it is guaranteed that something will be picked every time.

I will use PHP for this (now my primary language day-in-day-out) so I'm sure there are improvements to be made in this code - I'm very new to PHP still!

In this example say the array contains 4 foods and each has a weight. This code demonstrates the distribution over a 1000 iterations.

// We don't like fruit in this example, so give a low weighting
$foods = array('vegetables'=>4, 'fruit'=>1, 'chips'=>8, 'burgers'=>10);
$sumOfWeights = array_sum($foods);

// hold the results of a thousand random selections. 
$count = array('vegetables'=>0, 'fruit'=>0, 'chips'=>0, 'burgers'=>0);

for($i=0; $i < 1000; $i++) {
    // choose a random between 1 and the sum of the weights.
    $random = rand(1, $sumOfWeights); 
    foreach($foods as $name => $weighting) {
        // ***The next two lines are the heart of this algorithm***
        // decrement the random by the current weighting.
        $random -= $weighting;        
        // The larger the weighting, the more likely random is less than zero.
        if($random <= 0) {
            $count[$name]++; 
            break;  
        }
    }
}

foreach($count as $name => $result) {
   echo($name."=".$result);
}


This will give results similar to
  • vegetables=210
  • fruit=157
  • chips=278
  • burgers=355

Changing the weighting value for each item will yield the expected distributions, for example increasing the weighting for vegetables to 80 and fruit to 60 (let's be healthy) would result in something similar to
  • vegetables=553
  • fruit=421
  • chips=19
  • burgers=7

3 comments:

  1. Obviously, if you normalize the numbers you can make the array any size you want.

    ReplyDelete
  2. Hi,

    Sorry, I'm no techie at all, but this seems like it's exactly what I'm looking for, but, how do I run it?

    Thank you,

    Rob..

    ReplyDelete