简体   繁体   中英

How to pick 4 unique items from a weighted list?

So I've got a list of weighted items, and I'd like to pick 4 non-duplicate items from this list.

Item     Weight
Apple     5
Banana    7
Cherry    12
...
Orange    8
Pineapple 50

What is the most efficient way to do this? My initial foray was to just reroll for the subsequent picks if an already picked item came up... but for a small list this can result in a ton of rerolls.

Edit for clarification: For the above example, and ignoring fruits D through N, there is a total weight of 82. So the chances of being picked first are: A ~6% B ~8.5% C ~14.6% O ~9.8% P ~61% Once an items is picked the probabilities would (should!) change.

In your comment you say that unique means:

I don't want to pick the same item twice.

.. and that the weights determine a likelihood of being picked.

All you need to do to make sure that you don't pick duplicates, is simply remove the last picked item from the list before you pick the next one. Yes, this will change your weights slightly, but that is the correct statistical change to make if you do want unique results.


In addition, I'm not sure how you are using the weights to determine the candidates, but I came up with this algorithm that should do this with a minimal number of loops (and without the need to fill an array according to weights, which could result in extremely large arrays, requires int weights, etc.)

I've used JavaScript here, simply so it's easy to see the output in a browser without a server. It should be trivial to port to PHP since it's not doing anything complicated.

Constants

var FRUITS = [
    {name : "Apple", weight: 8 },
    {name : "Orange", weight: 4 },
    {name : "Banana", weight: 4 },
    {name : "Nectarine", weight: 3 },
    {name : "Kiwi", weight: 1 }
];

var PICKS = 3;

function getNewFruitsAvailable(fruits, removeFruit) {
    var newFruits = [];
    for (var idx in fruits) {
        if (fruits[idx].name != removeFruit) {
            newFruits.push(fruits[idx]);
        }
    }
    return newFruits;
}

Script

var results = [];
var candidateFruits = FRUITS;

for (var i=0; i < PICKS; i++) {
    // CALCULATE TOTAL WEIGHT OF AVAILABLE FRUITS
    var totalweight = 0;
    for (var idx in candidateFruits) {
        totalweight += candidateFruits[idx].weight;
    }
    console.log("Total weight: " + totalweight);

    var rand = Math.random();

    console.log("Random: " + rand);

    // ITERATE THROUGH FRUITS AND PICK THE ONE THAT MATCHES THE RANDOM
    var weightinc = 0;
    for (idx in candidateFruits) {
        // INCREMENT THE WEIGHT BY THE NEXT FRUIT'S WEIGHT
        var candidate = candidateFruits[idx];
        weightinc += candidate.weight;

        // IF rand IS BETWEEN LAST WEIGHT AND NEXT WEIGHT, PICK THIS FRUIT
        if (rand < weightinc/totalweight) {
            results.push(candidate.name);
            console.log("Pick: " + candidate.name);

            // GET NEXT SET OF FRUITS (REMOVING PICKED FRUIT)
            candidateFruits = getNewFruitsAvailable(candidateFruits, candidate.name);
            break;
        }
    }
    console.log("CandidateFruits: " + candidateFruits.length);
};

Output

for (var i=0; i < results.length; i++) {
    document.write(results[i] + "<br/>");
}

The basic strategy is to allocate each fruit a portion of the total range [0,1) . In the first loop, you'd have this:

  • Apple — 8/20 = 0.0 up to 0.4
  • Orange — 4/20 = 0.4 up to 0.6
  • Banana — 4/20 = 0.6 up to 0.8
  • Nectarine — 3/20 = 0.8 up to 0.95
  • Kiwi — 8/20 = 0.95 up to 1.0

The script iterates over each item in the list, and progresses a weight counter. When it reaches the range that contains the first random, it picks that item, removes it from the list, then recalculates the ranges based on the new total weight and runs again.

Update

function array_rand2($ary,$n = 1)
{
  // make sure we don't get in to an infinite loop
  // check we have enough options to select from
  $unique = count(array_unique(array_keys($ary)));
  if ($n > $unique) $n = count($unique);

  // First, explode the array and expand out all the weights
  // this means something with a weight of 5 will appear in
  // in the array 5 times
  $_ary = array();
  foreach ($ary as $item => $weight)
  {
    $_ary = array_merge($_ary, array_fill(0, $weight, $item));
  }

  // now look for $n unique entries
  $matches = array();
  while (count($matches) < $n)
  {
    $r = $_ary[array_rand($_ary)];
    if (!in_array($r,$matches))
    {
      $matches[] = $r;
    }
  }

  // and now grab those $n entries and return them
  $result = array();
  foreach ($matches as $match){
    $result[] = $match;
  }
  return $result;
}

See if that does a better job.

Here I found the idea to following steps:

  1. Build the sum of the weights --> SUM
  2. Build a random number between 0 and SUM --> RAND_NUMBER
  3. Iterate through the list and subtract each element weight from RAND_NUMBER. If RAND_NUMBER gets negative, you have your first element.
  4. Remove the found element from the list and go back to step 1 until you have 4 elements.

Maybe instead of "rerolls" you could just increment the list element index you've randomly generated: list.elementAt(rand_index++ % size(list)) (something like that). I'd think you'd find the next random unique item pretty fast with logic like that.

I'm sure there are even better solutions, of course, there usually are.

Edit: Looks like Brad's already provided one.. :)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM