An efficient version/alternative to the alias method that samples without replacement

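I am writing a quiz/teaching game in HTML5/JS in which the player is given a set of 10 questions drawn from a larger mastery set. The game tracks the player's scores over time and is more likely to pick questions that the player has had trouble with.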

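To build the probability distribution, I use the alias method as shown below, which selects an item in O(1) time while adhering exactly to the distribution: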

function generate_random_question_selector() {
    // Generates a random selector function using the Alias Method
    // for discrete probability distributions (see
    // https://en.wikipedia.org/wiki/Alias_method for an explanation)
    var i = 0;
    var probabilities = [], aliases = [];
    var probSum = 0;

    /* ... Business logic to fill probabilities array ... */

    // Normalize all probabilities to average to 1
    // and categorize each probability as to where it fits
    // in that scale
    var probMultiplier = probabilities.length / probSum;
    var overFull = [], underFull = [];
    probabilities = probabilities.map(function(p, i) {
        var newP = p * probMultiplier;
        if (newP > 1) overFull.push(i);
        else if (newP < 1) underFull.push(i);
        else if (newP !== 1) {
            throw new Error("Non-numerical value got into scores");
        }
        return newP;
    });
    overFull.sort();
    underFull.sort();

    // Process both queues by having each under-full entry
    // have the rest of its space occupied by the fullest
    // over-full entry, re-categorizing the over-full entry
    // as needed
    while (overFull.length > 0 || underFull.length > 0) {
        if (!(overFull.length > 0 && underFull.length > 0)) {
            // only reached due to rounding errors.
            // Just assign all the remaining probabilities to 1
            var notEmptyArray = overFull.length > 0 ? overFull : underFull;
            notEmptyArray.forEach(function(index) {
                probabilities[index] = 1;
            });
            break; // get out of the while loop
        }

        aliases[underFull[0]] = overFull[0];
        probabilities[overFull[0]] += probabilities[underFull[0]] - 1;
        underFull.shift();
        if (probabilities[overFull[0]] > 1) overFull.push(overFull.shift());
        else if (probabilities[overFull[0]] < 1) underFull.push(overFull.shift());
        else overFull.shift();
    }

    return function() {
        var index = Math.floor(Math.random() * probabilities.length);
        return Math.random() < probabilities[index] ? index : aliases[index];
    }
}

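This works very well, but part of my business specification is that questions must not repeat. I currently use a naive reroll technique to achieve this, but it will obviously break down if fewer than 10 items are much more likely than the rest: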

var selectQuestion = generate_random_question_selector();   
var questionSet = [];
for (var i = 0; i < num_questions; i++) {
    var question_num;
    do {
        question_num = selectQuestion();
    } while (questionSet.indexOf(question_num) >= 0)
    questionSet.push(question_num);
}

What can be done to this method, or in place of it, so that it samples questions efficiently without replacement?

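The alias method is ill suited to sampling without replacement, because each value is then drawn from a different probability distribution, and computing (or updating) the alias table is O(n).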
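
To make the first point concrete, here is a small illustration (the helper `remainingDistribution` and the example weights are made up for this purpose, not taken from the question) of how the distribution over the remaining items changes after every draw, which is why a prebuilt alias table goes stale:

```javascript
// Hypothetical helper for illustration only: returns the selection
// probability of each index, treating the indices in `removed` as already drawn.
function remainingDistribution(weights, removed) {
    var total = 0;
    weights.forEach(function (w, i) {
        if (removed.indexOf(i) < 0) total += w;
    });
    return weights.map(function (w, i) {
        return removed.indexOf(i) < 0 ? w / total : 0;
    });
}

// Before any draw, the weights 1, 2, 3, 4 are spread over a total of 10.
console.log(remainingDistribution([1, 2, 3, 4], []));
// After index 3 has been drawn, the other three are rescaled to a total of 6,
// so every remaining probability has changed.
console.log(remainingDistribution([1, 2, 3, 4], [3]));
```

Every remaining probability changes after a draw, so an alias table built for the first distribution would have to be rebuilt for the second.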

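You need a data structure that can be updated more efficiently. For instance, you could build a search tree of all values (where each node stores the total weight of its subtree), which allows both sampling and updating the probability distribution in O(log n) time.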

If we remove an entry by setting its probability to 0, the tree is never modified structurally and can be encoded as an array.

Here is some code:

function prepare() {
    // index i is the parent of indices 2*i and 2*i+1
    // therefore, index 0 is unused, and index 1 the root of the tree
    var i;
    for (i = weights.length - 1; i > 1; i--) {
        weights[i >> 1] += weights[i];
    }
}

function sample() {
    var index = 1;
    var key = Math.random() * weights[index];

    for (;;) {
        var left = index << 1;
        var right = left + 1;
        // Indices past the end of the array are missing children and count as 0.
        var leftWeight = weights[left] || 0;
        var rightWeight = weights[right] || 0;

        if (key < leftWeight) {
            index = left;
        } else {
            key -= leftWeight;
            if (key < rightWeight) {
                index = right;
            } else {
                return index;
            }
        }
    }
}

function remove(index) {
    var left = index << 1;
    var right = left + 1;
    var leftWeight = weights[left] || 0;
    var rightWeight = weights[right] || 0;

    // w is this node's own weight; subtract it from the node and all its ancestors
    var w = weights[index] - leftWeight - rightWeight;
    while (index > 0) {
        weights[index] -= w;
        index = index >> 1;
    }
}

Test code:

function retrieve() {
    var index = sample();
    remove(index);
    console.log(index);
    console.log(weights);
}

// index 0 is the unused slot; indices 1..4 carry the weights 1, 2, 3, 4
var weights = [0,1,2,3,4];
prepare();
console.log(weights);
retrieve();
retrieve();
retrieve();
retrieve();
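
For the quiz use case, the three functions above could be wrapped along the following lines. This is only a sketch under some assumptions: the name `sampleWithoutReplacement`, its parameters, the separate `own` array and the `rebuild` fallback are all introduced here and are not part of the code above, and the weights are assumed to be finite, non-negative numbers indexed by question number.

```javascript
// Hypothetical wrapper: draws up to k distinct question numbers, where
// probabilities[i] is the (unnormalized) weight of question i.
// `random` defaults to Math.random and can be replaced for testing.
function sampleWithoutReplacement(probabilities, k, random) {
    random = random || Math.random;
    var n = probabilities.length;
    var own = [0].concat(probabilities); // own[i + 1] is the weight of question i
    var tree;                            // tree[j] is the total weight of the subtree rooted at j
    var remaining = own.filter(function (w) { return w > 0; }).length;

    // Same idea as prepare(): add every node's total into its parent.
    function rebuild() {
        tree = own.slice();
        for (var j = n; j > 1; j--) {
            tree[j >> 1] += tree[j];
        }
    }
    rebuild();

    var picked = [];
    while (picked.length < k && remaining > 0) {
        // Descend from the root, as in sample().
        var index = 1;
        var key = random() * tree[1];
        for (;;) {
            var left = index << 1;
            var right = left + 1;
            var leftWeight = tree[left] || 0;
            var rightWeight = tree[right] || 0;
            if (key < leftWeight) {
                index = left;
            } else {
                key -= leftWeight;
                if (key < rightWeight) {
                    index = right;
                } else {
                    break;
                }
            }
        }
        if (!(own[index] > 0)) {
            // Floating-point drift can land on an entry that is already gone:
            // recompute the sums from the remaining weights and draw again.
            rebuild();
            continue;
        }
        picked.push(index - 1); // back to a 0-based question number
        // As in remove(): subtract the entry's own weight from it and its ancestors.
        var w = own[index];
        own[index] = 0;
        remaining--;
        for (var j = index; j > 0; j >>= 1) {
            tree[j] -= w;
        }
    }
    return picked;
}

console.log(sampleWithoutReplacement([1, 2, 3, 4], 3));
```

Building the tree costs O(n) once and each draw costs O(log n), apart from the rare rebuild. If fewer than k questions have a positive weight, the function simply returns all of them instead of looping forever, which avoids the failure mode of the reroll approach.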

