Efficient way to sample from different probability vectors

Question

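I am looking for a more efficient way to sample from the list of integers 1:n, many times, where the probability vector (also of length n) is different each time. For 20 trials with n = 10, I know it can be done like this: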

probs <- matrix(runif(200), nrow = 20)
answers <- numeric(20)
for(i in 1:20) answers[i] <- sample(10,1,prob=probs[i,])

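However, this calls sample 20 times, each time just to get a single number, so it is probably not the fastest way. Speed would help, since the code will be doing this many times.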

Thanks a lot!

Luke

Edit: Many thanks to Roman, whose idea about benchmarking helped me find a good solution. I have now moved it to an answer.

Answer 1

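Just for fun, I tried two more versions. At what scale are you doing this sampling? I think all of these are very fast and more or less equivalent (I did not include the creation of probs in the timing of your solution). I would love to see others' take on this.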

library(rbenchmark)
benchmark(replications = 1000,
          luke = for(i in 1:20) answers[i] <- sample(10,1,prob=probs[i,]),
          roman = apply(probs, MARGIN = 1, FUN = function(x) sample(10, 1, prob = x)),
          roman2 = replicate(20, sample(10, 1, prob = runif(10))))

    test replications elapsed relative user.self sys.self user.child sys.child
1   luke         1000    0.41    1.000      0.42        0         NA        NA
2  roman         1000    0.47    1.146      0.46        0         NA        NA
3 roman2         1000    0.47    1.146      0.44        0         NA        NA

Answer 2

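Here is another approach I found. It is fast, but not as fast as simply calling sample many times in a for loop. I initially thought it was very good, but I was using benchmark() incorrectly.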

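luke2 <- function(probs) { # takes a matrix of probability vectors, each in its own row
  probs <- probs / rowSums(probs)          # normalise each row to sum to 1
  probs <- t(apply(probs, 1, cumsum))      # row-wise cumulative probabilities
  # one uniform draw per row (recycled down the columns); counting the
  # cumulative values below the draw gives the index of the sampled outcome
  answer <- rowSums(probs - runif(nrow(probs)) < 0) + 1
  return(answer)
}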

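Here is how it works: picture the probabilities as line segments of various lengths laid end to end on a number line from 0 to 1. Large probabilities take up a larger share of the number line. You can then choose an outcome by picking a random point on the number line; outcomes with large probabilities are more likely to be picked. The advantage of this approach is that you can roll all the random numbers you need in a single call to runif(), rather than calling sample repeatedly as luke, roman and roman2 do. However, it looks like the extra data processing slows things down, and that cost more than offsets the advantage.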
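For a single probability vector, the number-line idea can be sketched as below. This is only an illustration: the vector p and the helper pick() are made up for this example and are not part of luke2.

```r
# Illustrative probabilities (assumed for this example; they sum to 1).
p <- c(0.1, 0.6, 0.3)

# Right-hand endpoints of the three segments on the 0-to-1 number line.
breaks <- cumsum(p)

# pick() is a hypothetical helper: it returns the index of the segment
# that contains the point u, by counting the endpoints that lie below u.
pick <- function(u) sum(breaks < u) + 1

pick(0.05)  # falls in the first segment  (0.0 to 0.1)
pick(0.50)  # falls in the second segment (0.1 to 0.7)
pick(0.95)  # falls in the third segment  (0.7 to 1.0)

# Drawing u from runif() turns this into a random sample from 1:3.
draw <- pick(runif(1))
```

luke2 does the same thing for every row of the matrix at once, with one uniform draw per row.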

library(rbenchmark)
probs <- matrix(runif(2000), ncol = 10)
answers <- numeric(200)

benchmark(replications = 1000,
          luke = for(i in 1:20) answers[i] <- sample(10,1,prob=probs[i,]),
          luke2 = luke2(probs),
          roman = apply(probs, MARGIN = 1, FUN = function(x) sample(10, 1, prob = x)),
          roman2 = replicate(20, sample(10, 1, prob = runif(10))))

    test replications elapsed relative user.self sys.self user.child sys.child
1   luke         1000   0.171    1.000     0.166    0.005          0         0
2  luke2         1000   0.529    3.094     0.518    0.012          0         0
3  roman         1000   1.564    9.146     1.513    0.052          0         0
4 roman2         1000   0.225    1.316     0.213    0.012          0         0

For some reason, apply() does very badly when you add more rows. I don't understand why, since I thought it was a wrapper around for(), so roman should perform similarly to luke.
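One thing that may be worth checking, offered as a possibility rather than a diagnosis: in the benchmark above probs has 200 rows, but luke and roman2 only run 20 iterations, while roman and luke2 work through all 200 rows. A like-for-like comparison might look like the sketch below. The names luke_all and roman_all are made up here, and base R's system.time() stands in for rbenchmark so the snippet has no package dependency.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
probs <- matrix(runif(2000), ncol = 10)  # 200 rows, as in the benchmark above

# Hypothetical loop version that visits every row, not just the first 20.
luke_all <- function(probs) {
  answers <- numeric(nrow(probs))
  for (i in seq_len(nrow(probs))) {
    answers[i] <- sample(ncol(probs), 1, prob = probs[i, ])
  }
  answers
}

# apply() version; it already visits every row of the matrix.
roman_all <- function(probs) {
  apply(probs, 1, function(x) sample(length(x), 1, prob = x))
}

# Time 100 repetitions of each; absolute numbers depend on the machine.
t_loop  <- system.time(for (k in 1:100) luke_all(probs))
t_apply <- system.time(for (k in 1:100) roman_all(probs))
print(rbind(loop = t_loop, apply = t_apply)[, 1:3])
```

With both functions processing the same 200 rows, any remaining gap would reflect the overhead of apply() itself rather than a difference in the amount of work.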

Answer 1: 2, 2013-05-18 07:13:41

Answer 2: 1, 2013-05-20 06:36:15
