Attempting to estimate the expected number of dice rolls needed to obtain all possible sums of two dice when I roll two 6-sided fair dice

So I am doing a sample exam question in preparation for my stats exam and I have hit a dead end.

The question is asking:

If you roll two 6-sided fair dice until you get all possible outcomes (i.e. all sums 2-12 have occurred at least once), estimate the expected number of dice rolls needed.

This question needs to be answered using a simulation study in R.

So far I have simulated two dice being rolled and have also obtained the sum of each roll. I am unsure how to modify my code to estimate the expected number of rolls needed to get each sum at least once.

My code so far:

d <- data.frame(a=sample(1:6, 1000000, replace=TRUE), 
                b=sample(1:6, 1000000, replace=TRUE)) 
d$sum <- d$a + d$b 
hist(d$sum)

Any help would be great :)

We can simulate rolling a single die 10 times with the code:

sample(6, 10, TRUE)

If we want to sample two dice, we can use `replicate` on this code:

replicate(2, sample(6, 10, TRUE))
#>       [,1] [,2]
#>  [1,]    1    1
#>  [2,]    4    5
#>  [3,]    1    5
#>  [4,]    2    2
#>  [5,]    5    6
#>  [6,]    3    6
#>  [7,]    6    2
#>  [8,]    2    1
#>  [9,]    3    5
#> [10,]    3    5

We can then take the row sums of this matrix with `rowSums` to get the sums from 10 rolls of 2 dice:

rowSums(replicate(2, sample(6, 10, TRUE)))
#> [1]  2  9  6  4 11  9  8  3  8  8

Now suppose we simulate 1,000 rolls of two dice in exactly the same way and call the output `throws`:

throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))

It is almost certain that all of the values 2 to 12 will appear in `throws`, but we can check:

length(unique(throws))
#> [1] 11

But we can also see that our first 11 throws were not enough to get all 11 different values:

length(unique(throws[1:11]))
#> [1] 10

What if we look at the first 100 throws?

length(unique(throws[1:100]))
#> [1] 11

So we know that more than 11 but at most 100 throws were required. If we iterate through these throws, we can find the first point at which the number of unique sums reaches 11:

  for(i in 11:100)
  {
    if(length(unique(throws[1:i])) == 11) break;
  }

i
#> [1] 23

Our loop stopped when `i` was 23, meaning that it took 23 throws to get all 11 unique sums from our two dice.
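As a side note, the same stopping point can be found without a loop. This is only a sketch of an alternative, not part of the approach above: `match()` returns the position of the first occurrence of each sum, so the largest of those positions is the throw on which the last missing sum finally appeared. The seed and the name `first_seen` are arbitrary choices made for this illustration.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))

# Position of the FIRST occurrence of each sum 2..12 in throws
# (NA if a sum never appears at all)
first_seen <- match(2:12, throws)

# All 11 sums have been seen once the latest "first occurrence" is reached
max(first_seen)
```

If any sum were missing from `throws`, `max(first_seen)` would be `NA`, which makes an incomplete run easy to detect.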

We can wrap all this logic in a little function:

sim <- function() {
  throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))
  for(i in 11:1000)
  {
    if(length(unique(throws[1:i])) == 11) break;
  }
  return(i)
}

And we will see we get a different number each time:

sim()
#> [1] 29
sim()
#> [1] 94
sim()
#> [1] 62

If we want a feel for the distribution of results of `sim`, we need to put a bunch of its results in a vector. Again, we can use `replicate` here:

vec <- replicate(1000, sim())

Now we can see the mean number of throws required:

mean(vec)
#> [1] 59.821

And the median:

median(vec)
#> [1] 51

And a histogram:

hist(vec)


Or a density plot:

plot(density(vec))
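As a cross-check on the simulated mean, the expectation can also be computed exactly. This is a sketch that goes beyond the simulation the question asks for: it uses the inclusion-exclusion form of the coupon collector problem with unequal probabilities, E[T] = sum over all non-empty subsets S of the sums of (-1)^(|S|+1) / P(S), where P(S) is the chance that a single roll lands in S. The names `p`, `mask`, `in_set` and `expected` are my own.

```r
# Probabilities of the sums 2..12 for two fair six-sided dice
p <- c(1:6, 5:1) / 36

expected <- 0
for (mask in 1:(2^11 - 1)) {
  # Decode mask into a logical vector selecting a non-empty subset of sums
  in_set <- (mask %/% 2^(0:10)) %% 2 == 1
  # 1 / P(subset) is the expected wait until any sum in the subset appears;
  # the sign alternates with the subset size (inclusion-exclusion)
  expected <- expected + (-1)^(sum(in_set) + 1) / sum(p[in_set])
}
expected
```

The result should come out at about 61.2, which is in line with the simulated mean above given the run-to-run variation of 1,000 simulations.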


I would like to add a second answer to this question, with additional information meant to complement Allan's answer.

The question calls for a Monte Carlo method: if it's too hard to calculate the distribution of the output of a process analytically, you can simulate the process many times and average the results over all the runs. The more precise you want your estimate to be, the more runs you do.

Allan gives an excellent description of the summary statistics, but I would like to propose an improved `sim()` function to use instead. I don't know R, so I'll provide it in pseudo-code.

function roll:
    return an int in the range [1, 6] sampled with the uniform distribution

function sim:
    let s = an empty set
    let i = 0
    while size(s) < 11, do:
        let n = roll() + roll()
        add n to s
        i += 1
    return i

The code follows the process in the question. Since `s` is a set, its size counts unique results, so its size reaches 11 as soon as all results have been obtained.


Addendum

The above pseudocode implemented in R would be:

roll <- function() sample(6, 1)

sim <- function() {
  s <- numeric()
  i <- 0
  while(length(s) < 11) {
    n <- roll() + roll()
    if(!n %in% s) s <- c(s, n)
    i <- i + 1
  }
  return(i)
}

n_sims <- function(n) sapply(seq_len(n), function(x) sim())

So, for example, to run the experiment 10 times we would do:

n_sims(10)
#>  [1]  55  54  31  45  51 118  61  44  63  29

Created on 2022-11-23 with reprex v2.0.2
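To see how the number of runs affects precision, one option is to report a standard error alongside the mean. The sketch below reuses the `roll()` and `sim()` definitions from the addendum; the seed, the choice of 2,000 runs, and the names `runs`, `est` and `se` are assumptions made for illustration, and the interval relies on the usual normal approximation for a sample mean.

```r
roll <- function() sample(6, 1)

sim <- function() {
  s <- numeric()
  i <- 0
  while (length(s) < 11) {
    n <- roll() + roll()
    if (!n %in% s) s <- c(s, n)  # record each new sum once
    i <- i + 1
  }
  i
}

set.seed(123)                    # arbitrary seed, for repeatability only
runs <- replicate(2000, sim())   # 2000 runs is an arbitrary choice

est <- mean(runs)                      # Monte Carlo estimate of the expectation
se  <- sd(runs) / sqrt(length(runs))   # standard error of that estimate

# Approximate 95% interval for the expected number of rolls
c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
```

Because the standard error shrinks with the square root of the number of runs, halving the width of the interval takes roughly four times as many simulations.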
