Estimating the expected number of rolls needed to obtain all possible sums when rolling two fair six-sided dice

Question

So I am doing a sample exam question in preparation for my stats exam and I have hit a dead end.

The question is asking:

Suppose you roll two fair 6-sided dice until you have seen every possible outcome (i.e. all sums 2-12 have occurred at least once). Estimate the expected number of rolls needed.

This question needs to be answered using a simulation study in R.

So far I have simulated two dice being rolled and have also obtained the sum of each roll. I am unsure how to modify my code to estimate the expected number of rolls needed to get each sum at least once.

My code so far:

d <- data.frame(a=sample(1:6, 1000000, replace=TRUE), 
                b=sample(1:6, 1000000, replace=TRUE)) 
d$sum <- d$a + d$b 
hist(d$sum)

Any help would be great:))

Answer 1

We can sample rolling a single die 10 times with the code:

sample(6, 10, TRUE)

If we want to sample two dice, we can use replicate on this code:

replicate(2, sample(6, 10, TRUE))
#>       [,1] [,2]
#>  [1,]    1    1
#>  [2,]    4    5
#>  [3,]    1    5
#>  [4,]    2    2
#>  [5,]    5    6
#>  [6,]    3    6
#>  [7,]    6    2
#>  [8,]    2    1
#>  [9,]    3    5
#> [10,]    3    5

So we can find the row sums of this matrix to get the sums from 10 rolls of 2 dice using rowSums:

rowSums(replicate(2, sample(6, 10, TRUE)))
#> [1]  2  9  6  4 11  9  8  3  8  8
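
As an aside, the sums could also be drawn directly instead of being built from two columns. This is only a sketch, assuming we are happy to rely on the known distribution of the sum of two fair dice (weights 1, 2, ..., 6, ..., 2, 1 out of 36). The names sum_probs and direct_sums are illustrative and do not appear in the original answer:

```r
# Probability of each sum 2..12 for two fair six-sided dice
sum_probs <- c(1:6, 5:1) / 36

# Draw 10 sums directly, weighted by those probabilities
direct_sums <- sample(2:12, 10, replace = TRUE, prob = sum_probs)
direct_sums
```

Both approaches should give sums with the same distribution; the rowSums version stays closer to the physical process of rolling two dice.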

Now suppose we simulate 1,000 rolls of two dice in exactly the same way and call the output throws:

throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))

It is almost certain we will have all of the values 2 - 12 in throws, but we can test it out:

length(unique(throws))
#> [1] 11

But we can also see that our first 11 throws were not enough to get all 11 different values:

length(unique(throws[1:11]))
#> [1] 10

What if we look at the first 100 throws?

length(unique(throws[1:100]))
#> [1] 11

So we know that somewhere between 12 and 100 throws were required. If we iterate through these throws, we will find the first point where the number of unique sums reaches 11:

for (i in 11:100) {
  if (length(unique(throws[1:i])) == 11) break
}

i
#> [1] 23

Our loop stopped when i was 23, meaning that it took 23 throws to get all 11 unique sums from our two dice.
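
The same stopping point could probably be found without a loop. The following is a sketch rather than part of the original answer: match() returns the position of the first occurrence of each sum, so the largest of those positions is the throw on which the last missing sum appeared. The throws vector is regenerated here with an arbitrary seed so that the block runs on its own, which means its result will differ from the 23 above. The names first_seen and n_needed are illustrative:

```r
set.seed(1)  # arbitrary seed, used only to make the example reproducible
throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))

# Position of the first occurrence of each sum 2..12 (NA if a sum never occurs)
first_seen <- match(2:12, throws)

# The collection is complete at the latest of those first occurrences
n_needed <- max(first_seen)
n_needed
```

If any sum were missing from throws, max() would return NA, which makes an incomplete collection easy to detect.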

We can wrap all this logic in a little function:

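sim <- function() {
  # 1000 throws is far more than is needed in practice to see all 11 sums
  throws <- rowSums(replicate(2, sample(6, 1000, TRUE)))
  for (i in 11:1000) {
    # Stop at the first throw where all 11 distinct sums have appeared
    if (length(unique(throws[1:i])) == 11) break
  }
  return(i)
}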

And we will see we get a different number each time:

sim()
#> [1] 29
sim()
#> [1] 94
sim()
#> [1] 62

If we want a feel for the distribution of results of sim, we need to put a bunch of its results in a vector. Again, we can use replicate here:

vec <- replicate(1000, sim())

Now we can see the mean number of throws required:

mean(vec)
#> [1] 59.821

And the median:

median(vec)
#> [1] 51

And a histogram:

hist(vec)

Or a density plot:

plot(density(vec))
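
As a sanity check on the simulated mean, the expectation can also be computed exactly. The sketch below is not part of the original answer and is not a replacement for the simulation the question asks for. It assumes the standard inclusion-exclusion identity for the expected time until every outcome has been seen, summed over all 2047 non-empty subsets of the 11 sums. The names p, mask, in_subset and expected are illustrative:

```r
# Probability of each sum 2..12 for two fair six-sided dice
p <- c(1:6, 5:1) / 36

# Inclusion-exclusion over every non-empty subset S of the 11 sums:
#   E[T] = sum over S of (-1)^(|S| + 1) / P(a roll lands in S)
expected <- 0
for (mask in 1:(2^11 - 1)) {
  # The binary digits of mask decide which sums belong to the subset
  in_subset <- (mask %/% 2^(0:10)) %% 2 == 1
  expected <- expected + (-1)^(sum(in_subset) + 1) / sum(p[in_subset])
}
expected
```

The value of expected should land close to mean(vec); a large gap would suggest a bug in the simulation or too few replications.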

Answer 2

I would like to add a second answer to this question, with additional information meant to complement Allan's answer.

The question calls for a Monte Carlo method: if it's too hard to calculate the distribution of a process's output analytically, you can run the process stochastically many times and average over the runs. The more precise you want your estimate to be, the more runs you need.

Allan gives an excellent description of the summary statistics, but I would like to propose an improved sim() function to use instead. I don't know R, so I'll provide it in pseudo-code.

function roll:
    return an int in the range [1, 6] sampled with the uniform distribution

function sim:
    let s = an empty set
    let i = 0
    while size(s) < 11, do:
        let n = roll() + roll()
        add n to s
        i += 1
    return i

The code follows the process in the question. Since s is a set, its size counts unique results, so its size equals 11 as soon as all results have been obtained.

Addendum

The above pseudocode implemented in R would be:

roll <- function() sample(6, 1)

sim <- function() {
  s <- numeric()
  i <- 0
  while(length(s) < 11) {
    n <- roll() + roll()
    if(!n %in% s) s <- c(s, n)
    i <- i + 1
  }
  return(i)
}

n_sims <- function(n) sapply(seq_len(n), function(x) sim())

So, for example, to run the experiment 10 times we would do:

n_sims(10)
#>  [1]  55  54  31  45  51 118  61  44  63  29
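
To turn runs like these into an estimate, and to see how precise it is, one option is to report the mean together with its Monte Carlo standard error. The sketch below is an illustration, not part of the original answer: it redefines roll() and sim() so that it runs on its own, uses an arbitrary seed and 2000 runs, and assumes a normal approximation for the interval. The names runs, estimate, std_error and ci are illustrative:

```r
roll <- function() sample(6, 1)

sim <- function() {
  seen <- integer()
  i <- 0
  while (length(seen) < 11) {
    # union() only adds the new sum if it has not been seen before
    seen <- union(seen, roll() + roll())
    i <- i + 1
  }
  i
}

set.seed(42)  # arbitrary seed, used only to make the example reproducible
runs <- replicate(2000, sim())

estimate  <- mean(runs)                     # Monte Carlo estimate of the expectation
std_error <- sd(runs) / sqrt(length(runs))  # shrinks like 1 / sqrt(number of runs)
ci <- estimate + c(-1.96, 1.96) * std_error # approximate 95% interval (normal approximation)

estimate
ci
```

Because the standard error shrinks with the square root of the number of runs, halving the width of the interval takes roughly four times as many runs.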

Created on 2022-11-23 with reprex v2.0.2

solution1
4 2022-11-22 14:44:52

solution2
3 2022-11-23 21:34:16
