randomly ordering across groups (not within group) in data.table

Question

Let's say I want to order the iris dataset (as a data.table) by Species, keeping the observations of each species together while randomly ordering the species themselves.

How do I do that?

I am not talking about generating a random order within groups (species).

My intuition was to write the code below, but it actually creates a variable that is random within each species: runif(.N) draws one value per row rather than one value per species. At least it makes the question reproducible:

library(data.table)
d <- as.data.table(iris)
set.seed(12345)
d[, g := runif(.N), Species]   # runif(.N) draws a new value for every row of each species

Answer 1

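You can do a binary-search join in i on a randomly permuted vector of the unique species. The joined rows come back in the order of i, so each species stays together while the order of the species is random. A smaller example (the exact order shown may differ across R versions, since R 3.6.0 changed the default algorithm used by sample()):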

library(data.table)
d <- data.table(Species = rep(letters[1:4], each = 2), ri = 1:8)
set.seed(1)
d[.(sample(unique(Species))), on = "Species"]   # rows follow the shuffled order of i
#    Species ri
# 1:       b  3
# 2:       b  4
# 3:       d  7
# 4:       d  8
# 5:       c  5
# 6:       c  6
# 7:       a  1
# 8:       a  2
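Applied to the full iris table from the question, the same join can be sketched as follows. The rle() check that each species ends up as a single contiguous block is an addition for illustration, and the species order you get depends on the seed and the R version.

```r
library(data.table)

# Same join-based shuffle as above, applied to the full iris data.
d <- as.data.table(iris)
set.seed(1)
shuffled <- d[.(sample(unique(Species))), on = "Species"]

# Each species should form exactly one contiguous run of rows.
runs <- rle(as.character(shuffled$Species))
print(runs$values)    # species in their new random order
print(runs$lengths)   # 50 rows per species, each as one block
```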

Answer 2

We can randomly permute the sequence 1...N, where N is the number of levels of the factor (Species) in question.

We then map each row's species to its new position, store that in a column, and sort by it. Broken apart into steps for illustration, it looks like this:

tmp     <- sample(nlevels(d$Species))     # random permutation of 1..N (N = number of species)
d$index <- tmp[as.integer(d$Species)]     # factor level codes pick each row's new group position
d       <- d[order(d$index), ]            # rows of one species share an index, so they stay together

You can compact this into a single step:

d <- d[order(sample(nlevels(d$Species))[as.integer(d$Species)]), ]
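The step-by-step version relies on Species being a factor, because as.integer() returns its level codes. For a character grouping column, a sketch of the same permutation idea can map each row to its group's new position with match() instead. The column names grp and val and the data below are hypothetical, made up for illustration.

```r
library(data.table)

# Hypothetical table whose grouping column is character, not a factor,
# so as.integer() level codes are not available.
d <- data.table(grp = rep(c("x", "y", "z"), times = c(2, 3, 1)), val = 1:6)

set.seed(42)
groups  <- unique(d$grp)            # each group once, in order of appearance
new_pos <- sample(length(groups))   # random permutation of 1..N
# match() gives each row the index of its group; new_pos turns that into a
# random rank, and the stable sort keeps the original order within a group.
d <- d[order(new_pos[match(grp, groups)])]

print(d)
```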

Answer 3

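Alternatively, you could draw one random number per species in a small summary table, join it back onto the rows, and sort by it:

e <- d[, .N, Species]                          # one row per species
e[, g2 := runif(.N)]                           # one random key per species
d <- e[, .(Species, g2)][d, on = 'Species']    # attach each row's species key
setorder(d, g2)                                # all rows of a species share g2, so they stay together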
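A related variant, not one of the posted answers: the question's own attempt only needs runif(1) instead of runif(.N). With by = Species, data.table recycles the single draw to every row of the group, so each species gets one shared key and sorting on it moves whole species blocks. A minimal sketch on the question's setup:

```r
library(data.table)

d <- as.data.table(iris)
set.seed(12345)

# runif(1) draws a single value per species; := recycles it to every row of
# that group, unlike runif(.N), which draws one value per row.
d[, g := runif(1), by = Species]

# Sorting on the per-species key reorders whole species blocks.
setorder(d, g)

print(unique(d$Species))   # species in their new random order
```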

3 answers

solution1
2 2018-07-05 21:13:33

solution2
1 2018-07-05 19:37:49

solution3
1 ACCEPTED 2018-07-05 19:38:11
