
Randomly ordering across groups (not within groups) in data.table

Let's say I want to reorder the iris dataset (as a data.table) by Species, keeping each species' observations together while putting the species themselves in random order.

How do I do that?

I am not talking about generating a random order within groups (species).

My intuition was to write the code below, but it actually creates a random variable within each species. At least it makes the question reproducible:

library(data.table)
d <- as.data.table(iris)
set.seed(12345)
d[, g := runif(.N), by = Species]  # one draw per row, so g varies within each species

You can shuffle the unique group values and join on them in i (a binary-search join). A smaller example:

d <- data.table(Species = rep(letters[1:4], each = 2), ri = 1:8)
set.seed(1)
d[.(sample(unique(Species))), on = "Species"]
#    Species ri
# 1:       b  3
# 2:       b  4
# 3:       d  7
# 4:       d  8
# 5:       c  5
# 6:       c  6
# 7:       a  1
# 8:       a  2
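
As a minimal sketch, the same join can be applied to the iris data from the question. It assumes data.table is installed; the names shuffled and res are chosen here for illustration only. The rleid() check counts runs of identical Species values, so a result of 3 means each species forms one contiguous block.

```r
library(data.table)

d <- as.data.table(iris)
set.seed(1)

# Shuffle the three species; sample() keeps the factor class
shuffled <- sample(unique(d$Species))

# Join on the shuffled values: rows come back group by group, in that order
res <- d[.(shuffled), on = "Species"]

# One run per species means the groups stayed intact
max(rleid(res$Species))
```

Within each species the rows keep their original relative order, because the join returns the matching rows of d as they appear in d.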

We can draw a random permutation of 1...N, where N is the number of levels of the factor (Species) in question.

We then map the new order to a column and sort by it. Broken into steps for illustration, it looks like this:

tmp      <- sample(nlevels(d$Species))     # random rank for each factor level
d$index  <- tmp[as.integer(d$Species)]     # map each row to its species' rank
d        <- d[order(d$index), ]            # sort by rank; groups stay together

You could compact this into a single line:

d <- d[order(sample(nlevels(d$Species))[as.integer(d$Species)]), ]
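
Indexing by the factor's integer codes, as above, only works when Species is a factor. For a character grouping column, one possible variant (a sketch, not part of the original answer) looks up each row's group with match() instead. The names groups and rank_of_group are hypothetical.

```r
library(data.table)

d <- data.table(Species = rep(letters[1:4], each = 2), ri = 1:8)
set.seed(1)

groups        <- unique(d$Species)        # distinct group values, any type
rank_of_group <- sample(length(groups))   # one random rank per group

# match() gives each row its group's position; order by that group's rank
d <- d[order(rank_of_group[match(Species, groups)])]
```

Rows of the same group share a rank, and data.table's ordering is stable, so the original order within each group is preserved.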

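Alternatively, you could draw one random number per species, join it back onto d, and sort by it:

e <- d[, .N, Species]                         # one row per species
e[, g2 := runif(.N)]                          # one random draw per species
d <- e[, .(Species, g2)][d, on = 'Species']   # attach g2 to every row of d
setorder(d, g2)                               # species blocks in random order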
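
The same idea can also be sketched with an update join, which adds the per-species draw to d by reference rather than building a new table. This is an assumption-labeled variant and not the answerer's code; it reuses the names e and g2 from above.

```r
library(data.table)

d <- as.data.table(iris)
set.seed(12345)

# One uniform draw per species (3 rows)
e <- d[, .(g2 = runif(1)), by = Species]

# Update join: copy each species' draw onto its rows in d, by reference
d[e, on = "Species", g2 := i.g2]

# Sorting by the shared value moves whole species blocks into random order
setorder(d, g2)
```

Because every row of a species carries the same g2, sorting never interleaves species, and the stable sort keeps the within-species order unchanged.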
