Random selection by two-layer strata in R

I have a large data set that I want to subsample so that its proportions look similar to those of another data set.

The target data set has the following proportions for variable X:

'A' = 0.5,
'B' = 0.2,
'C' = 0.1,
'D' = 0.2

I also want the group variable GRP to have a 2:1 ratio, so that there are 2 ctrl rows for every trt row.

My data looks like this:

 ID          GRP         X         Y
 1           ctrl         A        2
 2           ctrl         A        2
 3           ctrl         B        1
 4           trt          A        4

etc

I can sample equal-sized groups of X and GRP with this code:

DF %>% group_by(X, GRP) %>% sample_n(2500)

But I would like to get a 2:1 ratio for GRP and preserve the target proportions of X. Is there a way to specify each stratum's percentage of the total sample in random sampling?

Something like this? (I used replace = TRUE because of the small dataset)

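library(dplyr)

DF %>%
  group_by(X, GRP) %>%
  # Row weight: target share of X, doubled for ctrl to aim for 2:1 ctrl:trt
  mutate(prob = case_when(X == "A" ~ 0.5,
                          X == "B" ~ 0.2,
                          X == "C" ~ 0.1,
                          X == "D" ~ 0.2),
         prob = if_else(GRP == "ctrl", prob * 2, prob)) %>%
  # Keep one row per distinct (X, GRP, Y) combination
  distinct(X, GRP, Y, .keep_all = TRUE) %>%
  ungroup() %>%
  # Draw rows with replacement, with probability proportional to prob
  slice_sample(n = 10000, replace = TRUE, weight_by = prob) %>%
  # Tabulate the sample by group and X
  count(GRP, X)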

# A tibble: 3 x 3
  GRP   X         n
  <chr> <chr> <int>
1 ctrl  A      5203
2 ctrl  B      2102
3 trt   A      2695
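
Note that the weights above apply to individual rows, so the share each stratum ends up with also depends on how many rows it contributes. If exact stratum sizes are wanted, one possible alternative is to compute the number of rows per (X, GRP) stratum from the target shares and sample each stratum separately. The following is only a sketch in base R under stated assumptions: the toy DF, the total sample size n_total, and the helper names x_share, grp_share and targets are all made up for illustration and are not from the question.

```r
set.seed(42)

# Hypothetical toy data with the same columns as in the question
DF <- data.frame(
  ID  = 1:4000,
  GRP = sample(c("ctrl", "trt"), 4000, replace = TRUE),
  X   = sample(c("A", "B", "C", "D"), 4000, replace = TRUE),
  Y   = sample(1:5, 4000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Target shares of X (from the question) and the 2:1 ctrl:trt ratio
x_share   <- c(A = 0.5, B = 0.2, C = 0.1, D = 0.2)
grp_share <- c(ctrl = 2 / 3, trt = 1 / 3)
n_total   <- 600  # assumed total sample size

# Number of rows to draw from every (X, GRP) stratum
targets <- expand.grid(X = names(x_share), GRP = names(grp_share),
                       stringsAsFactors = FALSE)
targets$n <- round(n_total * x_share[targets$X] * grp_share[targets$GRP])

# Draw the requested number of rows from each stratum, without replacement
pieces <- lapply(seq_len(nrow(targets)), function(i) {
  rows <- which(DF$X == targets$X[i] & DF$GRP == targets$GRP[i])
  # Fail early if the stratum has fewer rows than requested
  stopifnot(length(rows) >= targets$n[i])
  DF[rows[sample.int(length(rows), targets$n[i])], , drop = FALSE]
})
sampled <- do.call(rbind, pieces)

# Cross-tabulate the sample to check the stratum sizes
table(sampled$GRP, sampled$X)
```

With these assumed numbers, each stratum size is fixed in advance (for example 200 ctrl and 100 trt rows for X = "A"), so the 2:1 ratio holds within every level of X. If a stratum has too few rows, sampling with replacement (as in the answer above) or a smaller n_total would be needed.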

