I have a large dataset that I want to resample so that its proportions look similar to those of another dataset.
The target dataset has these proportions for variable X:
'A' = 0.5,
'B' = 0.2,
'C' = 0.1,
'D' = 0.2
I also want the group variable GRP to have a 2:1 ratio, so that for every trt there are 2 ctrl.
My data looks like this:
ID GRP X Y
1 ctrl A 2
2 ctrl A 2
3 ctrl B 1
4 trt A 4
etc
I can sample equal-sized groups of X and GRP with this code:
DF %>% group_by(X, GRP) %>% sample_n(2500)
But I would like to get a 2:1 ratio for GRP and preserve the X proportions listed above. Is there a way to specify each stratum's percentage of the total in random sampling?
Something like this? (I used replace = TRUE because of the small dataset.)
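library(dplyr)

DF %>%
  group_by(X, GRP) %>%
  mutate(
    # target share of each X level
    prob = case_when(X == "A" ~ 0.5,
                     X == "B" ~ 0.2,
                     X == "C" ~ 0.1,
                     X == "D" ~ 0.2),
    # ctrl rows get twice the weight of trt rows (2:1 ratio)
    prob = if_else(GRP == "ctrl", prob * 2, prob),
    # weight_by is per row, so split each stratum's weight across its rows;
    # otherwise strata with more rows would be oversampled
    prob = prob / n()
  ) %>%
  ungroup() %>%
  slice_sample(n = 10000, replace = TRUE, weight_by = prob) %>%
  count(GRP, X)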
# A tibble: 3 x 3
  GRP   X         n
  <chr> <chr> <int>
1 ctrl  A      5203
2 ctrl  B      2102
3 trt   A      2695
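If exact per-stratum counts are wanted instead of weighted random draws, one option is to compute a target count for each X-by-GRP cell and then keep that many randomly chosen rows from each cell. The following is only a sketch under stated assumptions: the toy DF, the total size n_total, and the names targets, p_x, p_grp and n_target are all made up for illustration, and the target shares assume X and GRP are independent (cell share = X share times GRP share).

```r
library(dplyr)

set.seed(1)  # only so the toy example is reproducible

# Toy stand-in for DF (hypothetical): 500 rows in each X-by-GRP stratum
DF <- tibble(
  ID  = 1:4000,
  GRP = rep(c("ctrl", "trt"), each = 2000),
  X   = rep(c("A", "B", "C", "D"), times = 1000),
  Y   = sample(1:5, 4000, replace = TRUE)
)

n_total <- 1200  # hypothetical total sample size

# One row per stratum: target share = share of X * share of GRP (2:1 -> 2/3, 1/3)
targets <- tibble(
  X     = rep(c("A", "B", "C", "D"), times = 2),
  GRP   = rep(c("ctrl", "trt"), each = 4),
  p_x   = rep(c(0.5, 0.2, 0.1, 0.2), times = 2),
  p_grp = rep(c(2/3, 1/3), each = 4)
) %>%
  mutate(n_target = round(n_total * p_x * p_grp))

sampled <- DF %>%
  inner_join(targets, by = c("X", "GRP")) %>%
  group_by(X, GRP) %>%
  # sample(n()) is a random permutation of 1..n(); keeping values <= n_target
  # retains n_target randomly chosen rows (or all rows if the stratum is smaller)
  filter(sample(n()) <= first(n_target)) %>%
  ungroup() %>%
  select(ID, GRP, X, Y)

count(sampled, GRP, X)
```

With these toy numbers the counts should come out as 400, 160, 80, 160 for ctrl and 200, 80, 40, 80 for trt (X = A to D), which is 2:1 within every X level. This samples without replacement, so a stratum with fewer rows than its target is simply returned in full; in that case either lower n_total or fall back to sampling with replacement.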