R: sample with different sample sizes for groups

I have a data frame with two grouping columns, V1 and V2. I want to sample exactly n = 4 rows for each distinct value of V1, while making sure that at least m = 1 row is sampled for each distinct value of V2.

library(tidyverse)
set.seed(1)
df = data.frame(
  V1 = c(rep("A",6), rep("B",6)),
  V2 = c("C","C","D","D","E","E","F","F","G","G","H","H"),
  V3 = rnorm(12)
)

df
   V1 V2         V3
1   A  C -0.6264538
2   A  C  0.1836433
3   A  D -0.8356286
4   A  D  1.5952808
5   A  E  0.3295078
6   A  E -0.8204684
7   B  F  0.4874291
8   B  F  0.7383247
9   B  G  0.5757814
10  B  G -0.3053884
11  B  H  1.5117812
12  B  H  0.3898432

My desired output would be, for example:

V1    V2        V3
1 A     C     -0.626
2 A     D     -0.836
3 A     E     -0.820
4 A     E      0.329
5 B     F      0.487
6 B     G      0.576
7 B     G     -0.305
8 B     H      0.390

I do not know how to generate this output. When I group by V1 and V2 and sample one row per group, I get only 3 rows for each distinct value of V1 instead of 4.

df %>%
  group_by(V1,V2) %>%
  sample_n(1)

  V1    V2        V3
1 A     C     -0.626
2 A     D     -0.836
3 A     E     -0.820
4 B     F      0.487
5 B     G      0.576
6 B     H      0.390

Neither the "splitstackshape" nor the "sampling" package helped.

Here is one approach:

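library(dplyr)

nr <- 4  # target number of rows per V1 value

# First pass: one random row for each (V1, V2) combination
first_pass <- df %>% group_by(V1, V2) %>% sample_n(1) %>% ungroup()

first_pass %>%
  count(V1) %>%                 # rows already selected per V1
  mutate(n = nr - n) %>%        # rows still needed per V1
  left_join(df, by = 'V1') %>%  # attach all candidate rows of that V1
  group_by(V1) %>%
  sample_n(first(n)) %>%        # second pass: draw the missing rows
  select(-n) %>%
  bind_rows(first_pass) %>%     # combine both passes
  arrange(V1, V2)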

#  V1    V2        V3
#  <chr> <chr>  <dbl>
#1 A     C      0.184
#2 A     D     -0.836
#3 A     E     -0.820
#4 A     E     -0.820
#5 B     F      0.487
#6 B     F      0.738
#7 B     G     -0.305
#8 B     H      0.390

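The logic is to first randomly select one row for each combination of V1 and V2. We then calculate, for each V1, how many more rows are needed to reach nr rows, sample that many at random within each V1, and combine both samples into the final dataset. Note that the second sample is drawn from all rows of df, so a row already picked in the first pass can be picked again, as happened with rows 3 and 4 of the output above.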
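
If repeated rows are not wanted, one option is to exclude the first-pass rows before the second draw. The following is only a minimal sketch, assuming dplyr >= 1.0.0 (for slice_sample); the names row_id, df_id, n_first and second_pass are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  V1 = c(rep("A", 6), rep("B", 6)),
  V2 = c("C","C","D","D","E","E","F","F","G","G","H","H"),
  V3 = rnorm(12)
)

nr <- 4  # target number of rows per V1 value

# Tag each row with an id (hypothetical helper column) so that rows
# chosen in the first pass can be excluded from the second pass
df_id <- df %>% mutate(row_id = row_number())

# Pass 1: one random row for each (V1, V2) combination
first_pass <- df_id %>%
  group_by(V1, V2) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Pass 2: top up each V1 group to nr rows using only unchosen rows
second_pass <- df_id %>%
  anti_join(first_pass, by = "row_id") %>%
  left_join(count(first_pass, V1, name = "n_first"), by = "V1") %>%
  group_by(V1) %>%
  slice_sample(prop = 1) %>%                  # shuffle rows within each V1
  filter(row_number() <= nr - n_first) %>%    # keep only the rows still needed
  ungroup() %>%
  select(-n_first)

result <- bind_rows(first_pass, second_pass) %>%
  arrange(V1, V2) %>%
  select(-row_id)

result
```

With the example data this should return four distinct rows per V1 value, with every V2 value represented at least once. If a V1 group has fewer than nr rows in total, the sketch simply returns all of them rather than raising an error.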
