Select n random groups from grouped data in R

Question

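I have grouped data consisting of students clustered within 160 schools. I want to draw a random sample of 30 schools from this dataset. I hard-coded a solution (see below), but is there a wrapper function or a quicker way to do this in R? Something like sample_n() or top_n(), except that those return n observations per group, whereas I want 100% of the observations from n groups.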

# First, some example data. Each row represents one student in a given school, and that student's favourite fruit.

library(dplyr)   # %>%, group_by(), filter(), sample_n()
library(tibble)  # tribble(), as_tibble()

df <- tribble(
    ~school_id, ~favourite_fruit,
    #----------#---------------
    1, "apple",
    1, "banana",
    2, "kiwi",
    2, "tomato",
    3, "strawberry",
    3, "cherry",
    4, "orange",
    4, "lime"
)

# My hard-coded solution

school_vector <- df %>% 
    group_by(school_id) %>% 
    select(school_id) %>% 
    count() %>% 
    ungroup() %>% 
    select(school_id) %>% 
    sample_n(2)

df_subset <- df %>% 
    filter(school_id %in% school_vector$school_id) %>% 
    as_tibble()

Answer 1

You can draw a sample of school_id values inside filter() and use it with your existing %in% logic:

df %>% 
  filter(school_id %in% sample(unique(school_id), 2))
# # A tibble: 4 x 2
#   school_id favourite_fruit
#       <dbl> <chr>          
# 1         3 strawberry     
# 2         3 cherry         
# 3         4 orange         
# 4         4 lime
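
Because sample() draws at random, the schools returned will differ from run to run. The following is a minimal sketch, assuming you want a reproducible draw and a quick check that whole groups were kept. The seed value 42 is arbitrary and not part of the original answer.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
df <- tribble(
  ~school_id, ~favourite_fruit,
  1, "apple",
  1, "banana",
  2, "kiwi",
  2, "tomato",
  3, "strawberry",
  3, "cherry",
  4, "orange",
  4, "lime"
)

set.seed(42)  # arbitrary seed (assumption), only to make the draw repeatable

df_subset <- df %>%
  filter(school_id %in% sample(unique(school_id), 2))

# Number of distinct schools kept; sample() draws without replacement by default
n_distinct(df_subset$school_id)

# Rows per kept school; each should match that school's row count in df
df_subset %>% count(school_id)
```

Comparing the per-school counts in df_subset with those in df is a simple way to confirm that sampling happened at the school level and not at the student level.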

As a function:

group_samp <- function(df, group_var, n){
  df %>% 
    filter({{group_var}} %in% sample(unique({{group_var}}), n))
}

df %>% 
  group_samp(school_id, 2)
# # A tibble: 4 x 2
#   school_id favourite_fruit
#       <dbl> <chr>          
# 1         1 apple          
# 2         1 banana         
# 3         2 kiwi           
# 4         2 tomato
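
One caveat with sample() may be worth guarding against: when its first argument is a single number of at least 1, base R samples from 1:x instead of from that one value. A data frame with a single numeric group ID could therefore come back empty. The sketch below is a hypothetical variant, not part of the original answer. The name group_samp_safe and the input check are assumptions. It indexes the unique IDs with sample.int() to avoid that behaviour.

```r
library(dplyr)
library(tibble)

# Hypothetical hardened variant of group_samp() (name and checks are assumptions)
group_samp_safe <- function(df, group_var, n) {
  # Unique group IDs as a plain vector, ignoring any existing grouping
  ids <- df %>%
    ungroup() %>%
    distinct({{ group_var }}) %>%
    pull()

  # Fail early if more groups are requested than exist
  stopifnot(n >= 0, n <= length(ids))

  # Sample positions, not values, so a single numeric ID is handled correctly
  keep <- ids[sample.int(length(ids), n)]

  df %>%
    filter({{ group_var }} %in% keep)
}

# Edge case: one school whose ID is a single number
one_school <- tibble(
  school_id = c(5, 5),
  favourite_fruit = c("pear", "plum")
)

# Always returns both rows of school 5
group_samp_safe(one_school, school_id, 1)
```

With the original one-liner, sample(5, 1) would pick a value from 1 to 5, so school 5 would only be returned some of the time.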

Solution 1
3, accepted, 2020-02-21 22:27:12
