Selecting n random groups from grouped data in R
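I have grouped data made up of students clustered within 160 schools. I want to draw a random sample of 30 schools from this dataset. I hard-coded a solution (see below), but is there a wrapper function or a quicker way to do this in R? Something like sample_n() or top_n(), except that those return n observations per group, whereas I want 100% of the observations from n groups.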
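library(dplyr)   # %>%, group_by(), select(), count(), sample_n(), filter()
library(tibble)  # tribble(), as_tibble()
# First, some example data. Each row represents one student in a given school, and that student's favourite fruit.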
df <- tribble(
~school_id, ~favourite_fruit,
#----------#---------------
1, "apple",
1, "banana",
2, "kiwi",
2, "tomato",
3, "strawberry",
3, "cherry",
4, "orange",
4, "lime"
)
# My hard-coded solution
school_vector <- df %>%
group_by(school_id) %>%
select(school_id) %>%
count() %>%
ungroup() %>%
select(school_id) %>%
sample_n(2)
df_subset <- df %>%
filter(school_id %in% school_vector$school_id) %>%
as_tibble()
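The two-step idea above (draw the school IDs first, then keep all of their rows) can also be written with dplyr's distinct(), slice_sample() and semi_join(). This is a sketch assuming dplyr 1.0.0 or later, where slice_sample() superseded sample_n(); the object names sampled_schools and df_subset are illustrative only.

```r
library(dplyr)

df <- tibble::tribble(
  ~school_id, ~favourite_fruit,
  1, "apple",
  1, "banana",
  2, "kiwi",
  2, "tomato",
  3, "strawberry",
  3, "cherry",
  4, "orange",
  4, "lime"
)

set.seed(42)  # only so the draw is reproducible

# Step 1: one row per school, then draw 2 of those rows at random
sampled_schools <- df %>%
  distinct(school_id) %>%
  slice_sample(n = 2)

# Step 2: keep every student whose school_id is in the sampled set
df_subset <- df %>%
  semi_join(sampled_schools, by = "school_id")
```

For the real data, slice_sample(n = 30) would draw 30 of the 160 schools.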
You can draw a sample of the unique school_id values inside filter() and use it with your current %in% logic:
df %>%
filter(school_id %in% sample(unique(school_id), 2))
# # A tibble: 4 x 2
# school_id favourite_fruit
# <dbl> <chr>
# 1 3 strawberry
# 2 3 cherry
# 3 4 orange
# 4 4 lime
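This works because sample(unique(school_id), 2) is evaluated once over the whole, ungrouped tibble, giving a vector of two IDs. If the data are still grouped (for example after group_by(school_id)), filter() evaluates the expression separately inside each group, where unique(school_id) is a single number; per ?sample, a single numeric x >= 1 is treated as 1:x, so the draw no longer selects whole schools and can even error (e.g. sample(1, 2)). A minimal sketch of the fix, assuming dplyr is installed and reusing the toy data, is to call ungroup() first:

```r
library(dplyr)

df <- tibble::tribble(
  ~school_id, ~favourite_fruit,
  1, "apple",
  1, "banana",
  2, "kiwi",
  2, "tomato",
  3, "strawberry",
  3, "cherry",
  4, "orange",
  4, "lime"
)

# Suppose the data arrive already grouped by school
df_grouped <- df %>% group_by(school_id)

set.seed(7)  # reproducible draw
df_subset <- df_grouped %>%
  ungroup() %>%  # so sample() runs once over all IDs, not once per school
  filter(school_id %in% sample(unique(school_id), 2))
```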
As a function (the {{ }} "curly-curly" operator requires rlang 0.4.0 or later):
group_samp <- function(df, group_var, n){
df %>%
filter({{group_var}} %in% sample(unique({{group_var}}), n))
}
df %>%
group_samp(school_id, 2)
# # A tibble: 4 x 2
# school_id favourite_fruit
# <dbl> <chr>
# 1 1 apple
# 2 1 banana
# 3 2 kiwi
# 4 2 tomato
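One caveat with group_samp(): if only one group ID is present, sample(unique(x), n) hits the documented base R behaviour where a single numeric x >= 1 is treated as 1:x, so it can return an ID that is not in the data and filter() then keeps no rows. The sketch below is a hypothetical group_samp_safe() (not part of any package) that samples positions with sample.int() and ungroups first; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical helper: keep all rows from n randomly chosen groups.
group_samp_safe <- function(df, group_var, n) {
  df <- ungroup(df)                         # draw once, not once per group
  ids <- unique(pull(df, {{ group_var }}))  # the distinct group IDs
  keep <- ids[sample.int(length(ids), n)]   # sample positions, then index
  filter(df, {{ group_var }} %in% keep)
}

df <- tibble::tribble(
  ~school_id, ~favourite_fruit,
  1, "apple",
  1, "banana",
  2, "kiwi",
  2, "tomato",
  3, "strawberry",
  3, "cherry",
  4, "orange",
  4, "lime"
)

set.seed(123)
df %>% group_samp_safe(school_id, 2)

# A dataset containing a single school (ID 7): both rows are still returned
one_school <- tibble::tibble(school_id = c(7, 7),
                             favourite_fruit = c("pear", "plum"))
one_school %>% group_samp_safe(school_id, 1)
```

Asking for more groups than exist (n > number of IDs) raises an error from sample.int(), which seems preferable to silently returning fewer groups.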