R Sample By Minimum Cell Size
set.seed(1)
data <- data.frame(SCHOOL = rep(1:10, each = 1000), GRADE = sample(7:12, size = 10000, replace = TRUE), SCORE = sample(1:100, size = 10000, replace = TRUE))
I have 'data' that contains student test scores. I wish to count the rows for each GRADE within each SCHOOL, and then take the smallest of those counts for each GRADE across all SCHOOLs. Like this:
1. For each SCHOOL, count the number of rows for each GRADE.
2. Then, for each GRADE, find the smallest count across all SCHOOLs.
3. Finally, take a random sample from each SCHOOL and GRADE whose size is the smallest count found in step 2.
So in this basic example with two SCHOOLs and GRADEs 7 and 8:
SCHOOL 1 has 2 SCOREs for GRADE 7 and 3 SCOREs for GRADE 8.
SCHOOL 2 has 1 SCORE for GRADE 7 and 4 SCOREs for GRADE 8.
So the new data contains one SCORE for GRADE 7 from each of SCHOOL 1 and SCHOOL 2, and three SCOREs for GRADE 8 from each of SCHOOL 1 and SCHOOL 2, and the SCOREs that are picked are randomly sampled.
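As a rough sketch of the target sizes in the two-school example, the counts can be reproduced on a small table. The name `toy` and the SCORE values are invented purely for illustration; only the cell sizes (2, 3, 1, 4) come from the example.

```r
library(data.table)

# Toy data matching the two-school example (SCORE values are made up)
toy <- data.table(
  SCHOOL = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  GRADE  = c(7, 7, 8, 8, 8, 7, 8, 8, 8, 8),
  SCORE  = c(55, 71, 62, 90, 48, 83, 77, 59, 66, 94)
)

# Step 1: number of rows in each SCHOOL x GRADE cell
cells <- toy[, .(n = .N), by = .(SCHOOL, GRADE)]

# Step 2: smallest cell size per GRADE across SCHOOLs
target <- cells[, .(n_min = min(n)), by = GRADE]
print(target)
```

Here `target` holds 1 for GRADE 7 and 3 for GRADE 8, so the sampled data should end up with 1 + 1 + 3 + 3 = 8 rows.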
My attempt:
data[, .SD[sample(x = .N, size = min(sum(GRADE), .N))], by = .(SCHOOL, GRADE)]
This follows your description of how to do it step by step.
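library(data.table)
setDT(data)
# Step 1: count the rows in each SCHOOL x GRADE cell
data[, N := .N, .(SCHOOL, GRADE)]
# Step 2: keep the smallest cell size per GRADE across all SCHOOLs
data[, N := min(N), GRADE]
# Step 3: sample N rows per cell; sampling row positions avoids sample() reading a single SCORE x as 1:x
data[, .(SCORE = SCORE[sample(.N, N)]), .(SCHOOL, GRADE, N)][, -'N']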
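One way to sanity-check the result is to confirm that every SCHOOL contributes the same number of rows within a GRADE. The sketch below rebuilds the question's data so it runs on its own; the names `expected`, `res`, and `check` are made up here, and drawing row positions with `sample(.N, N)` is an assumption chosen to sidestep `sample()`'s special handling of a single number.

```r
library(data.table)

set.seed(1)
data <- data.table(
  SCHOOL = rep(1:10, each = 1000),
  GRADE  = sample(7:12, size = 10000, replace = TRUE),
  SCORE  = sample(1:100, size = 10000, replace = TRUE)
)

# Expected sample size per GRADE: the smallest SCHOOL x GRADE cell
expected <- data[, .N, by = .(SCHOOL, GRADE)][, .(n_min = min(N)), by = GRADE]

# Same three steps as in the answer
data[, N := .N, by = .(SCHOOL, GRADE)]
data[, N := min(N), by = GRADE]
res <- data[, .(SCORE = SCORE[sample(.N, N)]), by = .(SCHOOL, GRADE, N)]
res[, N := NULL]

# Compare the sampled cell sizes with the expected per-GRADE minimum
check <- res[, .N, by = .(SCHOOL, GRADE)][expected, on = "GRADE"]
stopifnot(all(check$N == check$n_min))
```

If the `stopifnot()` call passes silently, each cell was cut down to its GRADE's minimum size.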
If you have multiple SCORE-like columns and you want to keep the same rows from each, then you can use .SD like in your attempt:
data[, .SD[sample(.N, N)], .(SCHOOL, GRADE, N)][, -'N']
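To illustrate that the `.SD` form keeps whole rows together, here is a small sketch with a second column. `SCORE2` is a hypothetical column, defined as `SCORE + 1000` so that row integrity is easy to check, and the smaller `dt` table is likewise invented for the example.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  SCHOOL = rep(1:3, each = 200),
  GRADE  = sample(7:9, size = 600, replace = TRUE),
  SCORE  = sample(1:100, size = 600, replace = TRUE)
)
dt[, SCORE2 := SCORE + 1000L]  # hypothetical second column tied to SCORE row by row

# Cell size, then the per-GRADE minimum across SCHOOLs
dt[, N := .N, by = .(SCHOOL, GRADE)]
dt[, N := min(N), by = GRADE]

# Sample N whole rows per cell; .SD carries SCORE and SCORE2 together
res <- dt[, .SD[sample(.N, N)], by = .(SCHOOL, GRADE, N)]

# Each sampled SCORE still sits next to its own SCORE2
stopifnot(all(res$SCORE2 == res$SCORE + 1000L))
```

Sampling the columns separately, for example with one `sample()` call per column, would pick different rows for each, which is why indexing `.SD` by a single set of row positions is the safer choice here.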