R: sample by smallest cell size

Question

set.seed(1)
data <- data.frame(SCHOOL = rep(1:10, each = 1000),
                   GRADE = sample(7:12, replace = TRUE, size = 10000),
                   SCORE = sample(1:100, replace = TRUE, size = 10000))

I have 'data', which contains student test scores. I want to count how many rows each GRADE has within each SCHOOL, take the smallest of those counts across all SCHOOLs, and then sample that many rows. Specifically:

1. For each SCHOOL, count the number of rows for each GRADE.
2. For each GRADE, find the smallest of these counts across all SCHOOLs.
3. From every SCHOOL/GRADE combination, randomly sample as many rows as the minimum found in step 2.

For example, take two SCHOOLs and GRADEs 7 and 8:

SCHOOL 1 has 2 SCOREs for GRADE 7 and 3 SCOREs for GRADE 8.

SCHOOL 2 has 1 SCORE for GRADE 7 and 4 SCOREs for GRADE 8.

The smallest count is therefore 1 for GRADE 7 and 3 for GRADE 8, so the new data should contain one randomly sampled GRADE 7 SCORE from each of SCHOOL 1 and SCHOOL 2, and three randomly sampled GRADE 8 SCOREs from each of SCHOOL 1 and SCHOOL 2.
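The arithmetic in this example can be checked with base R. This is a minimal sketch assuming the made-up toy data below (only the row counts come from the example; the SCORE values are arbitrary): table() gives the rows per SCHOOL/GRADE cell (step 1), and the column-wise minimum gives the sample size per GRADE (step 2).

```r
# Toy data matching the example: SCHOOL 1 has 2 rows in GRADE 7 and 3 in GRADE 8,
# SCHOOL 2 has 1 row in GRADE 7 and 4 in GRADE 8 (SCORE values are arbitrary)
toy <- data.frame(
  SCHOOL = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  GRADE  = c(7, 7, 8, 8, 8, 7, 8, 8, 8, 8),
  SCORE  = c(55, 61, 70, 72, 90, 40, 66, 81, 83, 95)
)

# Step 1: number of rows in each SCHOOL x GRADE cell
cell_counts <- table(toy$SCHOOL, toy$GRADE)
print(cell_counts)

# Step 2: smallest cell size for each GRADE, taken across SCHOOLs
min_per_grade <- apply(cell_counts, 2, min)
print(min_per_grade)   # GRADE 7 -> 1, GRADE 8 -> 3

# Each SCHOOL contributes min_per_grade[g] rows for GRADE g,
# so the sampled data should have 2 * (1 + 3) = 8 rows
n_schools <- length(unique(toy$SCHOOL))
n_schools * sum(min_per_grade)
```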


My attempt:

data[, .SD[sample(x = .N, size = min(sum(GRADE), .N))], by = .(SCHOOL, GRADE)]
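One reason the attempt does not subsample: inside each SCHOOL/GRADE group, sum(GRADE) adds up the grade values instead of counting rows, so it equals GRADE * .N, which is at least 7 * .N here. min(sum(GRADE), .N) is therefore always .N, and every row is kept. The attempt also groups by SCHOOL and GRADE together, so it never sees the other SCHOOLs when it should be finding the minimum. A quick base R check on the question's data, using tapply() in place of the data.table grouping:

```r
set.seed(1)
data <- data.frame(SCHOOL = rep(1:10, each = 1000),
                   GRADE = sample(7:12, replace = TRUE, size = 10000),
                   SCORE = sample(1:100, replace = TRUE, size = 10000))

# Per SCHOOL/GRADE cell: what the attempt computes vs. the number of rows
sum_grade <- tapply(data$GRADE, list(data$SCHOOL, data$GRADE), sum)
n_rows    <- tapply(data$GRADE, list(data$SCHOOL, data$GRADE), length)

# The attempt's sample size, min(sum(GRADE), .N), for every cell
attempt_size <- pmin(sum_grade, n_rows)

# It always equals the full cell size, so nothing is dropped
all(attempt_size == n_rows)
```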

Answer 1

This follows the step-by-step procedure you described.

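library(data.table)
setDT(data)                          # convert the data.frame to a data.table by reference
data[, N := .N, .(SCHOOL, GRADE)]    # step 1: rows in each SCHOOL/GRADE cell
data[, N := min(N), GRADE]           # step 2: smallest cell size for each GRADE across SCHOOLs
# step 3: draw N rows from every cell; indexing by position avoids sample()'s
# special case for a single number (a one-row cell would sample from 1:SCORE)
data[, .(SCORE = SCORE[sample(.N, N)]), .(SCHOOL, GRADE, N)][, -'N']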
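Why it is safer to index by position than to call sample(SCORE, N) directly: when its first argument is a single number of at least 1, base R's sample() draws from 1:x instead of returning that value. A SCHOOL/GRADE cell that contains only one row would hit exactly that case. A short base R demonstration (the value 57 is arbitrary):

```r
set.seed(42)
score <- 57            # a cell that contains a single SCORE

# sample(x, 1) with a length-one numeric x draws from 1:57, not from c(57)
draws_direct <- replicate(1000, sample(score, 1))

# Indexing by position always returns an element of the cell itself
draws_indexed <- replicate(1000, score[sample(length(score), 1)])

table(draws_direct == score)   # mostly FALSE
unique(draws_indexed)          # 57
```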

If you have several SCORE-like columns and want to keep the same rows for all of them, you can use .SD as in your attempt:

data[, .SD[sample(.N, N)], .(SCHOOL, GRADE, N)][, -'N']
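For readers not using data.table, here is a base R sketch of the same idea. It is an alternative, not the answer's method, and it assumes every SCHOOL has at least one row in each GRADE (otherwise a zero count becomes the minimum). The function name sample_min_cell() and the extra column SCORE2 are hypothetical, added only to show that sampling row positions keeps all columns of a row together, as .SD[sample(.N, N)] does.

```r
# Hypothetical helper: keep min-over-SCHOOLs rows from every SCHOOL/GRADE cell
sample_min_cell <- function(df) {
  # Steps 1-2: rows per cell, then the smallest count for each GRADE
  counts <- table(df$SCHOOL, df$GRADE)
  n_per_grade <- apply(counts, 2, min)

  # Step 3: split row positions by cell and draw n_per_grade rows from each
  cells <- split(seq_len(nrow(df)), list(df$SCHOOL, df$GRADE), drop = TRUE)
  keep <- unlist(lapply(cells, function(idx) {
    g <- as.character(df$GRADE[idx[1]])
    idx[sample(length(idx), n_per_grade[[g]])]   # index by position, not sample(idx, n)
  }), use.names = FALSE)

  df[sort(keep), , drop = FALSE]
}

set.seed(1)
data <- data.frame(SCHOOL = rep(1:10, each = 1000),
                   GRADE = sample(7:12, replace = TRUE, size = 10000),
                   SCORE = sample(1:100, replace = TRUE, size = 10000))
data$SCORE2 <- data$SCORE * 2   # hypothetical second SCORE-like column

res <- sample_min_cell(data)
table(res$SCHOOL, res$GRADE)      # each GRADE column repeats one value for all SCHOOLs
all(res$SCORE2 == res$SCORE * 2)  # columns stay aligned
```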

Answer 1: score 1, accepted 2020-09-12 19:41:40