
Stratified sampling or proportional sampling in R

I have a data set generated as follows:

N <- 200  # number of rows; 200 matches the example below
myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))

The data has three columns: a row index a, a numeric value b, and an integer label group.

I would like to generate a stratified sample of myData with a given sample size, e.g. 50. The resulting sample should follow the proportional allocation of the original data set in terms of "group". For instance, if myData has 200 records and 20 of them belong to group 4, then the sample should contain 50*20/200 = 5 records belonging to group 4. How can I do that in R?
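The target allocation described above can be written down directly. The following is only a sketch of the arithmetic, assuming N = 200 rows and a total sample size of 50; the names n_total, counts and alloc are illustrative and not part of the original post.

```r
set.seed(1)
N <- 200        # assumed number of rows, matching the 50*20/200 example
n_total <- 50   # desired total sample size
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

# Number of rows per group in the full data
counts <- table(myData$group)

# Proportional allocation: n_total * (group size / N), rounded to whole rows
alloc <- round(n_total * counts / N)
alloc

# Rounding can make the allocations sum to slightly more or less than n_total
sum(alloc)
```

Because each group's share is rounded independently, the total may be off by a row or two from 50; how to distribute that remainder is a separate design choice.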

You can use my stratified function, specifying a value < 1 as your proportion, like this:

## Sample data. Seed for reproducibility 
set.seed(1)
N <- 50
myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))

## Taking the sample
out <- stratified(myData, "group", .3)
out
#     a     b group
# 17 17 -0.02     2
# 8   8  0.74     3
# 25 25  0.62     3
# 49 49 -0.11     3
# 4   4  1.60     3
# 26 26 -0.06     4
# 27 27 -0.16     4
# 7   7  0.49     4
# 12 12  0.39     4
# 40 40  0.76     4
# 32 32 -0.10     4
# 9   9  0.58     5
# 42 42 -0.25     5
# 43 43  0.70     5
# 37 37 -0.39     5
# 11 11  1.51     6

Compare the per-group counts in the sample with what we would have expected.

round(table(myData$group) * .3)
# 
# 2 3 4 5 6 
# 1 4 6 4 1 
table(out$group)
# 
# 2 3 4 5 6 
# 1 4 6 4 1 
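The stratified function itself is not shown in this answer. As a rough base-R stand-in (not the author's implementation; the helper name sample_by_prop is hypothetical), the same idea can be sketched by splitting the row indices by group and sampling a rounded share of each group. To aim for a total of about 50 rows, the proportion would be 50 / nrow(myData).

```r
# Hypothetical helper: sample round(prop * group size) rows from each group
sample_by_prop <- function(df, group_col, prop) {
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    k <- round(length(idx) * prop)
    # Index into idx so that a group with a single row is handled correctly
    idx[sample.int(length(idx), k)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)
N <- 50
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

out2 <- sample_by_prop(myData, "group", 0.3)
table(out2$group)
```

The rows drawn will generally differ from the output shown above, since the random draws are made differently, but the per-group counts should match round(table(myData$group) * .3).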

You can also easily take a fixed number of samples per group, like this:

stratified(myData, "group", 2)
#     a     b group
# 34 34 -0.05     2
# 17 17 -0.02     2
# 49 49 -0.11     3
# 22 22  0.78     3
# 12 12  0.39     4
# 7   7  0.49     4
# 18 18  0.94     5
# 33 33  0.39     5
# 45 45 -0.69     6
# 11 11  1.51     6
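A fixed per-group size can be sketched in base R in a similar way. This is an illustrative stand-in rather than the stratified function; the helper name sample_fixed is hypothetical, and capping the draw at the group size is an assumption about how groups smaller than the requested size should be handled.

```r
# Hypothetical helper: sample up to n rows from each group
sample_fixed <- function(df, group_col, n) {
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    # Assumed behaviour: take the whole group if it has fewer than n rows
    k <- min(n, length(idx))
    idx[sample.int(length(idx), k)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(1)
N <- 50
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

fixed <- sample_fixed(myData, "group", 2)
table(fixed$group)
```

Sampling without replacement within each group means no row can appear twice in the result.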
