Stratified random sampling in R

I am struggling to draw a stratified random sample of size 100 from a dataset of 3078 observations. The strata are defined by FARMS92: FARMS92 < 100, between 100 and 300, between 300 and 600, and FARMS92 > 600. The sample sizes should follow proportional allocation.
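Under proportional allocation, each stratum's share of the sample equals its share of the population. Write N = 3078 for the number of counties, n = 100 for the sample size, and N_h for the number of counties in stratum h (h = 1, ..., 4). The target sample size for stratum h is then:

```latex
n_h = n \cdot \frac{N_h}{N} = 100 \cdot \frac{N_h}{3078},
\qquad \sum_{h=1}^{4} n_h = n = 100
```

As a hypothetical illustration, a stratum containing 1539 of the 3078 counties would get 50 sampled counties. In practice n_h is rarely a whole number, so it has to be rounded.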

I do not understand how to proceed with the stratified function from https://gist.github.com/mrdwab/6424112.

Here is my dataset:

        COUNTY   STATE  ACRES92 ACRES87 FARMS92
    1   ALEUTIAN  AK    683533  726596  764514
    2   ANCHORAGE AK    47146   59297   256709
    3   FAIRBANKS AK    141338  154913  204568
    4   JUNEAU    AK    210     214     127
    5   KENAI     AK    50810   85712   98035
    6   AUTAUGA   AL    107259  116050  145044
    7   BALDWIN   AL    167832  192082  223502
    8   BARBOUR   AL    177189  207906  222066
    9   BIBB      AL    48022   50818   49630
    10  BLOUNT    AL    137426  140107  163638
    11  BULLOCK   AL    144799  156332  185304
    12  BUTLER    AL    96427   99997   124491
    13  CALHOUN   AL    73841   90474   93248
    14  CHAMBERS  AL    109555  102153  121101
    15  CHEROKEE  AL    121504  119956  143656 

Could you please explain the steps to follow?

You could first split FARMS92 into bins (< 100, 100 to 300, 300 to 600, > 600) using the cut function:

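# -Inf and Inf as outer breaks keep every FARMS92 value inside a bin (a lower break of 0 with right = TRUE turns a 0 into NA)
# right = TRUE puts a value of exactly 100 in "A"; use right = FALSE if "A" must be strictly FARMS92 < 100
data$cut <- cut(data$FARMS92, breaks = c(-Inf, 100, 300, 600, Inf), labels = c("A", "B", "C", "D"), right = TRUE)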
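The question's boundaries overlap ("< 100" and "between 100 and 300" both mention 100), so it helps to check how cut() treats values that sit exactly on a break. The following sketch applies the breaks to a few made-up boundary values (the vector x is hypothetical, not from the dataset). It compares right = TRUE, which gives intervals of the form (a, b], with right = FALSE, which gives [a, b).

```r
# Hypothetical values sitting exactly on the break points 100, 300 and 600
x <- c(0, 99, 100, 101, 300, 600, 601)
brks <- c(-Inf, 100, 300, 600, Inf)
labs <- c("A", "B", "C", "D")

# right = TRUE: intervals are (a, b], so 100 lands in A and 300 in B
right_closed <- cut(x, breaks = brks, labels = labs, right = TRUE)

# right = FALSE: intervals are [a, b), so A is exactly "FARMS92 < 100"
left_closed <- cut(x, breaks = brks, labels = labs, right = FALSE)

print(data.frame(x, right_closed, left_closed))

# A lower break of 0 with right = TRUE excludes 0 itself, giving NA
print(cut(0, breaks = c(0, 100, 300, 600, 1E7), right = TRUE))
```

Whichever convention you pick, use it consistently so that every county falls into exactly one stratum.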

Then load the stratified function from https://gist.github.com/mrdwab/6424112 into your R session (for example by running the gist's code) and call it:

stratified(data, "cut", size = c(2,2,2,2))

For this small example I used size = c(2,2,2,2), which draws 2 rows from each bin. Since you want a total sample size of 100, adjust size accordingly. For proportional allocation on your full dataset you could use something like size = round(100 * prop.table(table(data$cut)), 0). Because of rounding, these sizes may not add up to exactly 100, so check sum(size) before sampling.
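If the rounded sizes must add up to exactly 100, one option is the largest-remainder (Hamilton) method. Each stratum first gets the integer part of n * N_h / N, and the units that remain go to the strata with the largest fractional parts. This technique is a swap-in, not part of the original answer or the gist, and allocate_proportional is a hypothetical helper name.

```r
# Hypothetical helper: proportional allocation by the largest-remainder
# (Hamilton) method, so the stratum sizes always sum to exactly n.
allocate_proportional <- function(strata, n) {
  counts <- table(strata)                          # N_h per stratum
  quota  <- n * as.numeric(counts) / sum(counts)   # exact n * N_h / N
  alloc  <- floor(quota)                           # integer part first
  leftover <- n - sum(alloc)                       # units still to hand out
  if (leftover > 0) {
    # give one extra unit to each of the strata with the largest remainders
    top <- order(-(quota - alloc))[seq_len(leftover)]
    alloc[top] <- alloc[top] + 1
  }
  setNames(alloc, names(counts))
}

strata <- c("A", "B", "C")
round(100 * prop.table(table(strata)))  # 33 33 33, which sums to 99
allocate_proportional(strata, 100)       # sums to 100
```

The result is a named vector in the same order as table(strata). You could therefore try allocate_proportional(data$cut, 100) in place of the round(...) expression above. The helper does not check that each n_h is at most N_h, so very small strata still need a manual look.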

Output of stratified(data, "cut", size = c(2,2,2,2)) on the example data below:

     COUNTY STATE ACRES92 ACRES87 FARMS92 cut
7   BALDWIN    AL  167832  192082      22   A
6   AUTAUGA    AL  107259  116050      14   A
4    JUNEAU    AK     210     214     127   B
12   BUTLER    AL   96427   99997     124   B
11  BULLOCK    AL  144799  156332     385   C
15 CHEROKEE    AL  121504  119956     436   C
9      BIBB    AL   48022   50818   49630   D
8   BARBOUR    AL  177189  207906  222066   D

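I modified your dataset to produce a better working example. In your excerpt only the 100 to 300 and > 600 strata contain any counties, and the first holds just one, so I changed some FARMS92 values. Data: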

data <- read.table(text= "COUNTY   STATE  ACRES92 ACRES87 FARMS92
1   ALEUTIAN  AK    683533  726596  76
2   ANCHORAGE AK    47146   59297   2
3   FAIRBANKS AK    141338  154913  204
4   JUNEAU    AK    210     214     127
5   KENAI     AK    50810   85712   480
6   AUTAUGA   AL    107259  116050  14
7   BALDWIN   AL    167832  192082  22
8   BARBOUR   AL    177189  207906  222066
9   BIBB      AL    48022   50818   49630
10  BLOUNT    AL    137426  140107  163638
11  BULLOCK   AL    144799  156332  385
12  BUTLER    AL    96427   99997   124
13  CALHOUN   AL    73841   90474   93248
14  CHAMBERS  AL    109555  102153  121
15  CHEROKEE  AL    121504  119956  436 ", stringsAsFactors=FALSE, header = TRUE)   
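You may prefer not to depend on the gist. In that case the per-stratum draw can be sketched in base R with split() and sample(). The function name stratified_base and the data frame farms are hypothetical. This is not the gist's implementation: it only handles sampling without replacement, with an explicit size for each stratum. The data reuses the COUNTY and FARMS92 columns of the modified example above.

```r
# Hypothetical base-R stand-in for the gist's stratified(): draw size[h]
# rows, without replacement, from every stratum h of column `group`.
stratified_base <- function(df, group, size) {
  strata <- split(df, df[[group]], drop = TRUE)   # one data frame per stratum
  stopifnot(length(size) == length(strata))
  nm <- if (is.null(names(size))) names(strata) else names(size)
  size <- setNames(as.numeric(size), nm)          # also accepts a table
  pieces <- lapply(names(strata), function(h) {
    rows <- strata[[h]]
    rows[sample(nrow(rows), size[[h]]), , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# The modified example data (COUNTY and FARMS92 only)
farms <- data.frame(
  COUNTY  = c("ALEUTIAN", "ANCHORAGE", "FAIRBANKS", "JUNEAU", "KENAI",
              "AUTAUGA", "BALDWIN", "BARBOUR", "BIBB", "BLOUNT",
              "BULLOCK", "BUTLER", "CALHOUN", "CHAMBERS", "CHEROKEE"),
  FARMS92 = c(76, 2, 204, 127, 480, 14, 22, 222066, 49630, 163638,
              385, 124, 93248, 121, 436)
)
farms$cut <- cut(farms$FARMS92, breaks = c(-Inf, 100, 300, 600, Inf),
                 labels = c("A", "B", "C", "D"), right = TRUE)

set.seed(42)  # make the random draw reproducible
smp <- stratified_base(farms, "cut", size = c(2, 2, 2, 2))
print(smp)
```

For the full 3078-row dataset, size would be the proportional allocation for n = 100 rather than c(2, 2, 2, 2). sample() stops with an error if a stratum is asked for more rows than it has.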
