Stratified random sampling in R
I am struggling to draw a stratified random sample of size 100 from a dataset with 3078 observations. The strata are defined on FARMS92: FARMS92 < 100, between 100 and 300, between 300 and 600, and FARMS92 > 600. The sample has to use proportional allocation.
I do not understand how to proceed with the stratified function: https://gist.github.com/mrdwab/6424112
Here is my dataset:
COUNTY STATE ACRES92 ACRES87 FARMS92
1 ALEUTIAN AK 683533 726596 764514
2 ANCHORAGE AK 47146 59297 256709
3 FAIRBANKS AK 141338 154913 204568
4 JUNEAU AK 210 214 127
5 KENAI AK 50810 85712 98035
6 AUTAUGA AL 107259 116050 145044
7 BALDWIN AL 167832 192082 223502
8 BARBOUR AL 177189 207906 222066
9 BIBB AL 48022 50818 49630
10 BLOUNT AL 137426 140107 163638
11 BULLOCK AL 144799 156332 185304
12 BUTLER AL 96427 99997 124491
13 CALHOUN AL 73841 90474 93248
14 CHAMBERS AL 109555 102153 121101
15 CHEROKEE AL 121504 119956 143656
Could you please explain the steps to follow?
You could first separate the data into bins (< 100, between 100 and 300, etc.) using the cut function.
data$cut <- cut(data$FARMS92, breaks = c(0,100,300,600, 1E7), labels = c("A","B","C", "D"), right = TRUE)
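One detail of the cut call above is which bin a value lands in when it sits exactly on a break. The following is a minimal sketch on a few made-up values (the vector farms is hypothetical, not taken from the question's data) comparing right = TRUE with right = FALSE:

```r
# Hypothetical values chosen to sit on and around the bin edges.
farms <- c(50, 100, 101, 300, 600, 601)

# right = TRUE: intervals are (0,100], (100,300], (300,600], (600,1e7]
right_closed <- cut(farms, breaks = c(0, 100, 300, 600, 1e7),
                    labels = c("A", "B", "C", "D"), right = TRUE)

# right = FALSE: intervals are [0,100), [100,300), [300,600), [600,1e7)
left_closed <- cut(farms, breaks = c(0, 100, 300, 600, 1e7),
                   labels = c("A", "B", "C", "D"), right = FALSE)

# Side-by-side view of how each value is binned under the two settings.
data.frame(farms, right_closed, left_closed)
```

With right = TRUE a value of exactly 100 falls in bin A, and a value of exactly 0 would become NA unless include.lowest = TRUE is also set. If the strata are meant to be strictly "less than 100", right = FALSE may be the closer match.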
Then use the stratified function ( https://gist.github.com/mrdwab/6424112 ).
stratified(data, "cut", size = c(2,2,2,2))
For this particular example I used size = c(2,2,2,2), which returns 2 rows from each bin. Since you want a sample of size 100, adjust the size accordingly. For instance, for proportional allocation on your original dataset you could use something like:
size = round(100 * prop.table(table(data$cut)), 0)
Note that the rounded sizes may not add up to exactly 100.
Output:
COUNTY STATE ACRES92 ACRES87 FARMS92 cut
7 BALDWIN AL 167832 192082 22 A
6 AUTAUGA AL 107259 116050 14 A
4 JUNEAU AK 210 214 127 B
12 BUTLER AL 96427 99997 124 B
11 BULLOCK AL 144799 156332 385 C
15 CHEROKEE AL 121504 119956 436 C
9 BIBB AL 48022 50818 49630 D
8 BARBOUR AL 177189 207906 222066 D
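For the proportional allocation suggested in the answer, rounding each stratum separately can leave the total a unit or two away from 100. Below is a rough sketch of one way to force the total, using a largest-remainder adjustment. The data are simulated (farms92 is a hypothetical stand-in for the real 3078 values), and the adjustment is an illustration, not something the stratified gist does for you:

```r
set.seed(1)  # for a reproducible illustration only

# Simulated stand-in for the real data: 3078 hypothetical FARMS92 values.
farms92 <- round(rexp(3078, rate = 1 / 400)) + 1
strata  <- cut(farms92, breaks = c(0, 100, 300, 600, Inf),
               labels = c("A", "B", "C", "D"), right = TRUE)

n_total <- 100
exact   <- n_total * prop.table(table(strata))  # fractional allocation

# Plain rounding, as in the answer; the total may differ slightly from 100.
rounded <- round(exact)

# Largest-remainder variant: take the floor, then hand the leftover units
# to the strata with the largest fractional parts.
alloc    <- floor(exact)
leftover <- n_total - sum(alloc)
bump     <- order(exact - alloc, decreasing = TRUE)[seq_len(leftover)]
alloc[bump] <- alloc[bump] + 1

sum(rounded)  # total under plain rounding
sum(alloc)    # equals n_total by construction
```

The result keeps the stratum labels as names, so each size stays attached to its bin rather than depending on position.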
I modified your dataset to produce a better working example. Data:
data <- read.table(text= "COUNTY STATE ACRES92 ACRES87 FARMS92
1 ALEUTIAN AK 683533 726596 76
2 ANCHORAGE AK 47146 59297 2
3 FAIRBANKS AK 141338 154913 204
4 JUNEAU AK 210 214 127
5 KENAI AK 50810 85712 480
6 AUTAUGA AL 107259 116050 14
7 BALDWIN AL 167832 192082 22
8 BARBOUR AL 177189 207906 222066
9 BIBB AL 48022 50818 49630
10 BLOUNT AL 137426 140107 163638
11 BULLOCK AL 144799 156332 385
12 BUTLER AL 96427 99997 124
13 CALHOUN AL 73841 90474 93248
14 CHAMBERS AL 109555 102153 121
15 CHEROKEE AL 121504 119956 436 ", stringsAsFactors=FALSE, header = TRUE)
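If you would rather not source the gist, the same kind of draw can be approximated in base R. This is only a sketch under assumptions: it reuses the FARMS92 values from the modified data above, and the names d, sizes, and smp are made up for the illustration. It is not the gist's implementation.

```r
set.seed(42)  # reproducible illustration only

# FARMS92 values copied from the modified example data above.
d <- data.frame(
  id      = 1:15,
  FARMS92 = c(76, 2, 204, 127, 480, 14, 22, 222066, 49630,
              163638, 385, 124, 93248, 121, 436)
)
d$cut <- cut(d$FARMS92, breaks = c(0, 100, 300, 600, 1e7),
             labels = c("A", "B", "C", "D"), right = TRUE)

# Named per-stratum sample sizes (2 from each bin, as in the answer).
sizes <- c(A = 2, B = 2, C = 2, D = 2)

# Split row indices by stratum, draw without replacement within each stratum,
# then collect the selected rows.
rows_by_stratum <- split(seq_len(nrow(d)), d$cut)
picked <- unlist(lapply(names(sizes), function(s) {
  idx <- rows_by_stratum[[s]]
  idx[sample.int(length(idx), sizes[[s]])]
}))
smp <- d[picked, ]

table(smp$cut)  # number of sampled rows per stratum
```

Indexing with sample.int(length(idx), ...) rather than calling sample(idx, ...) sidesteps the case where a stratum has a single row and sample would treat that row number as an upper bound. A requested size larger than the stratum raises an error, which is probably what you want for sampling without replacement.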