Drawing a stratified sample in R
Designing my stratified sample:
library(survey)
design <- svydesign(id=~1,strata=~Category, data=billa, fpc=~fpc)
So far so good, but how can I now draw a sample, the same way I could for simple random sampling?
set.seed(67359)
samplerows <- sort(sample(x=1:N, size=n.pre$n))
If you have a stratified design, then I believe you can sample randomly within each stratum. Here is a short algorithm to do proportional sampling in each stratum, using ddply:
library(plyr)
set.seed(1)
dat <- data.frame(
id = 1:100,
Category = sample(LETTERS[1:3], 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
)
sampleOne <- function(id, fraction=0.1){
sort(sample(id, round(length(id)*fraction)))
}
ddply(dat, .(Category), summarize, sampleID=sampleOne(id, fraction=0.2))
Category sampleID
1 A 21
2 A 29
3 A 72
4 B 13
5 B 20
6 B 42
7 B 58
8 B 82
9 B 100
10 C 1
11 C 11
12 C 14
13 C 33
14 C 38
15 C 40
16 C 63
17 C 64
18 C 71
19 C 92
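If you would rather not depend on plyr, the same proportional draw can be sketched in base R. This is only an illustration under the same toy data as above; the helper name stratified_rows is made up for this example and is not part of any package.

```r
# Proportional stratified sampling with base R only (no plyr).
set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

# Hypothetical helper: draw round(fraction * stratum size) rows per stratum.
stratified_rows <- function(data, stratum, fraction) {
  # Row numbers of `data`, grouped by the stratification column.
  rows_by_stratum <- split(seq_len(nrow(data)), data[[stratum]])
  picked <- lapply(rows_by_stratum, function(rows) {
    # sample.int() on the stratum size avoids sample()'s special case,
    # where a single number x is treated as the range 1:x.
    rows[sample.int(length(rows), round(length(rows) * fraction))]
  })
  data[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

smp <- stratified_rows(dat, "Category", fraction = 0.2)
table(smp$Category)  # sampled rows per stratum
```

Because it returns whole rows rather than just IDs, the result can be used directly as the sampled data frame.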
Take a look at the sampling package on CRAN, and the strata function in particular.
This is a good package to know if you're doing surveys; there are several vignettes available from its page on CRAN.
The CRAN task view on "Official Statistics" covers several topics that are closely related to these issues of survey design and sampling. Browsing through it and the packages it recommends may also introduce other tools that you can use in your work.
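For what it's worth, a call to the strata function mentioned above might look roughly like this. It is a sketch rather than a tested recipe: it assumes the sampling package is installed, reuses toy data made up for illustration, and assumes a 20% fraction per stratum. As far as I know, strata() expects the data sorted by the stratification variable, with the size vector given in the same order as the strata.

```r
library(sampling)  # assumes the 'sampling' package is installed from CRAN

set.seed(1)
# Toy data for illustration only.
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

# Sort by the stratification variable before calling strata().
dat <- dat[order(dat$Category), ]

# Assumed allocation: 20% of each stratum, in the order the strata appear.
n_h <- as.vector(round(table(dat$Category) * 0.2))

# Simple random sampling without replacement within each stratum.
s <- strata(dat, stratanames = "Category", size = n_h, method = "srswor")

# Pull the selected rows (plus inclusion probabilities) out of the data.
drawn <- getdata(dat, s)
head(drawn)
```

The Prob column in the result holds each unit's inclusion probability, which is useful later when computing sampling weights.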
You can draw a stratified sample using dplyr. First we group by the column or columns we are interested in. In our example, we draw 3 records of each Species.
library(dplyr)
set.seed(1)
iris %>%
group_by(Species) %>%
sample_n(3)
Output:
Source: local data frame [9 x 5]
Groups: Species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.3 3.0 1.1 0.1 setosa
2 5.7 3.8 1.7 0.3 setosa
3 5.2 3.5 1.5 0.2 setosa
4 5.7 3.0 4.2 1.2 versicolor
5 5.2 2.7 3.9 1.4 versicolor
6 5.0 2.3 3.3 1.0 versicolor
7 6.5 3.0 5.2 2.0 virginica
8 6.4 2.8 5.6 2.2 virginica
9 7.4 2.8 6.1 1.9 virginica
Here's a quick way to sample three records per distinct 'carb' value from the mtcars data frame, without replacement:
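# choose how many records to sample per unique 'carb' value
records.per.carb.value <- 3
# sample row numbers within one stratum.
# sample.int() on the stratum size avoids sample() treating a single row number x as 1:x,
# and min() caps the draw for strata with fewer rows than requested
# (carb values 6 and 8 have only one row each in mtcars)
sample.stratum <-
function( rows , n ) rows[ sample.int( length( rows ) , min( n , length( rows ) ) ) ]
# draw the sample
your.sample <-
mtcars[
unlist(
tapply(
1:nrow( mtcars ) ,
mtcars$carb ,
sample.stratum ,
records.per.carb.value
)
) , ]
# print the results to the screen
your.sample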
Note that the survey package is mostly used for analyzing complex sample survey data, not creating it. @Iterator is right that you should check out the sampling package for more advanced ways to create complex sample survey data. :)
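To tie this back to the svydesign call in the question, here is a rough sketch of drawing the stratified sample first and only then handing it to survey. The data frame frame is a made-up stand-in for the question's billa, and the per-stratum sample sizes are assumed. The fpc column holds each stratum's population size, which is one of the forms svydesign accepts for fpc.

```r
set.seed(67359)

# Hypothetical sampling frame standing in for the question's `billa`.
frame <- data.frame(
  id = 1:200,
  Category = rep(c("A", "B", "C"), times = c(40, 60, 100))
)

# Stratum population size, repeated on every row: the fpc column.
frame$fpc <- ave(frame$id, frame$Category, FUN = length)

# Assumed allocation: 20% of each stratum.
n_per_stratum <- c(A = 8, B = 12, C = 20)

# Simple random sampling without replacement inside each stratum.
rows <- unlist(lapply(names(n_per_stratum), function(s) {
  idx <- which(frame$Category == s)
  idx[sample.int(length(idx), n_per_stratum[[s]])]
}))
smp <- frame[sort(rows), ]

# Describe the drawn sample to survey (skipped if the package is missing).
if (requireNamespace("survey", quietly = TRUE)) {
  design <- survey::svydesign(id = ~1, strata = ~Category, data = smp, fpc = ~fpc)
}
```

The point of the ordering is that svydesign describes a sample that already exists; it does not select rows for you.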