Drawing a stratified sample in R

Question

Designing my stratified sample:

library(survey)
design <- svydesign(id=~1,strata=~Category,  data=billa, fpc=~fpc)

So far so good, but how can I now draw a sample the same way I did for simple random sampling?

set.seed(67359)  
samplerows <- sort(sample(x=1:N, size=n.pre$n))

Answer 1

If you have a stratified design, then I believe you can sample randomly within each stratum. Here is a short algorithm to do proportional sampling in each stratum, using ddply:

library(plyr)
set.seed(1)
dat <- data.frame(
    id = 1:100,
    Category = sample(LETTERS[1:3], 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
)

sampleOne <- function(id, fraction=0.1){
  sort(sample(id, round(length(id)*fraction)))
}

ddply(dat, .(Category), summarize, sampleID=sampleOne(id, fraction=0.2))

   Category sampleID
1         A       21
2         A       29
3         A       72
4         B       13
5         B       20
6         B       42
7         B       58
8         B       82
9         B      100
10        C        1
11        C       11
12        C       14
13        C       33
14        C       38
15        C       40
16        C       63
17        C       64
18        C       71
19        C       92
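
For readers who would rather avoid the plyr dependency, the same proportional idea can be sketched in base R. The helper name `stratified_rows` is hypothetical (it is not part of the answer above), and it indexes through `sample.int()` so that a stratum with a single row is not misread by `sample()` as "sample from 1:x".

```r
# Hypothetical helper (not from the answer above): pick a fixed fraction of
# row indices within each stratum, using base R only.
stratified_rows <- function(strata, fraction = 0.1) {
  rows_by_stratum <- split(seq_along(strata), strata)
  picked <- lapply(rows_by_stratum, function(rows) {
    k <- round(length(rows) * fraction)
    # Index through sample.int() so a one-row stratum is handled correctly
    sort(rows[sample.int(length(rows), k)])
  })
  unlist(picked, use.names = FALSE)
}

set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

samplerows <- stratified_rows(dat$Category, fraction = 0.2)
table(dat$Category[samplerows])  # sample size per stratum
```

The returned vector plays the same role as `samplerows` in the question: it can be used directly as `dat[samplerows, ]`.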

Answer 2

Take a look at the sampling package on CRAN (a PDF manual is available there), and the strata function in particular.

This is a good package to know if you're doing surveys; there are several vignettes available from its page on CRAN.

The task view on "Official Statistics" includes several topics that are closely related to these issues of survey design and sampling. Browsing through it and the packages it recommends may also introduce other tools that you can use in your work.
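
As a rough illustration of the strata function mentioned above, here is a minimal sketch that assumes the sampling package is installed and uses the built-in iris data as a stand-in (the question's `billa` data is not available). The stratum sizes of 3 are arbitrary choices for the example.

```r
library(sampling)

set.seed(1)
# strata() expects the data sorted by the stratification variable; `size`
# gives the per-stratum sample sizes in the order the strata appear.
iris_sorted <- iris[order(iris$Species), ]

# Simple random sampling without replacement within each Species
st <- strata(iris_sorted, stratanames = "Species",
             size = c(3, 3, 3), method = "srswor")

# getdata() extracts the selected rows from the original data
stratified <- getdata(iris_sorted, st)
table(st$Stratum)  # number of units drawn per stratum
```

The object returned by `strata()` also carries the inclusion probabilities, which is what makes this approach convenient when the sample later has to be weighted.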

Answer 3

You can draw a stratified sample using dplyr. First we group by the column or columns we are interested in, then we sample within each group. In our example, we draw 3 records of each Species.

library(dplyr)
set.seed(1)
iris %>%
  group_by (Species) %>%
  sample_n(., 3)

Output:

Source: local data frame [9 x 5]
Groups: Species

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          5.7         3.8          1.7         0.3     setosa
3          5.2         3.5          1.5         0.2     setosa
4          5.7         3.0          4.2         1.2 versicolor
5          5.2         2.7          3.9         1.4 versicolor
6          5.0         2.3          3.3         1.0 versicolor
7          6.5         3.0          5.2         2.0  virginica
8          6.4         2.8          5.6         2.2  virginica
9          7.4         2.8          6.1         1.9  virginica
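
In dplyr 1.0.0 and later, `sample_n()` is superseded by `slice_sample()`. A minimal sketch of the same grouped draw, assuming a recent dplyr is installed; the 10% proportion in the second call is an arbitrary value chosen for illustration.

```r
library(dplyr)

set.seed(1)
# Fixed number per stratum: 3 rows from each Species
fixed_n <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 3) %>%
  ungroup()

# Proportional allocation: 10% of each Species (5 of its 50 rows)
proportional <- iris %>%
  group_by(Species) %>%
  slice_sample(prop = 0.1) %>%
  ungroup()

count(fixed_n, Species)
count(proportional, Species)
```

The exact rows drawn depend on the R and dplyr versions, so they will not necessarily match the output printed above.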

Answer 4

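Here's a quick way to sample up to three records per distinct 'carb' value from the mtcars data frame without replacement. Calling sample directly on the row numbers would go wrong for strata with a single record, because sample( 30 , 3 ) draws from 1:30, so the draw is done through a small helper.

# choose how many records to sample per unique 'carb' value
records.per.carb.value <- 3

# sample row numbers within one stratum: sample.int() avoids sample()'s special
# case for a single number, and min() caps the draw at the stratum size
# (carb values 6 and 8 have only one record each)
sample.rows <- function( rows , n ) rows[ sample.int( length( rows ) , min( n , length( rows ) ) ) ]

# draw the sample
your.sample <- 
    mtcars[ 
        unlist( 
            tapply( 
                1:nrow( mtcars ) , 
                mtcars$carb , 
                sample.rows , 
                records.per.carb.value 
            ) 
        ) , ]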

# print the results to the screen
your.sample

Note that the survey package is mostly used for analyzing complex sample survey data, not for creating it. @Iterator is right that you should check out the sampling package for more advanced ways to create complex sample survey data. :)
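
To make the division of labour concrete, here is a minimal sketch, assuming the survey package is installed. The sample is drawn first with base R and only then declared to `svydesign()` for analysis. The data frame `pop` and its column `y` are hypothetical stand-ins for the question's `billa` data, and the 20% sampling fraction is an arbitrary choice.

```r
library(survey)

set.seed(67359)
# Hypothetical population standing in for `billa`: three strata, one variable
pop <- data.frame(
  Category = rep(c("A", "B", "C"), times = c(20, 30, 50)),
  y = rnorm(100)
)
# fpc = population size of the stratum each row belongs to
pop$fpc <- as.numeric(table(pop$Category)[pop$Category])

# Step 1: draw 20% of the rows within each stratum (base R)
rows <- unlist(lapply(split(seq_len(nrow(pop)), pop$Category), function(r) {
  r[sample.int(length(r), round(length(r) * 0.2))]
}))
samp <- pop[sort(rows), ]

# Step 2: declare the design on the drawn sample, then analyze it
design <- svydesign(id = ~1, strata = ~Category, data = samp, fpc = ~fpc)
svymean(~y, design)
```

The `svydesign()` call has the same form as the one in the question; the difference is that `data` is the drawn sample rather than the full frame.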

4 answers

Answer 1: 4 (accepted), 2011-10-31 07:31:09

Answer 2: 4, 2011-11-03 14:52:30

Answer 3: 3, 2015-07-05 09:02:18

Answer 4: 2, 2012-12-19 13:53:23
