
Stratified sampling with fixed proportions of observation types in R

I have a sample where 50% of the observations are White and 50% are African-American.

I would like to obtain a random subsample in which that proportion is changed to 80% White and 20% African-American.

I have tried the stratified function, but I could not find an option that lets me assign a share to each level of the stratifying variable.

Thank you in advance for your help!

I'd filter the data into White and African-American subsets (df_white and df_african below) and then sample from each subset.

  ## keep 80% of the White rows
  smp_size <- floor(0.8 * nrow(df_white))

  ## set the seed to make your partition reproducible
  set.seed(42)
  data_ind_w <- sample(seq_len(nrow(df_white)), size = smp_size)

And for the African-American subset:

  ## keep 20% of the African-American rows
  smp_size <- floor(0.2 * nrow(df_african))

  ## set the seed to make your partition reproducible
  set.seed(42)
  data_ind_a <- sample(seq_len(nrow(df_african)), size = smp_size)

That gives the new data:

  ## stack the two subsamples row-wise into one data frame
  new_data <- rbind(df_white[data_ind_w, ], df_african[data_ind_a, ])
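Putting the pieces together, here is a minimal self-contained sketch of this filter-and-sample approach in base R. The data frame df, its column race, and the toy data are assumptions made for illustration; they do not come from the question.

```r
# Toy data (hypothetical): 50 White and 50 African-American observations
df <- data.frame(
  race  = rep(c("White", "African-American"), each = 50),
  value = 1:100
)

# Filter each group into its own data frame
df_white   <- df[df$race == "White", ]
df_african <- df[df$race == "African-American", ]

set.seed(42)  # make the draws reproducible

# Keep 80% of the White rows and 20% of the African-American rows
ind_w <- sample(seq_len(nrow(df_white)),   size = floor(0.8 * nrow(df_white)))
ind_a <- sample(seq_len(nrow(df_african)), size = floor(0.2 * nrow(df_african)))

# Stack the two subsamples row-wise into one data frame
new_data <- rbind(df_white[ind_w, ], df_african[ind_a, ])

# Check the resulting shares of each group
prop.table(table(new_data$race))
```

Note that sampling 80% of one group and 20% of the other only yields an 80/20 split because the two groups are equal in size; with unequal groups the per-group counts would have to be chosen differently.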

If your original dataset had 100 rows (50 White and 50 African-American), then 80% of the White rows would be 40 samples and 20% of the African-American rows would be 10 samples, giving a 50-row subsample that is 80% White and 20% African-American. Knowing these values, you can pass them to the size argument as a named vector: stratified(mydf, "group", size = c("White" = 40, "African-American" = 10)).

Example:

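# stringsAsFactors = TRUE so that summary() reports per-group counts
# (data.frame() no longer converts strings to factors by default in R >= 4.0)
mydf <- data.frame(group = rep(c("White", "African-American"), each = 50), value = 1:100,
                   stringsAsFactors = TRUE)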
mydf
library(splitstackshape)
set.seed(1)
x <- stratified(mydf, "group", size = c("White" = 40, "African-American" = 10))
summary(x)
 #              group        value      
 # African-American:10   Min.   : 1.00  
 # White           :40   1st Qu.:15.25  
 #                       Median :31.00  
 #                       Mean   :34.88  
 #                       3rd Qu.:47.50  
 #                       Max.   :93.00 
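The counts passed to size were worked out by hand. As a rough sketch, they could also be derived from the target shares and a chosen subsample size. The helper sizes_for_shares is hypothetical (it is not part of splitstackshape), and the total of 50 rows is an assumption taken from the example.

```r
# Hypothetical helper: turn a total subsample size and target shares
# into a named vector of per-group counts.
sizes_for_shares <- function(n, shares) {
  stopifnot(abs(sum(shares) - 1) < 1e-8)  # shares must sum to 1
  round(n * shares)                       # names of `shares` are preserved
}

# Group labels as in the example: 50 rows per group
groups <- rep(c("White", "African-American"), each = 50)

sizes <- sizes_for_shares(50, c("White" = 0.8, "African-American" = 0.2))

# Make sure no group is asked for more rows than it contains
available <- as.vector(table(groups)[names(sizes)])
stopifnot(all(sizes <= available))

sizes
# The result can then be passed on, e.g.:
# stratified(mydf, "group", size = sizes)
```

The feasibility check matters when groups are unequal: sampling without replacement cannot return more rows than a group has, so the total n has to be small enough for the scarcest group.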

