
Stratified Sampling in RStudio

I have to calculate the mean of a continuous variable, stratified by a categorical variable.
Specifically, I want to calculate the mean of age by outcome.

I turned Outcome into a categorical variable:

# turn the continuous outcome variable into a categorical one
rhc2$Outcome<-cut(rhc2$t3d30, c(0,29,30))

summary(rhc2$Outcome)

I have 4631 observations for age and outcome:

View(rhc2$Outcome)   
summary(rhc2$Outcome)   
 (0,29] (29,30]     
   1333    3298 
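
A side note on the cut() call above: by default the intervals it builds are closed on the right, so a value exactly equal to the lowest break (0 here) would become NA unless include.lowest = TRUE is set. The summary above adds up to all 4631 observations, so this does not appear to affect the data here, but it is worth knowing. A minimal sketch, assuming made-up t3d30 values (the vector below is invented for illustration and is not taken from the real rhc2 data):

```r
# Hypothetical follow-up times; NOT taken from the real rhc2 data
t3d30 <- c(0, 5, 29, 30, 30, 12)

# Default: intervals are (0,29] and (29,30], so the value 0 falls outside
# every interval and becomes NA
default_cut <- cut(t3d30, breaks = c(0, 29, 30))

# include.lowest = TRUE closes the first interval on the left: [0,29]
closed_cut <- cut(t3d30, breaks = c(0, 29, 30), include.lowest = TRUE)

# Compare the counts per level, showing NA only if present
print(table(default_cut, useNA = "ifany"))
print(table(closed_cut, useNA = "ifany"))
```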

age
70.25098
78.17896
75.33197
86.07794
54.96799
43.63898
18.04199
48.42398
34.44199
68.34796

Since you didn't post any example data and this is your first post, I took the liberty of creating fake data to address your question.

library(tidyverse)

set.seed(41)
# since you did not provide data I made up Age
Age <- sample(seq(from = 0, to = 100, by = 1),
              size = 4631,
              replace = TRUE)

# and I made up the Outcome variable
Outcome <- sample(seq(from = 0, to = 1, by = 1),
                  size = 4631,
                  replace = TRUE,
                  prob = c(0.3, 0.7))

# Create the data frame
df <- data.frame(Age,Outcome)

Then you can use the dplyr package's group_by() function followed by the summarize() function:

# First group the data frame by the Outcome variable
# Then calculate the mean Age within each Outcome group

df %>% group_by(Outcome) %>% summarize(Mean = mean(Age))

This results in:

# A tibble: 2 x 2
  Outcome  Mean
    <dbl> <dbl>
1       0  48.9
2       1  49.6
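
The same idea should carry over to the data frame from the question. Below is a minimal sketch, assuming rhc2 has columns named age and t3d30: the ten ages are the ones listed in the question, while the t3d30 values are invented purely for illustration. It uses base R's tapply() and aggregate(), which give the same kind of group means without loading dplyr:

```r
# Stand-in for the real rhc2: ages from the question, t3d30 values made up
rhc2 <- data.frame(
  age   = c(70.25098, 78.17896, 75.33197, 86.07794, 54.96799,
            43.63898, 18.04199, 48.42398, 34.44199, 68.34796),
  t3d30 = c(30, 30, 12, 5, 30, 30, 30, 29, 30, 30)
)

# Same categorisation as in the question
rhc2$Outcome <- cut(rhc2$t3d30, c(0, 29, 30))

# Mean age within each Outcome level, returned as a named vector
means_tapply <- tapply(rhc2$age, rhc2$Outcome, mean)

# The same group means, returned as a data frame
means_aggregate <- aggregate(age ~ Outcome, data = rhc2, FUN = mean)

print(means_tapply)
print(means_aggregate)
```

With the real data, only the cut() line and one of the two summary calls would be needed, since rhc2 already exists.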
