Stratified Sampling in RStudio
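I need to calculate the mean of a continuous variable, stratified by a categorical variable.
Specifically, I want to calculate the mean age by outcome.
I converted the outcome into a categorical variable: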
# Convert the continuous outcome variable into a categorical one
rhc2$Outcome <- cut(rhc2$t3d30, breaks = c(0, 29, 30))
summary(rhc2$Outcome)
I have 4631 observations of age and outcome:
View(rhc2$Outcome)
summary(rhc2$Outcome)
 (0,29] (29,30]
   1333    3298
age
70.25098
78.17896
75.33197
86.07794
54.96799
43.63898
18.04199
48.42398
34.44199
68.34796
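A note on the cut() call above: by default its intervals are closed on the right, so a value of exactly 0 falls outside (0,29] and becomes NA. The counts shown above add up to 4631, so this does not appear to affect the data here, but a small sketch with made-up values (not taken from rhc2) illustrates the behaviour and the include.lowest argument:

```r
# Hypothetical values, not taken from rhc2
x <- c(0, 1, 29, 30)

# Default: the intervals are (0,29] and (29,30], so 0 is not covered
default_cut <- cut(x, breaks = c(0, 29, 30))
is.na(default_cut)

# include.lowest = TRUE closes the first interval on the left: [0,29]
closed_cut <- cut(x, breaks = c(0, 29, 30), include.lowest = TRUE)
table(closed_cut, useNA = "ifany")
```

Whether 0 should count as part of the first group depends on what t3d30 measures, so treat include.lowest as an option to consider rather than a required fix.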
Since you did not post any sample data and this is your first post, I took the liberty of creating fake data to address your question.
library(tidyverse)
set.seed(41)
# Since you did not provide data, I made up Age
Age <- sample(seq(from = 0, to = 100, by = 1),
              size = 4631,
              replace = TRUE)
# ... and I made up the Outcome variable
Outcome <- sample(seq(from = 0, to = 1, by = 1),
                  size = 4631,
                  replace = TRUE,
                  prob = c(0.3, 0.7))
# Create the data frame
df <- data.frame(Age, Outcome)
Then you can use the dplyr package's group_by() function followed by the summarize() function:
# First group the data frame by the Outcome variable
# Then calculate the mean for every outcome variable
df %>% group_by(Outcome) %>% summarize(Mean = mean(Age))
Result:
# A tibble: 2 x 2
  Outcome  Mean
    <dbl> <dbl>
1       0  48.9
2       1  49.6
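If you would rather not load the tidyverse, the same grouped mean can be computed in base R. The sketch below is only an illustration: rhc2_fake is a made-up stand-in for your rhc2 data frame, and its age and t3d30 values are simulated, so the numbers it produces mean nothing. It does mirror your setup in that Outcome is a factor created with cut().

```r
set.seed(41)

# Made-up stand-in for rhc2: both columns are simulated
rhc2_fake <- data.frame(
  age   = runif(200, min = 18, max = 90),
  t3d30 = sample(c(10, 20, 30), size = 200, replace = TRUE)
)
rhc2_fake$Outcome <- cut(rhc2_fake$t3d30, breaks = c(0, 29, 30))

# Mean age within each Outcome level, returned as a named vector
means_tapply <- tapply(rhc2_fake$age, rhc2_fake$Outcome, mean)

# The same means, returned as a data frame
means_aggregate <- aggregate(age ~ Outcome, data = rhc2_fake, FUN = mean)

means_tapply
means_aggregate
```

With your real data, replacing rhc2_fake with rhc2 should be enough, assuming the age column is named age as in your output.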