Groupwise summary statistics for all dependent variables in R using dplyr
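I am trying to generate groupwise summary statistics (mean, sd, min, max, standard error, etc.) for each of my 10 dependent variables, grouped by hearing status (my independent variable, so HL and NH are the two groups). I can do this for one variable (R_PTA) using either of the following two pieces of code:
1.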
RightPTA <- mydata %>% group_by(NHL) %>% summarise(n=length(R_PTA), mean_R_PTA=mean(R_PTA), sd_R_PTA=sd(R_PTA), se_R_PTA=sd(R_PTA)/sqrt(length(R_PTA)), min_R_PTA=min(R_PTA), max_R_PTA=max(R_PTA))
2.
mydata
mean<-tapply(mydata$R_PTA, mydata$NHL, mean)
mean
sd<-tapply(mydata$R_PTA, mydata$NHL, sd)
sd
median<-tapply(mydata$R_PTA, mydata$NHL, median)
median
max<-tapply(mydata$R_PTA, mydata$NHL, max)
max
min<-tapply(mydata$R_PTA, mydata$NHL, min)
min
cbind(mean, sd, median, max, min)
round(cbind(mean, sd, median, max, min), digits = 1)
t1<-round(cbind(mean, sd, median, max, min), digits = 1)
t1
This is the output:
RightearPTA
mean sd median max min
HL 26.9 7.3 27.5 37.5 8.8
NH 11.6 4.1 12.5 16.2 2.5
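For the remaining 9 variables (L_PTA, B_PTA, etc.), I would like exactly the same results, but all at once if possible. Is there a way to do this? Do I have to write the code separately for each dependent variable? I'm sure it's out there, but I can't find it! Any help would be greatly appreciated!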
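Consider a base R solution with by (the object-oriented wrapper to tapply, which subsets a data frame by the groups of a factor) and a nested sapply (which builds a matrix of statistics, one column per variable). The example below demonstrates this with randomly seeded data containing 10 statistic columns: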
set.seed(88)
df <- data.frame(
GROUP = sapply(seq(50), function(i) sample(c("NH", "HL"), 1, replace=TRUE)),
STAT1 = rnorm(50)*100,
STAT2 = rnorm(50),
STAT3 = runif(50)*100,
STAT4 = runif(50),
STAT5 = rgamma(50, shape = 2)*100,
STAT6 = rgamma(50, shape = 2),
STAT7 = rpois(50, lambda = 100)*100,
STAT8 = rpois(50, lambda = 100),
STAT9 = rexp(50, rate = 1)*100,
STAT10 = rexp(50, rate = 1)
)
dfList <- by(df, df$GROUP, FUN = function(d)
sapply(d[2:ncol(d)], function(i)
c(mean = mean(i, na.rm=TRUE),
sd = sd(i, na.rm=TRUE),
median = median(i, na.rm=TRUE),
min = min(i, na.rm=TRUE),
max = max(i, na.rm=TRUE)
)
)
)
Output
dfList$HL
# STAT1 STAT2 STAT3 STAT4 STAT5 STAT6 STAT7 STAT8 STAT9 STAT10
# mean -6.594221 -0.04059519 52.990723 0.58753311 157.55220 1.9196911 10103.4483 101.17241 113.089148 0.771495372
# sd 102.512709 0.99159105 31.055376 0.27339871 152.37034 1.4880694 709.3673 10.02165 121.360898 0.720117072
# median 8.034055 0.01163562 56.416484 0.56894472 136.58274 1.5150241 10200.0000 103.00000 77.302150 0.599291434
# min -199.786535 -1.84703449 1.345751 0.00207128 22.56936 0.1553518 8400.0000 82.00000 2.396641 0.006532798
# max 251.976970 2.55701655 98.612123 0.99413520 806.38484 7.1030277 11900.0000 120.00000 487.719745 3.133768953
dfList$NH
# STAT1 STAT2 STAT3 STAT4 STAT5 STAT6 STAT7 STAT8 STAT9 STAT10
# mean 26.51853 -0.13748799 44.1973692 0.46621478 155.7555 1.880407 9961.9048 104.38095 150.596480 1.1243476
# sd 90.57645 0.77843518 29.9227560 0.30340507 121.5361 1.105004 868.6059 8.44083 131.123059 1.1627959
# median 24.52202 -0.02949522 46.1950960 0.33646282 114.7845 1.736198 9700.0000 105.00000 122.841835 0.7819896
# min -105.54741 -1.58980314 0.2636007 0.02044767 17.3282 0.291350 8900.0000 89.00000 7.799051 0.1108107
# max 194.78958 1.35889041 96.0175463 0.99160167 434.5724 4.368176 12000.0000 120.00000 554.307036 5.1537741
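Since the question asks about dplyr, the same idea can probably also be expressed with summarise() and across(). This is not part of the answer above, only a minimal sketch under some assumptions: dplyr 1.0.0 or later is installed, every dependent variable is a numeric column, and the data frame df, its column names, and the helper se() are made up for illustration. It also includes the standard error mentioned in the question.

```r
library(dplyr)

set.seed(88)
# Hypothetical data: one two-level grouping column plus three numeric columns
df <- data.frame(
  GROUP = sample(c("NH", "HL"), 50, replace = TRUE),
  STAT1 = rnorm(50) * 100,
  STAT2 = runif(50),
  STAT3 = rexp(50, rate = 1)
)

# Hypothetical helper: standard error of the mean, ignoring missing values
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# One row per group; one column per variable/statistic pair (e.g. STAT1_mean)
summary_wide <- df %>%
  group_by(GROUP) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean   = ~ mean(.x, na.rm = TRUE),
        sd     = ~ sd(.x, na.rm = TRUE),
        se     = ~ se(.x),
        median = ~ median(.x, na.rm = TRUE),
        min    = ~ min(.x, na.rm = TRUE),
        max    = ~ max(.x, na.rm = TRUE)
      ),
      .names = "{.col}_{.fn}"
    ),
    .groups = "drop"
  )

print(summary_wide)
```

For the data in the question, the equivalent call would presumably start with mydata %>% group_by(NHL), with where(is.numeric) replaced by an explicit selection such as c(R_PTA, L_PTA, B_PTA) if the data frame holds other numeric columns that should not be summarised.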