How to get the results of a mean and standard deviation to the same data frame by creating extra columns (mean and standard deviation) in R
How to plot a single datapoint with mean and standard deviation from a data frame in R
I have a large data frame in R with the following format:
"SubjID" "HR" "IBI" "Stimulus" "Status"
"S1" 75.98 790 1 1
"S1" 75.95 791 1 2
"S1" 65.7 918 1 3
"S1" 59.63 100 1 4
"S1" 59.44 101 1 5
"S1" 59.62 101 2 1
"S1" 63.85 943 2 2
"S1" 60.75 992 2 3
"S1" 59.62 101 2 4
"S1" 61.68 974 2 5
"S2" 65.21 921 1 1
"S2" 59.23 101 1 2
"S2" 61.23 979 1 3
"S2" 70.8 849 1 4
"S2" 74.21 809 1 4
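I want to plot the mean of the "HR" column for each value of the "Status" column.
I wrote the following R code, which takes one subset of the data per "Stimulus" value (keeping only "Status" values up to numberOfSeconds) and plots it: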
numberOfSeconds <- 8
for (stimNumber in 1:40) {
  stimulus2plot <- subset(resampledDataFile, Stimulus == stimNumber & Status <= numberOfSeconds, select = c(SubjID, HR, IBI, Stimulus, Status))
  plot(stimulus2plot$HR ~ stimulus2plot$Status, xlab = "", ylab = "")
  lines(stimulus2plot$HR ~ stimulus2plot$Status, xlab = "", ylab = "")
}
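This produces plots similar to the following:
There is one plot per "Stimulus". On the X axis of each plot I have the "Status" column, and on the Y axis I have one "HR" value per "SubjID". Almost there...
However, what I ultimately want is a single Y data point for each X value. That is, Y should be the average (the mean of the HR column), similar to the figure below:
How can I do this? It would also be great to show the standard deviation as error bars at each data point.
Thanks in advance for your help.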
The easiest thing to do is to precompute the values first and then plot them. I would use ddply for this kind of analysis:
library(plyr)
res = ddply(df, .(Status), summarise, mn = mean(HR))
And plot the result with ggplot2:
library(ggplot2)
ggplot(res, aes(x = Status, y = mn)) + geom_line() + geom_point()
The simplest way is tapply(). If your data.frame is called data:
means <- with(data, tapply(HR, Status, mean))
plot(means, type="l")
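Computing and plotting error bars is just as easy:
# Standard error of the mean of HR for each Status value
serr <- with(data, tapply(HR, Status, function(x) sd(x) / sqrt(length(x))))
plot(means, type = "o", ylim = c(50, 80))
# Draw one vertical bar from mean - serr to mean + serr at each x position
sapply(1:length(serr), function(i) lines(rep(i, 2), c(means[i] + serr[i], means[i] - serr[i])))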
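The question asks for the standard deviation rather than the standard error, and for one plot per stimulus. Here is a minimal base R sketch of that variant, using only the sample rows posted in the question. The names hr_data, summarise_hr and plot_mean_sd are hypothetical and chosen for illustration; note that sd() returns NA for a Status value with a single row, so those points get no whisker.

```r
# Sample rows copied from the question; hr_data is a hypothetical name.
hr_data <- read.table(header = TRUE, text = '
"SubjID" "HR" "IBI" "Stimulus" "Status"
"S1" 75.98 790 1 1
"S1" 75.95 791 1 2
"S1" 65.7 918 1 3
"S1" 59.63 100 1 4
"S1" 59.44 101 1 5
"S1" 59.62 101 2 1
"S1" 63.85 943 2 2
"S1" 60.75 992 2 3
"S1" 59.62 101 2 4
"S1" 61.68 974 2 5
"S2" 65.21 921 1 1
"S2" 59.23 101 1 2
"S2" 61.23 979 1 3
"S2" 70.8 849 1 4
"S2" 74.21 809 1 4
')

# Mean and SD of HR per Status for one stimulus (hypothetical helper).
summarise_hr <- function(d, stim) {
  d <- d[d$Stimulus == stim, ]
  means <- tapply(d$HR, d$Status, mean)
  sds   <- tapply(d$HR, d$Status, sd)   # NA when a Status has only one row
  data.frame(Status = as.numeric(names(means)),
             mean   = as.vector(means),
             sd     = as.vector(sds))
}

# Plot the means as connected points with mean +/- SD whiskers (hypothetical helper).
plot_mean_sd <- function(s) {
  lo <- s$mean - s$sd
  hi <- s$mean + s$sd
  plot(s$Status, s$mean, type = "o", xlab = "Status", ylab = "Mean HR",
       ylim = range(c(lo, hi, s$mean), na.rm = TRUE))
  ok <- !is.na(s$sd) & s$sd > 0          # skip points without a usable SD
  if (any(ok)) {
    arrows(s$Status[ok], lo[ok], s$Status[ok], hi[ok],
           angle = 90, code = 3, length = 0.05)
  }
}

stim1 <- summarise_hr(hr_data, 1)
```

Calling plot_mean_sd(stim1) draws the plot for stimulus 1. To reproduce the original loop, the same call could be made for each stimulus number in turn.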
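To get as close as possible to what you want:
library(ggplot2)
library(plyr)
# Mean and standard deviation of HR for every Stimulus/Status combination
df.summary <- ddply(df, .(Stimulus, Status), summarise,
                    HR.mean = mean(HR),
                    HR.sd = sd(HR))
# One panel per stimulus, with mean +/- SD error bars
ggplot(df.summary, aes(Status, HR.mean)) + geom_path() + geom_point() +
  geom_errorbar(aes(ymin = HR.mean - HR.sd, ymax = HR.mean + HR.sd), width = 0.25) +
  facet_wrap(~Stimulus)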
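For readers who want to inspect the shape of such a summary table without plyr, here is a rough base R sketch using aggregate() on the sample rows from the question. The names hr_data, hr_summary and no_bar are hypothetical. It also counts the rows in each group, because sd() is NA for a group with one observation and such a group cannot get an error bar.

```r
# HR, Stimulus and Status columns of the sample rows from the question.
hr_data <- data.frame(
  Stimulus = rep(c(1, 2, 1), each = 5),
  Status   = c(1:5, 1:5, 1, 2, 3, 4, 4),
  HR       = c(75.98, 75.95, 65.7, 59.63, 59.44,
               59.62, 63.85, 60.75, 59.62, 61.68,
               65.21, 59.23, 61.23, 70.8, 74.21)
)

# Mean, SD and group size of HR per Stimulus/Status combination.
hr_summary <- aggregate(
  HR ~ Stimulus + Status, data = hr_data,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
# aggregate() stores the three statistics in one matrix column;
# flatten it into ordinary columns HR.mean, HR.sd and HR.n.
hr_summary <- do.call(data.frame, hr_summary)

# Groups with a single observation have no SD, hence no error bar.
no_bar <- subset(hr_summary, is.na(HR.sd))
```

With only two subjects in the sample, most groups end up in no_bar. In the full data set each group should have one row per subject, so the SD is defined.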
You can do this entirely within ggplot2, using the following pseudo-data example as a guide:
library(ggplot2)
DF <- data.frame(stimulus = factor(rep(paste("Stimulus", seq(4)), each = 40)),
                 subject = factor(rep(seq(20), each = 8)),
                 time = rep(seq(8), 20),
                 resp = rnorm(160, 50, 10))
# spaghetti plots
ggplot(DF, aes(x = time, y = resp, group = subject)) +
  geom_line() +
  facet_wrap(~ stimulus, ncol = 1)
# plot of time averages by stimulus
# (fun replaces the fun.y argument, which is deprecated since ggplot2 3.3.0)
ggplot(DF, aes(x = time, y = resp)) +
  stat_summary(fun = mean, geom = "line", group = 1) +
  stat_summary(fun = mean, geom = "point", group = 1, shape = 1) +
  facet_wrap(~ stimulus, ncol = 1)
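The plot of time averages above does not draw the error bars requested in the question. One possible way to add them while staying inside ggplot2 is a third stat_summary() layer with the fun.min and fun.max arguments, which requires ggplot2 3.3.0 or later. This is a sketch on the same pseudo-data, not part of the original answer; the helper names mean_minus_sd and mean_plus_sd and the set.seed() call are assumptions added for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the pseudo-data is reproducible
DF <- data.frame(stimulus = factor(rep(paste("Stimulus", seq(4)), each = 40)),
                 subject  = factor(rep(seq(20), each = 8)),
                 time     = rep(seq(8), 20),
                 resp     = rnorm(160, 50, 10))

# Hypothetical helpers giving the lower and upper ends of each error bar.
mean_minus_sd <- function(x) mean(x) - sd(x)
mean_plus_sd  <- function(x) mean(x) + sd(x)

# Mean line and points per time value, plus mean +/- SD error bars,
# one panel per stimulus.
p <- ggplot(DF, aes(x = time, y = resp)) +
  stat_summary(fun = mean, geom = "line", group = 1) +
  stat_summary(fun = mean, geom = "point", group = 1) +
  stat_summary(fun = mean, fun.min = mean_minus_sd, fun.max = mean_plus_sd,
               geom = "errorbar", width = 0.25) +
  facet_wrap(~ stimulus, ncol = 1)
```

Printing p renders the figure. Precomputing the summary with ddply, as in the earlier answer, gives the same picture and keeps the numbers available for inspection.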