Looping through columns in an R data frame
I'm writing R code that draws histograms together with the median and selected quantiles, but I'm having trouble looping over the columns of the data frame.
The head of my data frame and the code are shown below.
Histograms are produced, but the median and quantile lines do not correspond to the actual distributions.
ROI DOY_119 DOY_127 DOY_143 DOY_151 DOY_175 DOY_191 DOY_215 DOY_239 DOY_263
1 4 -11.592668 -9.457701 -12.57275 -11.073490 -8.999743 -9.132843 -9.995659 -9.511699 -9.393022
2 4 -11.518109 -10.231917 -11.96543 -10.757207 -9.558524 -8.529423 -9.562449 -9.511699 -9.578184
3 4 -9.633711 -9.488475 -12.09012 -10.357404 -8.535619 -8.294449 -9.179331 -7.652297 -6.952941
4 4 -7.752080 -9.578184 -11.30182 -11.073490 -8.992849 -6.197888 -6.556077 -5.883803 -6.324577
5 4 -12.533658 -9.347939 -12.74088 -10.506100 -8.958544 -10.486625 -10.809219 -10.550241 -9.307020
6 4 -13.036436 -8.054857 -13.45823 -9.122186 -7.654827 -10.159230 -10.423927 -11.319436 -10.736576
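For anyone who wants to reproduce the problem, the six rows printed above can be rebuilt as a data frame. This is only a sketch of the visible rows; the real `fileIn` presumably contains more rows and possibly more ROI values.

```r
# Rebuild the six rows shown above (the full data set is not available here)
fileIn <- data.frame(
  ROI     = rep(4, 6),
  DOY_119 = c(-11.592668, -11.518109, -9.633711, -7.752080, -12.533658, -13.036436),
  DOY_127 = c(-9.457701, -10.231917, -9.488475, -9.578184, -9.347939, -8.054857),
  DOY_143 = c(-12.57275, -11.96543, -12.09012, -11.30182, -12.74088, -13.45823),
  DOY_151 = c(-11.073490, -10.757207, -10.357404, -11.073490, -10.506100, -9.122186),
  DOY_175 = c(-8.999743, -9.558524, -8.535619, -8.992849, -8.958544, -7.654827),
  DOY_191 = c(-9.132843, -8.529423, -8.294449, -6.197888, -10.486625, -10.159230),
  DOY_215 = c(-9.995659, -9.562449, -9.179331, -6.556077, -10.809219, -10.423927),
  DOY_239 = c(-9.511699, -9.511699, -7.652297, -5.883803, -10.550241, -11.319436),
  DOY_263 = c(-9.393022, -9.578184, -6.952941, -6.324577, -9.307020, -10.736576)
)

# Inspect the structure: one grouping column (ROI) and nine DOY_* columns
str(fileIn)
```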
for (i in 2:ncol(fileIn)){
myHist <- paste(directory, (i-1), sep="")
x11(width = 50, height = 50)
medi <- ddply(fileIn, "ROI", summarise, grp.medi=median (as.numeric(as.matrix(fileIn[i]))))
q05 <- ddply(fileIn, "ROI", summarise, grp.q05=quantile(as.numeric(as.matrix(fileIn[i]))),0.05)
q25 <- ddply(fileIn, "ROI", summarise, grp.q25=quantile(as.numeric(as.matrix(fileIn[i]))),0.25)
q75 <- ddply(fileIn, "ROI", summarise, grp.q75=quantile(as.numeric(as.matrix(fileIn[i]))),0.75)
q95 <- ddply(fileIn, "ROI", summarise, grp.q95=quantile(as.numeric(as.matrix(fileIn[i]))),0.95)
plotHist <-
ggplot(fileIn) +
aes(x = as.numeric(as.matrix(fileIn[i,]))) +
# aes(x = DOY_119) +
geom_histogram(alpha = 0.5, binwidth = 0.5, color="grey", fill= "yellow") +
geom_density(color = "green", fill= "green", alpha = 0.5) +
geom_vline(data=medi, aes(xintercept=grp.medi), color="red", size = 0.7) +
geom_vline(data=q05, aes(xintercept=grp.q05), color="black", size = 0.3) +
geom_vline(data=q25, aes(xintercept=grp.q25), color="blue", size = 0.5) +
geom_vline(data=q75, aes(xintercept=grp.q75), color="blue", size = 0.5) +
geom_vline(data=q95, aes(xintercept=grp.q95), color="black", size = 0.3) +
theme(axis.text.x = element_text(colour = "black"),
axis.text.y = element_text(colour = "black")) +
facet_wrap( ~ ROI, scales = "free")
plot(plotHist)
#------------------------------------------------------------------------------------------------------
# save the X11 window to a file
dev.copy(jpeg, myHist, width=2000, height=1000, res=100)
dev.off()
}
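Three details in the loop above look like plausible causes of the mismatch. The base-R sketch below illustrates them on a toy data frame `df`; its values are hypothetical and chosen only so that the two groups differ clearly.

```r
# Toy data (hypothetical values, not taken from fileIn)
df <- data.frame(ROI     = c(4, 4, 4, 5, 5, 5),
                 DOY_119 = c(1, 2, 3, 10, 20, 30),
                 DOY_127 = c(4, 5, 6, 40, 50, 60))
i <- 2

# 1) df[i] selects COLUMN i, whereas df[i, ] selects ROW i.
#    The aes() call in the loop uses the row form.
col_values <- as.numeric(as.matrix(df[i]))     # all values of DOY_119
row_values <- as.numeric(as.matrix(df[i, ]))   # ROI, DOY_119, DOY_127 of row 2

# 2) Referring to the whole data frame inside a per-group summary ignores
#    the grouping: every group gets the statistic of the full column.
whole_median <- median(df[[i]])                   # one value for all rows
group_median <- tapply(df[[i]], df$ROI, median)   # one value per ROI

# 3) In the loop, 0.05 etc. sit outside quantile()'s parentheses, so they are
#    never used as probs. Without probs, quantile() returns five values.
q_default <- quantile(df[[i]])
q05       <- quantile(df[[i]], probs = 0.05)

print(row_values)
print(group_median)
print(length(q_default))
```

If these are indeed the causes, selecting the column by name within each group and passing `probs` inside the `quantile()` call should make the lines line up with the histograms.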
Here is a start for you. This is the kind of problem that can be solved in several ways, and one way is not always better than another. In your code, some things were done quite inefficiently (such as calculating the quantiles and creating the `vline` layers). Generally in ggplot, if you find yourself repeating very similar lines (like five calls to `geom_vline`), there is a better approach. I've replaced them with a single `vline_data` data frame holding all quantiles per ROI, and fed that to `geom_vline` in combination with some manual scales.
library(plyr)     # provides ddply
library(ggplot2)
#add second ROI for plotting/demonstration purposes
fileIn2 <- fileIn
fileIn2$ROI <- 5
fileIn <- rbind(fileIn,fileIn2)
myplots <- lapply(colnames(fileIn)[-1],function(col_of_interest){
#create summary_data for quantiles
vline_data <- ddply(fileIn,.(ROI), function(x){
myprobs=c(0.05,0.25,0.5,0.75,0.95)
res <- data.frame(prob=as.character(myprobs),value=quantile(x[,col_of_interest],probs=myprobs) )
res
})
#create plot. Note the use of aes_string here.
plotHist <-
ggplot(fileIn, aes_string(x=col_of_interest))+
geom_histogram(alpha = 0.5, binwidth = 0.5, color="grey", fill= "yellow") +
geom_density(color = "green", fill= "green", alpha = 0.5) +
geom_vline(data=vline_data, aes(xintercept=value,size=prob,color=prob)) +
# values follow the sorted prob levels: 0.05, 0.25, 0.5, 0.75, 0.95
scale_color_manual(values=c("black","blue","red","blue","black"),
breaks=as.character(c(0.05,0.25,0.5,0.75,0.95)))+
scale_size_manual(values=c(0.3,0.5,0.7,0.5,0.3),
breaks=as.character(c(0.05,0.25,0.5,0.75,0.95)))+
facet_wrap( ~ ROI, scales = "free")
#optional: use `ggsave` here.
#ggsave(filename=paste0(directory,col_of_interest,".png"),plot=plotHist)
return(plotHist)
}
)
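The `lapply` call returns a list of plots, which still has to be displayed or written to disk. The sketch below shows one way to do that, and also uses the `.data` pronoun because recent ggplot2 releases mark `aes_string()` as deprecated. It is a simplified stand-in rather than a replacement for the code above: it uses only two columns of the rows shown in the question, omits the quantile lines, and `out_dir` is a hypothetical output folder.

```r
library(ggplot2)

# Small stand-in for fileIn: two columns of the six rows shown in the question
fileIn <- data.frame(
  ROI     = rep(4, 6),
  DOY_119 = c(-11.592668, -11.518109, -9.633711, -7.752080, -12.533658, -13.036436),
  DOY_127 = c(-9.457701, -10.231917, -9.488475, -9.578184, -9.347939, -8.054857)
)
cols <- colnames(fileIn)[-1]

# One histogram per column; .data[[name]] looks the column up by its name
myplots <- lapply(cols, function(col_of_interest) {
  ggplot(fileIn, aes(x = .data[[col_of_interest]])) +
    geom_histogram(alpha = 0.5, binwidth = 0.5, color = "grey", fill = "yellow") +
    xlab(col_of_interest) +
    facet_wrap(~ ROI, scales = "free")
})
names(myplots) <- cols

# Write one PNG per column (out_dir is an assumption; adjust as needed)
out_dir <- tempdir()
for (nm in names(myplots)) {
  ggsave(filename = file.path(out_dir, paste0(nm, ".png")),
         plot = myplots[[nm]], width = 8, height = 4, dpi = 100)
}
```

To view a single plot interactively, `print(myplots[["DOY_119"]])` draws it on the current device.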