I'm new to R and to Stack Overflow, so I'm sorry if the question or its format isn't ideal.
I'm trying to get some basic statistics from a matrix using ddply, and I wanted to speed the process up with a for loop. Unfortunately this wasn't as easy as I had thought.
Strain gene1 gene2 gene3 . . .
A 2.6336700 1.42802 0.935742
A 2.0634700 2.31232 1.096320
A 2.5798600 2.75138 0.714647
B 2.6031200 1.31374 1.214920
B 2.8319400 1.30260 1.191770
B 1.9796000 1.74199 1.056490
C 2.4030300 1.20324 1.069800
.
.
.
----------
for (n in c("gene1","gene2","gene3","gene4")) {
summary <- ddply(Data, .(Strain), summarise,
mean = mean(n),
sd = sd(n),
se = sd(n) / sqrt(length(n)) )
}
The result reads mean = 6, and both sd and se are NA, which is obviously not what I had in mind.
If I get rid of the for loop and insert the column name (gene1) by hand:
summary <- ddply(Data, .(Strain), summarise,
mean = mean(gene1),
sd = sd(gene1),
se = sd(gene1) / sqrt(length(gene1)) )
Now it seems to give me the correct result. Can someone enlighten me on this matter and tell me what I'm doing wrong?
Just use colwise(mean), colwise(sd), and colwise(length). There is no need for a for loop.
library(plyr)
ddply(mtcars,.(cyl), colwise(mean))
cyl mpg disp hp drat wt qsec vs am gear carb
1 4 26.66364 105.1364 82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727 4.090909 1.545455
2 6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714 3.857143 3.428571
3 8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571 3.285714 3.500000
For your example,
ddply(df,.(Strain),colwise(mean))
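As a minimal sketch of how the same idea could cover all three statistics from the question: the data frame below is rebuilt from the Strain A and B rows shown in the question, and se is a hypothetical helper that is not part of plyr. This sketch uses numcolwise, plyr's numeric-only variant of colwise, so that the non-numeric Strain column is not passed to mean or sd.

```r
library(plyr)

# Illustrative data: the six Strain A and B rows shown in the question.
Data <- data.frame(
  Strain = rep(c("A", "B"), each = 3),
  gene1 = c(2.63367, 2.06347, 2.57986, 2.60312, 2.83194, 1.97960),
  gene2 = c(1.42802, 2.31232, 2.75138, 1.31374, 1.30260, 1.74199),
  gene3 = c(0.935742, 1.096320, 0.714647, 1.214920, 1.191770, 1.056490)
)

# Hypothetical helper (not from plyr): standard error of the mean.
se <- function(x) sd(x) / sqrt(length(x))

# One data frame per statistic, each with one row per Strain
# and one column per gene.
gene_means <- ddply(Data, .(Strain), numcolwise(mean))
gene_sds   <- ddply(Data, .(Strain), numcolwise(sd))
gene_ses   <- ddply(Data, .(Strain), numcolwise(se))
```

Each result keeps Strain as the first column, so the three tables can be compared row by row.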
I know you didn't ask for it, but here is a solution with aggregate in base R.
# One line in base.
aggregate(Data[paste0('gene',1:3)],by=Data['Strain'],
function(x) c(mean=mean(x),sd=sd(x),se=sd(x)/sqrt(length(x))))
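One detail worth knowing about the call above: because the function returns a named vector, aggregate stores each gene's statistics as a matrix inside a single column. The sketch below, which assumes a small data frame built from the rows shown in the question, flattens that result into ordinary columns with do.call(data.frame, ...).

```r
# Illustrative data: the six Strain A and B rows shown in the question.
Data <- data.frame(
  Strain = rep(c("A", "B"), each = 3),
  gene1 = c(2.63367, 2.06347, 2.57986, 2.60312, 2.83194, 1.97960),
  gene2 = c(1.42802, 2.31232, 2.75138, 1.31374, 1.30260, 1.74199),
  gene3 = c(0.935742, 1.096320, 0.714647, 1.214920, 1.191770, 1.056490)
)

res <- aggregate(Data[paste0("gene", 1:3)], by = Data["Strain"],
                 function(x) c(mean = mean(x), sd = sd(x),
                               se = sd(x) / sqrt(length(x))))

# res has four columns: Strain plus one matrix column per gene.
# Flatten the matrix columns into separate columns such as gene1.mean.
flat <- do.call(data.frame, res)
names(flat)
```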
You can do it using ddply, but then you just have to create a work-around by first turning your command into a string, and then by evaluating the string.
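library(plyr)
all.genes <- c("gene1","gene2","gene3","gene4")
for (i in seq_along(all.genes)) {
# Substitute the gene name into the command text; %1$s reuses the first argument.
string_eval <- sprintf("summary <- ddply(Data, .(Strain), summarise,
mean = mean(%1$s),
sd = sd(%1$s),
se = sd(%1$s) / sqrt(length(%1$s)))",
all.genes[i])
# Parse the command string and run it.
eval(parse(text = string_eval))
}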
I just used your code, but this loop would overwrite summary every round. I had the same problem as you, so I just wanted to let you know the solution I ended up using.
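For comparison, here is a sketch of a loop that avoids both eval(parse()) and the overwriting. In the question's loop, n holds a character string such as "gene1", so mean(n) is applied to the string itself rather than to the column. Passing ddply an ordinary function and selecting the column with [[n]] sidesteps that. The data frame and the summaries list are illustrative names, not part of the original code.

```r
library(plyr)

# Illustrative data: the six Strain A and B rows shown in the question.
Data <- data.frame(
  Strain = rep(c("A", "B"), each = 3),
  gene1 = c(2.63367, 2.06347, 2.57986, 2.60312, 2.83194, 1.97960),
  gene2 = c(1.42802, 2.31232, 2.75138, 1.31374, 1.30260, 1.74199),
  gene3 = c(0.935742, 1.096320, 0.714647, 1.214920, 1.191770, 1.056490)
)

all.genes <- c("gene1", "gene2", "gene3")
summaries <- list()  # one summary table per gene, keyed by gene name

for (n in all.genes) {
  summaries[[n]] <- ddply(Data, .(Strain), function(piece) {
    x <- piece[[n]]  # look the column up by its name
    data.frame(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
  })
}

summaries[["gene1"]]
```

Storing each result under the gene's name keeps every table available after the loop finishes.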