I want to compute some statistics in R on a data set. The data is in a list and is grouped by an identifying code, shown here in the cat column:
cat          AS_Year  AS_Day  As_Month  EVENT_TYPE  RESULT_TYPE  REASON_TYPE  OPERATOR_TYPE  DATE_EVENT  Day_Total
9002F100AS2  2009     14      2         9002        F            100          AS2            14-Feb-09   2
9002F123AS2  2009     14      2         9002        F            123          AS2            14-Feb-09   1
9008F0AS2    2009     14      2         9008        F            0            AS2            14-Feb-09   1
There are thousands of these codes on each day and I would like to do some statistics on the volumes for each.
I have looked into things and have tried playing around with
ddply(dtest,~group,summarise,mean=mean(Day_Total),sd=sd(Day_Total))
This gives me NA for the mean and an sd that doesn't match the one I get in Excel. I have also tested this on a simpler, smaller test data set, and the means and sds don't seem to be correct there either. Does anyone have any advice on how to use this, or am I missing something?
Try the very efficient data.table package:
library(data.table)
setDT(dtest)[, list(mean = mean(Day_Total, na.rm = TRUE),
                    sd = sd(Day_Total, na.rm = TRUE)), by = cat]
Or, if you prefer to stick with the plyr family, try the newer and much more efficient dplyr package.
Note: detach plyr first by running detach("package:plyr", unload = TRUE)
library(dplyr)
dtest %>%
group_by(cat) %>%
summarise(mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))
I assume that by group you meant cat in your one-liner. Could it be that Day_Total or cat is not the right type? Could there be some NA (missing) values in the Day_Total column?
What does the following give?
ddply(dtest, .(as.factor(cat)), summarise, mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))
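To check the two suspicions above (wrong column type, missing values) without any extra packages, something like the following base R sketch may help. The toy data frame is hypothetical: only the column names cat and Day_Total come from the question, and storing Day_Total as text is an assumption made here to reproduce an NA mean. The names n_missing and res are likewise made up for illustration.

```r
# Hypothetical toy data; only the column names come from the question.
dtest <- data.frame(
  cat = c("9002F100AS2", "9002F100AS2", "9008F0AS2", "9008F0AS2", "9008F0AS2"),
  Day_Total = c("2", "4", "1", "3", NA),  # assumed: stored as text, one value missing
  stringsAsFactors = FALSE
)

# 1. Check the column type. mean() returns NA (with a warning) for non-numeric input.
is.numeric(dtest$Day_Total)

# 2. Convert to numeric. Going through as.character() also handles factor columns.
dtest$Day_Total <- as.numeric(as.character(dtest$Day_Total))

# 3. Count missing values. One NA in a group makes mean() and sd() return NA
#    for that group unless na.rm = TRUE is passed.
n_missing <- sum(is.na(dtest$Day_Total))

# 4. Per-code mean and sd in base R, as a cross-check for ddply / dplyr / data.table.
group_mean <- tapply(dtest$Day_Total, dtest$cat, mean, na.rm = TRUE)
group_sd   <- tapply(dtest$Day_Total, dtest$cat, sd, na.rm = TRUE)
res <- data.frame(cat  = names(group_mean),
                  mean = as.vector(group_mean),
                  sd   = as.vector(group_sd))
print(res)
```

As for the mismatch with Excel: R's sd() is the sample standard deviation (n - 1 denominator), which corresponds to Excel's STDEV.S rather than STDEV.P, so it is worth checking which one the spreadsheet uses.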