
How to group data and do statistics using R

I want to compute some statistics in R on a data set that I have. The data is in a list and is grouped by an identifying code, given here in the cat column:

cat         AS_Year AS_Day  As_Month    EVENT_TYPE  RESULT_TYPE REASON_TYPE OPERATOR_TYPE   DATE_EVENT  Day_Total
9002F100AS2 2009    14       2          9002        F           100         AS2             14-Feb-09   2
9002F123AS2 2009    14       2          9002        F           123         AS2             14-Feb-09   1
9008F0AS2   2009    14       2          9008        F           0           AS2             14-Feb-09   1

There are thousands of these codes on each day and I would like to do some statistics on the volumes for each.

I have looked into this and have tried:

ddply(dtest,~group,summarise,mean=mean(Day_Total),sd=sd(Day_Total))

This gives me NA for the mean and an sd that doesn't match what I get in Excel. I have also tested this on a simpler, smaller test data set, and the means and sds don't seem to be correct there either. Does anyone have advice on how to use this, or am I missing something?

Try the very efficient data.table package

library(data.table) 
setDT(dtest)[, list(mean = mean(Day_Total, na.rm = TRUE), 
                    sd = sd(Day_Total, na.rm = TRUE)), by = cat]

Or, if you prefer to stick with the plyr family, try the newer and much more efficient dplyr package.

Note: detach plyr first with detach("package:plyr", unload = TRUE)

library(dplyr)
dtest %>% 
  group_by(cat) %>%
  summarise(mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))
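
As a package-free cross-check of the grouped mean and sd, here is a minimal base R sketch. The data frame `toy` and its values are made up for illustration and are not the asker's data. It also shows one way an NA can appear in the sd column: sd() of a single observation is NA, whatever na.rm is set to.

```r
# Hypothetical toy data: group "A" has three rows, group "B" has only one
toy <- data.frame(
  cat = c("A", "A", "A", "B"),
  Day_Total = c(2, 4, 6, 1)
)

# Grouped mean and sd using base R only; na.rm is passed through to mean() and sd()
group_mean <- tapply(toy$Day_Total, toy$cat, mean, na.rm = TRUE)
group_sd <- tapply(toy$Day_Total, toy$cat, sd, na.rm = TRUE)

print(group_mean)
print(group_sd)  # the single-row group "B" has an NA sd
```

When comparing against Excel, keep in mind that R's sd() uses the sample (n - 1) denominator, which corresponds to Excel's STDEV / STDEV.S rather than STDEVP / STDEV.P.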

I assume that by group you meant cat in your one-liner. Could it be that Day_Total or cat is not the right type? Could there be NA (missing) values in the Day_Total column?

What does this give?

ddply(dtest, .(as.factor(cat)), summarise, mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))
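
The two checks suggested in this answer (column type and missing values) can be sketched in base R as follows. The data frame `dtest_demo` is a hypothetical stand-in loosely modelled on the columns in the question, not the asker's actual data.

```r
# Hypothetical toy data with one missing Day_Total value
dtest_demo <- data.frame(
  cat = c("9002F100AS2", "9002F100AS2", "9008F0AS2"),
  Day_Total = c(2, NA, 1),
  stringsAsFactors = FALSE
)

# Check the column types: Day_Total should be numeric, not character or factor
str(dtest_demo)
day_total_is_numeric <- is.numeric(dtest_demo$Day_Total)

# Count missing values in the column being summarised
n_missing <- sum(is.na(dtest_demo$Day_Total))

# A single NA propagates through mean() unless na.rm = TRUE is given
mean_with_na <- mean(dtest_demo$Day_Total)
mean_without_na <- mean(dtest_demo$Day_Total, na.rm = TRUE)
```

If Day_Total turns out to be a character or factor column, it would need converting to numeric before mean() and sd() give meaningful results.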
