How to group data and do statistics using R
I want to run some statistics in R on a data set that I have. The data is in a list and is grouped by an identifying code, given here in the cat column:
cat AS_Year AS_Day As_Month EVENT_TYPE RESULT_TYPE REASON_TYPE OPERATOR_TYPE DATE_EVENT Day_Total
9002F100AS2 2009 14 2 9002 F 100 AS2 14-Feb-09 2
9002F123AS2 2009 14 2 9002 F 123 AS2 14-Feb-09 1
9008F0AS2 2009 14 2 9008 F 0 AS2 14-Feb-09 1
There are thousands of these codes on each day, and I would like to do some statistics on the volumes for each.
I have looked into this and have tried experimenting with:
ddply(dtest,~group,summarise,mean=mean(Day_Total),sd=sd(Day_Total))
This gives me NA for the mean and an sd that doesn't match the one I get from Excel. I have also tested this on a simpler, smaller test data set, and the means and sds don't seem to be correct there either. Does anyone have any advice on how to use this, or am I missing something somewhere?
Try the very efficient data.table package:
library(data.table)
setDT(dtest)[, list(mean = mean(Day_Total, na.rm = TRUE),
                    sd = sd(Day_Total, na.rm = TRUE)), by = cat]
Or, if you prefer to stick with the plyr family, try the newer and much more efficient dplyr package:
Note: Detach plyr first by running detach("package:plyr", unload = TRUE), so that plyr's summarise does not mask the dplyr one.
library(dplyr)
dtest %>%
  group_by(cat) %>%
  summarise(mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))
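If neither package is at hand, the same per-code summary can be sketched in base R with tapply. This is only an illustration: the small dtest below is made up (including the NA) and stands in for the real data.

```r
# Hypothetical stand-in for dtest; values are invented for illustration.
dtest <- data.frame(
  cat = c("9002F100AS2", "9002F100AS2",
          "9002F123AS2", "9002F123AS2", "9002F123AS2"),
  Day_Total = c(2, 4, 1, 3, NA),
  stringsAsFactors = FALSE
)

# Apply mean and sd to Day_Total within each code, ignoring missing values.
means <- tapply(dtest$Day_Total, dtest$cat, mean, na.rm = TRUE)
sds   <- tapply(dtest$Day_Total, dtest$cat, sd, na.rm = TRUE)

# Collect the results into one row per code.
res <- data.frame(cat = names(means),
                  mean = as.vector(means),
                  sd = as.vector(sds))
print(res)
```

When comparing against Excel, keep in mind that sd() uses the n - 1 (sample) denominator, which corresponds to Excel's STDEV rather than STDEVP.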
I assume that by group you meant cat in your one-liner. Could it be that your Day_Total or cat column is not the right type? Could it be that there are some NA (not available) values in the Day_Total column?
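Both possibilities can be checked quickly. The sketch below uses a small made-up data frame in place of dtest; the values, the text-typed Day_Total column and the NA are assumptions for illustration only.

```r
# Hypothetical stand-in for dtest; Day_Total is deliberately stored as text
# with one missing value, to mimic a badly imported column.
dtest <- data.frame(
  cat = c("9002F100AS2", "9002F100AS2", "9002F123AS2", "9008F0AS2"),
  Day_Total = c("2", "4", "1", NA),
  stringsAsFactors = FALSE
)

str(dtest)                             # shows the type of every column
n_missing <- sum(is.na(dtest$Day_Total))  # number of missing values

# Convert to numeric; going through as.character() also handles factors.
dtest$Day_Total <- as.numeric(as.character(dtest$Day_Total))

m_raw   <- mean(dtest$Day_Total)                # NA, one value is missing
m_clean <- mean(dtest$Day_Total, na.rm = TRUE)  # mean of non-missing values
print(c(n_missing = n_missing, m_raw = m_raw, m_clean = m_clean))
```

If Day_Total turns out to be a factor or character column, or contains NA, that alone would explain an NA mean.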
What does the following give?
ddply(dtest, .(as.factor(cat)), summarise, mean = mean(Day_Total, na.rm = TRUE), sd = sd(Day_Total, na.rm = TRUE))