R: aggregate several columns at once

I am new to R and this is my first time using Stack Overflow, so excuse me if I am asking something obvious or if my question is not clear enough.

I am working with the following data set:

dim(storm)
[1] 883602     39


names(storm)
 [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"
 [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"
[11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
[16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"
[21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"
[26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
[31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
[36] "REMARKS"    "REFNUM"     "PROPTOTAL"  "CROPTOTAL"

I want to use EVTYPE (a factor variable) to aggregate 4 other numeric variables (PROPTOTAL, CROPTOTAL, FATALITIES, INJURIES).

The factor variable has 950 levels:

length(unique(storm$EVTYPE))
[1] 950


class(storm$EVTYPE)
[1] "factor"

So I would expect an aggregated data frame with 950 observations and 5 variables when I run the following command:

storm_tidy <- aggregate(cbind(PROPTOTAL, CROPTOTAL, FATALITIES, INJURIES) ~ EVTYPE,
                        FUN = sum, data = storm)

However, I get only 155 rows:

dim(storm_tidy)
[1] 155   5

I am calling aggregate with several columns as shown on the function's help page (using cbind):

Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
aggregate(weight ~ feed, data = chickwts, mean)
aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)   # the form I am following
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

I am losing information at some point:

sum(storm$PROPTOTAL)
[1] 424769204805

sum(storm_tidy$PROPTOTAL)
[1] 228366211339

However, if I aggregate column by column, it seems to work fine:

storm_tidy <- aggregate(PROPTOTAL~EVTYPE,FUN = sum, data = storm)
dim(storm_tidy)
[1] 950   2





sum(storm_tidy$PROPTOTAL)
[1] 424769204805

What am I missing? What am I doing wrong?

Thanks.

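This could be a case where some of the columns contain missing values. The formula method of aggregate defaults to na.action = na.omit, which deletes an entire row if any column named in the formula is NA for that row, before FUN is ever applied. That would also fit the single-column result: sum(storm$PROPTOTAL) is not NA, so PROPTOTAL itself has no missing values and no rows are dropped when it is aggregated alone. I would try na.action = NULL, with na.rm = TRUE passed through to sum: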

aggregate(cbind(PROPTOTAL,CROPTOTAL,FATALITIES,INJURIES)~EVTYPE,
            FUN=sum, na.rm=TRUE, data=storm, na.action=NULL)
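
To check whether missing values really are the cause, it may help to reproduce the effect on a small data frame first. The sketch below is only an illustration: `toy` and its values are made up (they are not taken from `storm`), and it uses `na.action = na.pass` rather than `NULL`, which I would expect to have the same effect here since both leave the NA rows in place for `sum(..., na.rm = TRUE)` to handle.

```r
# Hypothetical stand-in for `storm`: three event types, with NAs scattered
# across the numeric columns (values invented for illustration).
toy <- data.frame(
  EVTYPE     = factor(c("FLOOD", "FLOOD", "HAIL", "HAIL", "TORNADO")),
  PROPTOTAL  = c(100, 200, 50, 25, 400),
  CROPTOTAL  = c(10, NA, 5, 5, NA),
  FATALITIES = c(0, 1, 0, 0, 3),
  INJURIES   = c(2, 0, NA, 1, 7)
)
cols <- c("PROPTOTAL", "CROPTOTAL", "FATALITIES", "INJURIES")

# Diagnostic: count NAs per column and the rows that have no NA at all.
na_per_column <- colSums(is.na(toy[cols]))
n_complete    <- sum(complete.cases(toy[cols]))

# Default behaviour (na.action = na.omit): rows with any NA are removed
# before summing, so TORNADO disappears and the totals shrink.
dropped <- aggregate(cbind(PROPTOTAL, CROPTOTAL, FATALITIES, INJURIES) ~ EVTYPE,
                     data = toy, FUN = sum)

# Keep every row and let sum() skip the NAs column by column instead.
kept <- aggregate(cbind(PROPTOTAL, CROPTOTAL, FATALITIES, INJURIES) ~ EVTYPE,
                  data = toy, FUN = sum, na.rm = TRUE, na.action = na.pass)

print(na_per_column)
print(dropped)
print(kept)
```

On the real data, running `colSums(is.na(storm[cols]))` and `sum(complete.cases(storm[cols]))` should show which columns carry the NAs and how many rows survive the default `na.omit`.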

Or we can use summarise_each from dplyr after grouping by 'EVTYPE':

library(dplyr)
storm %>% 
   group_by(EVTYPE) %>% 
   summarise_each(funs(sum=sum(., na.rm=TRUE)), 
                 PROPTOTAL,CROPTOTAL,FATALITIES,INJURIES) 
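
Note that `summarise_each()` and `funs()` have been deprecated in later dplyr releases. Assuming dplyr 1.0.0 or newer is installed, a roughly equivalent sketch with `across()` could look like this; `toy` is again a made-up stand-in for `storm`, so the values are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for `storm` (values invented for illustration).
toy <- data.frame(
  EVTYPE     = factor(c("FLOOD", "FLOOD", "HAIL", "HAIL", "TORNADO")),
  PROPTOTAL  = c(100, 200, 50, 25, 400),
  CROPTOTAL  = c(10, NA, 5, 5, NA),
  FATALITIES = c(0, 1, 0, 0, 3),
  INJURIES   = c(2, 0, NA, 1, 7)
)

# Sum the four columns within each event type, skipping NAs per column
# so that no group is lost because of a missing value elsewhere in the row.
result <- toy %>%
  group_by(EVTYPE) %>%
  summarise(across(c(PROPTOTAL, CROPTOTAL, FATALITIES, INJURIES),
                   ~ sum(.x, na.rm = TRUE)))

print(result)
```

Because `na.rm = TRUE` is applied column by column, a group whose values are all NA in one column still appears in the result, with 0 for that column.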
