
Aggregate function in R (dealing with NAs)

Sorry if this question has already been answered, but I couldn't find what I needed.

This is my hypothetical data set:

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=as.data.frame(cbind(x1, x2, x3, x4, x5))

I tried many different things, but this is basically the idea:

dbF=aggregate(db$x5,by=list(db$x1, db$x2, db$x3,db$x4),FUN=sum)

The expected output is this:

x1e=c("A", "B", "C", "C")
x2e=c("L1", "L1", "L1", "L2")
x3e=c("a", "NA", "b", "j")                 
x4e=c(17, 13.2, NA, 3)
x5e=c(25,13,7,6)
dbExpected=as.data.frame(cbind(x1e, x2e, x3e, x4e, x5e))

I really need to keep the NAs in the final output. Any suggestions? Thanks in advance.

A couple of things. When you build your data.frame that way (cbind, then coerce), you create an intermediate character matrix, so after coercion every column is a factor (or a character vector in R 4.0.0 and later, where stringsAsFactors defaults to FALSE). That is not what you want, since x5 should be numeric. Also, make sure the x4 variable has an NA level (here using addNA), so that when you aggregate by it, the NA group is kept and you get what you want.

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=addNA(factor(c(17, 17, 13.2, NA, 3, 13.2)))
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

dbF=aggregate(x5 ~ x1+x2+x3+x4, data=db, FUN=sum, na.action=na.pass)
dbF
#  x1 x2 x3   x4 x5
# 1  C L2  j    3  6
# 2  B L1 NA 13.2 13
# 3  A L1  a   17 25
# 4  C L1  b <NA>  7
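
The two points in this answer can be checked directly in base R. The snippet below is only a minimal sketch on a made-up two-row example (the names m, db_good and the shortened vectors are illustrative, not the asker's data): it shows that cbind turns the numbers into strings, that data.frame keeps them numeric, and that addNA adds NA as an explicit factor level.

```r
# Hypothetical miniature vectors, not the full data from the question
x1 <- c("A", "B")
x5 <- c(1, 24)

m <- cbind(x1, x5)             # cbind on vectors returns a matrix
is.character(m)                # TRUE: the numbers were coerced to strings

db_good <- data.frame(x1, x5)  # data.frame keeps each column's own type
is.numeric(db_good$x5)         # TRUE

# addNA() turns NA into an explicit factor level
x4 <- addNA(factor(c(17, NA, 3)))
levels(x4)                     # "3" "17" NA
anyNA(levels(x4))              # TRUE
```

Because NA is now an ordinary level of x4, the grouping step has a group to assign those rows to.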

You can use dplyr, and some of the function calls in your code are redundant (for example, data.frame can build the table directly, without cbind followed by as.data.frame).

# install.packages('dplyr') # only run if not installed
library(dplyr)

x1=c("A", "A", "B", "C", "C", "B")
x2=c("L1", "L1", "L1", "L1", "L2", "L1")
x3=c("a", "a", "NA", "b", "j","NA" )
x4=c(17, 17, 13.2, NA, 3, 13.2)
x5=c(1,24,5,7,6,8)
db=data.frame(x1, x2, x3, x4, x5)

db %>%
  group_by(x1, x2, x3, x4) %>%
  dplyr::summarise(x5e = sum(x5))

Source: local data frame [4 x 5]
Groups: x1, x2, x3 [?]

      x1     x2     x3    x4   x5e
  (fctr) (fctr) (fctr) (dbl) (dbl)
1      A     L1      a  17.0    25
2      B     L1     NA  13.2    13
3      C     L1      b    NA     7
4      C     L2      j   3.0     6
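
The printed result above is still grouped by x1, x2 and x3. As a hedged variation, assuming dplyr 1.0.0 or later (where summarise accepts a .groups argument), the grouping can be dropped so that a plain table comes back. The name res is only illustrative.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 for the .groups argument

db <- data.frame(
  x1 = c("A", "A", "B", "C", "C", "B"),
  x2 = c("L1", "L1", "L1", "L1", "L2", "L1"),
  x3 = c("a", "a", "NA", "b", "j", "NA"),
  x4 = c(17, 17, 13.2, NA, 3, 13.2),
  x5 = c(1, 24, 5, 7, 6, 8)
)

res <- db %>%
  group_by(x1, x2, x3, x4) %>%
  summarise(x5e = sum(x5), .groups = "drop")  # return an ungrouped result

nrow(res)            # 4 groups, including the one where x4 is NA
sum(is.na(res$x4))   # 1: group_by keeps NA as its own group
```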
