How to use aggregate with count but also consider some of the NA values in R?

I have a data frame, df1, that I want to turn into df2 (shown below) using R:

**df1**
Cust_id Cust_name       Cust_order
1       Andrew          coffee
2       Dillain         burger
3       Alma            coffee
4       Wesney          chips
5       Kiko            chips
NA      NA              fries
NA      NA              milkshake
NA      NA              sandwich
NA      NA              eggs
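To make the example reproducible, here is one way to build df1 in R. This is a sketch under assumptions: the blank customer fields are taken to be NA, Cust_id is an integer column, and the other two columns are character vectors.

```r
# Reconstruct df1 from the table above (assumed column types:
# integer Cust_id, character Cust_name and Cust_order)
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs"),
  stringsAsFactors = FALSE  # keep strings as character on R < 4.0 as well
)
str(df1)
```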

**df2**
Cust_order  freq
coffee      2
burger      1
chips       2
fries       0
milkshake   0
sandwich    0
eggs        0

I have used aggregate with a count function to achieve this, but it does not give me the result I want: I want the orders whose customer fields are NA to get a count of 0. Any help is appreciated. I am very new to R and have tried the following:

df2 <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)
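For reference, a quick way to see why that attempt falls short (rebuilding df1 from the table above, assuming the blank customer fields are NA): FUN = length counts every row in each group, so orders like fries, whose only row has NA customer fields, still get 1 rather than 0.

```r
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs"),
  stringsAsFactors = FALSE
)

# length() counts every row in a group, including rows whose Cust_id and
# Cust_name are NA, so no group can ever end up with a count of 0
attempt <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)
print(attempt)
```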

You can use the formula notation of aggregate to group by Cust_order and calculate a statistic on Cust_id. In this case, you want to count the non-NA values of Cust_id, which you can do with function(x) sum(!is.na(x)). We have to explicitly tell aggregate to keep the rows with NA values using the na.action argument, because the formula method's default, na.omit, drops them before grouping.

aggregate(Cust_id ~ Cust_order, df1, FUN = function(x) sum(!is.na(x)), na.action = na.pass)

which gives

  Cust_order Cust_id
1     burger       1
2      chips       2
3     coffee       2
4       eggs       0
5      fries       0
6  milkshake       0
7   sandwich       0
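To see why na.action = na.pass is needed, compare it with the formula method's default, na.omit. This is a minimal sketch that rebuilds df1 from the question (blank fields assumed to be NA); count_non_na is just a hypothetical helper name for the counting function.

```r
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs"),
  stringsAsFactors = FALSE
)

# Count the non-NA values in a vector
count_non_na <- function(x) sum(!is.na(x))

# Default na.action (na.omit): rows with NA Cust_id are removed before
# grouping, so fries, milkshake, sandwich and eggs disappear entirely
with_omit <- aggregate(Cust_id ~ Cust_order, data = df1, FUN = count_non_na)

# na.pass keeps those rows, so their orders appear with a count of 0
with_pass <- aggregate(Cust_id ~ Cust_order, data = df1, FUN = count_non_na,
                       na.action = na.pass)

print(with_omit)
print(with_pass)
```

With na.omit only burger, chips and coffee come back; with na.pass all seven orders are present and the four orders without customer data get 0.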
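
Alternatively, with data.table (note that setDT converts df1 to a data.table by reference, and that grouping with by keeps the orders in their order of first appearance):

library(data.table)
setDT(df1)[, .(freq = sum(!is.na(Cust_name))), by = .(Cust_order)]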

Another option is to use the non-formula interface of aggregate and apply sum(!is.na(x)) to the Cust_id column. In this case we are also counting the non-NA records, but without the need to set na.action, because this interface passes the NA values through to FUN.

Wrap the aggregate call in setNames to give the result the column names used in df2. (The \(x) shorthand for function(x) requires R 4.1.0 or later.)

setNames(
  aggregate(df1$Cust_id, by = list(df1$Cust_order), FUN = \(x) sum(!is.na(x))), 
  c("Cust_order", "freq")
)

  Cust_order freq
1     burger    1
2      chips    2
3     coffee    2
4       eggs    0
5      fries    0
6  milkshake    0
7   sandwich    0
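Both aggregate results above are sorted by Cust_order, while df2 in the question lists the orders as they first appear in df1. If that row order matters, one way to restore it is sketched below, assuming df1 as shown in the question with blank fields read as NA: reorder with match() against unique(df1$Cust_order).

```r
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs"),
  stringsAsFactors = FALSE
)

# Count non-NA Cust_id values per order and name the columns as in df2
df2 <- setNames(
  aggregate(df1$Cust_id, by = list(df1$Cust_order),
            FUN = function(x) sum(!is.na(x))),
  c("Cust_order", "freq")
)

# aggregate() sorts the groups; put them back in order of first appearance in df1
df2 <- df2[order(match(df2$Cust_order, unique(df1$Cust_order))), ]
rownames(df2) <- NULL
print(df2)
```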
