How to use aggregate with count while also accounting for some NA values in R?

Question

I have a data frame, df1. I want to turn it into df2, shown below, using R:

**df1**
Cust_id Cust_name       Cust_order
1       Andrew          coffee
2       Dillain         burger
3       Alma            coffee
4       Wesney          chips
5       Kiko            chips
NA      NA              fries
NA      NA              milkshake
NA      NA              sandwich
NA      NA              eggs

**df2**
Cust_order  freq
coffee      2
burger      1
chips       2
fries       0
milkshake   0
sandwich    0
eggs        0

I have used the aggregate function with a count to achieve this, but it does not give me the result I want. I want orders whose customer fields are NA to get a count of 0. Any help is appreciated. I am very new to R, and this is what I have tried:

df2 <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)
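To reproduce the problem, here is a minimal sketch that rebuilds df1 from the table above and reruns this attempt. It assumes Cust_id is an integer column and Cust_name and Cust_order are character columns. The output shows why the attempt falls short: length() counts rows per group whether or not the customer fields are NA, so fries, milkshake, sandwich and eggs each get 1 instead of 0.

```r
# Rebuild df1 from the table in the question (column types are assumed).
# NA_integer_ / NA_character_ mark the orders with no customer attached.
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA_integer_, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA_character_, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs")
)

# The attempt from the question: length() counts every row in each group,
# so orders whose customer fields are all NA are counted as 1, not 0.
attempt <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)
attempt
```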

Answer 1

You can use the formula notation of aggregate to group by Cust_order and calculate a statistic on Cust_id. In this case, you want to count the non-NA values of Cust_id, which you can do with function(x) sum(!is.na(x)). You have to tell aggregate explicitly to keep the NA values using the na.action argument, because the formula method drops rows containing NA by default (na.action = na.omit).

aggregate(Cust_id ~ Cust_order, df1, FUN = function(x) sum(!is.na(x)), na.action = na.pass)

which gives

  Cust_order Cust_id
1     burger       1
2      chips       2
3     coffee       2
4       eggs       0
5      fries       0
6  milkshake       0
7   sandwich       0
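The result above stores the count in a column named Cust_id and sorts the orders alphabetically. If you want it to match df2 exactly, a small post-processing step can rename the column to freq and restore the order in which each order first appears. This is a sketch that reuses df1 rebuilt from the question's table (column types assumed):

```r
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA_integer_, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA_character_, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs")
)

# Count non-NA Cust_id values per order, keeping the NA rows (as in this answer).
res <- aggregate(Cust_id ~ Cust_order, df1,
                 FUN = function(x) sum(!is.na(x)), na.action = na.pass)

# Rename the count column to match df2.
names(res)[names(res) == "Cust_id"] <- "freq"

# aggregate() sorts the groups; put them back in first-appearance order.
res <- res[match(unique(df1$Cust_order), res$Cust_order), ]
rownames(res) <- NULL
res
```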

Answer 2

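With data.table, count the non-NA Cust_name values within each Cust_order group. Because by keeps the groups in order of first appearance, the rows come out in the same order as df2.

library(data.table)
# Convert df1 to a data.table in place, then count non-NA Cust_name values per order
setDT(df1)[, .(freq = sum(!is.na(Cust_name))), by = .(Cust_order)]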

Answer 3

Another option is to use the default (non-formula) aggregate method on the Cust_id column and sum its non-NA values. This also counts only the non-NA records, but because it uses the by argument instead of the formula interface, there is no need to set na.action.

Wrap the aggregate call in setNames to set the correct column names. (The \(x) shorthand for function(x) requires R 4.1 or later.)

setNames(
  aggregate(df1$Cust_id, by = list(df1$Cust_order), FUN = \(x) sum(!is.na(x))), 
  c("Cust_order", "freq")
)

  Cust_order freq
1     burger    1
2      chips    2
3     coffee    2
4       eggs    0
5      fries    0
6  milkshake    0
7   sandwich    0
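Another base-R route, sketched here as an alternative to aggregate rather than something taken from the answers above, is table(). Turn Cust_order into a factor whose levels include every order, then tabulate only the rows with a non-NA Cust_id. Factor levels survive subsetting, so orders with no customer still appear with a count of 0. The df1 construction below is rebuilt from the question's table, with column types assumed.

```r
df1 <- data.frame(
  Cust_id    = c(1:5, rep(NA_integer_, 4)),
  Cust_name  = c("Andrew", "Dillain", "Alma", "Wesney", "Kiko", rep(NA_character_, 4)),
  Cust_order = c("coffee", "burger", "coffee", "chips", "chips",
                 "fries", "milkshake", "sandwich", "eggs")
)

# Every order becomes a level, in first-appearance order.
orders <- factor(df1$Cust_order, levels = unique(df1$Cust_order))

# Tabulate only rows that have a customer; unused levels are counted as 0.
counts <- table(orders[!is.na(df1$Cust_id)])

df2 <- data.frame(Cust_order = names(counts), freq = as.vector(counts))
df2
```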

3 answers

Answer 1 · votes: 1 · accepted · 2022-08-23 12:55:00
Answer 2 · votes: 1 · 2022-08-23 12:55:29
Answer 3 · votes: 1 · 2022-08-23 13:01:08