How to aggregate two different columns in an R data frame using two different functions

Question

I have a data frame in which some records are duplicated, and I need to aggregate the duplicates so that there is one unique record per row.

An example:

Col1    Col2    Col3    Col4
A       0.170   83     0.878
B       0.939   103    0.869
C       0.228   80     0.935
D       0.566   169    0.851
D       0.566   137    0.588
E       0.703   103    0.636

I need to compute the average of Col4 weighted by Col3, and sum Col3. So my result would be:

Col1    Col2    Col3    Col4
A      0.17     83     0.878
B      0.939    103    0.869
C      0.228    80     0.935
D      0.566    306    0.733
E      0.703    103    0.636
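
To make the expected D row concrete, here is a minimal sketch of the arithmetic for the two duplicated D records only. The variable names (col3, col4, by_hand, built_in, total_col3) are illustrative and not part of the original data frame.

```r
# Rows for Col1 == "D" from the example table.
col3 <- c(169, 137)      # weights (Col3)
col4 <- c(0.851, 0.588)  # values to average (Col4)

# Weighted mean written out: sum(value * weight) / sum(weight).
by_hand <- sum(col4 * col3) / sum(col3)

# The same quantity via weighted.mean() from the stats package.
built_in <- weighted.mean(col4, col3)

# Summed Col3 for D: 169 + 137 = 306.
total_col3 <- sum(col3)
```

Both by_hand and built_in round to 0.733, which matches the Col4 value in the D row above.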

Usually I would use the aggregate function, but I can't find a way to apply two different functions to different columns. Is there another way I can accomplish this? I am effectively ignoring Col2: before merging with the data that brought in Col3 and Col4, the granularity was one record per row, and now rows are being duplicated.

Thank you!!

Answer 1

Using dplyr, you can use group_by to group the rows by each unique value of Col1 and then pass all your different functions into summarise. With your example, it could look something like this.

NB: To calculate the weighted.mean of Col4 by Col3, you need to call this function before calculating the sum of Col3. Otherwise Col3 will already have been replaced by its per-group sum, and the lengths of Col4 and Col3 will differ.

You can then put the columns of your data frame back in the original order using select:

library(dplyr)
df %>% group_by(Col1) %>%
  summarise(Col2 = mean(Col2),
            Col4 = weighted.mean(Col4,Col3),
            Col3 = sum(Col3)) %>%
  select(Col1,Col2,Col3,Col4)

# A tibble: 5 x 4
  Col1   Col2  Col3  Col4
  <chr> <dbl> <int> <dbl>
1 A     0.17     83 0.878
2 B     0.939   103 0.869
3 C     0.228    80 0.935
4 D     0.566   306 0.733
5 E     0.703   103 0.636
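
The ordering note above can be illustrated with a small sketch. It assumes dplyr is installed and rebuilds only the three columns involved; the names wrong_order and right_order are hypothetical. The first call is wrapped in tryCatch because, for group D, the weights have already been collapsed to a single value.

```r
library(dplyr)

df <- data.frame(
  Col1 = c("A", "B", "C", "D", "D", "E"),
  Col3 = c(83L, 103L, 80L, 169L, 137L, 103L),
  Col4 = c(0.878, 0.869, 0.935, 0.851, 0.588, 0.636)
)

# Col3 is overwritten by its per-group sum first, so for group D the weights
# passed to weighted.mean() have length 1 while Col4 still has 2 values.
wrong_order <- tryCatch(
  df %>%
    group_by(Col1) %>%
    summarise(Col3 = sum(Col3),
              Col4 = weighted.mean(Col4, Col3)),
  error = function(e) conditionMessage(e)  # keep the error message as text
)

# Computing the weighted mean first still sees the original Col3 values.
right_order <- df %>%
  group_by(Col1) %>%
  summarise(Col4 = weighted.mean(Col4, Col3),
            Col3 = sum(Col3))
```

Here wrong_order ends up holding an error message rather than a summary table, while right_order contains one row per value of Col1.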

Data

# dput() output of the example data, assigned to df. The data.table-specific
# .internal.selfref pointer is dropped because it cannot be parsed as R code,
# so the object is stored here as a plain data.frame.
df <- structure(list(Col1 = c("A", "B", "C", "D", "D", "E"), Col2 = c(0.17, 
0.939, 0.228, 0.566, 0.566, 0.703), Col3 = c(83L, 103L, 80L, 
169L, 137L, 103L), Col4 = c(0.878, 0.869, 0.935, 0.851, 0.588, 
0.636)), row.names = c(NA, -6L), class = "data.frame")

Answer 2

Base R solution:

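# Split df by Col1, summarise each piece, then stack the pieces back together.
# Returning a one-row data.frame per group (rather than a list) keeps the
# columns as atomic vectors after rbind.
aggregated_df <- do.call(rbind, lapply(split(df, df$Col1), function(x) {
  data.frame(Col1 = unique(x$Col1),
             Col2 = mean(x$Col2),
             Col3 = sum(x$Col3),
             Col4 = weighted.mean(x$Col4, x$Col3),
             stringsAsFactors = FALSE)
}))
rownames(aggregated_df) <- NULL  # drop the group names used as row names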
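
Since the question mentions aggregate, here is a hedged alternative sketch, not the answerer's method: use aggregate for the single-column summaries, compute the weighted mean separately per group, and merge the pieces on Col1. The intermediate names (sums, means, wm, wmeans, result) are illustrative.

```r
df <- data.frame(
  Col1 = c("A", "B", "C", "D", "D", "E"),
  Col2 = c(0.17, 0.939, 0.228, 0.566, 0.566, 0.703),
  Col3 = c(83L, 103L, 80L, 169L, 137L, 103L),
  Col4 = c(0.878, 0.869, 0.935, 0.851, 0.588, 0.636),
  stringsAsFactors = FALSE
)

# One aggregate() call per function: sum Col3 and average Col2 within Col1.
sums  <- aggregate(Col3 ~ Col1, data = df, FUN = sum)
means <- aggregate(Col2 ~ Col1, data = df, FUN = mean)

# weighted.mean() needs two columns at once, so compute it per group via split().
wm <- sapply(split(df, df$Col1), function(x) weighted.mean(x$Col4, x$Col3))
wmeans <- data.frame(Col1 = names(wm), Col4 = unname(wm),
                     stringsAsFactors = FALSE)

# Join the three per-group summaries back together on Col1.
result <- Reduce(function(a, b) merge(a, b, by = "Col1"),
                 list(means, sums, wmeans))
```

This takes more steps than the split/lapply version, but each summary stays a plain aggregate call, which may be easier to extend when most columns need only a single-column function.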

Data:

df <-
  structure(
    list(
      Col1 = c("A", "B", "C", "D", "D", "E"),
      Col2 = c(0.17,
               0.939, 0.228, 0.566, 0.566, 0.703),
      Col3 = c(83L, 103L, 80L,
               169L, 137L, 103L),
      Col4 = c(0.878, 0.869, 0.935, 0.851, 0.588,
               0.636)
    ),
    row.names = c(NA,-6L),
    class = c("data.frame"
    ))

2 answers

Answer 1: score 3, accepted, 2020-02-02 02:57:32
Answer 2: score 2, 2020-02-02 05:32:57
