
R: aggregating a nested column in a data frame and preserving the original column names

I have a data frame that looks like the following:

df<-structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90, 
90), material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 10, 20, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 
3L))), class = "data.frame", row.names = c(NA, -10L))

   hex material_diff.1 material_diff.2 material_diff.3
1   90              12               0               0
2  400               0               0              10
3   90               0               0              20
4  400               0               0               0
5  250               0               0               0
6  250               0               0               0
7  400               0               0               0
8   90               0               0               0
9   90               0               0               0
10  90               0               9               0

I want to sum the nested column material_diff, grouped by hex. The result should look like the following:

   hex material_diff.1 material_diff.2 material_diff.3
1   90              12               9              20
2  400               0               0              10
3  250               0               0               0

I have been able to do this using the aggregate function as follows:

aggregate(df$material_diff, by=list(df$hex),FUN=sum)

This returns the desired values, but it doesn't preserve the column names:

  Group.1 V1 V2 V3
1      90 12  9 20
2     250  0  0  0
3     400  0  0 10

How might I do this while still preserving the original column names?

Here is an idea based on the concept of split/apply/combine, i.e.

do.call(rbind, lapply(split(df, df$hex), colSums))

#     hex material_diff.1 material_diff.2 material_diff.3
#90   450              12               9              20
#250  500               0               0               0
#400 1200               0               0              10
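One thing to note about the output above: colSums also sums the hex column itself (450 is 5 × 90), and the group keys survive only as row names. If that matters, a possible variation is to stay with aggregate from the question, name the grouping column through the `by` list, and rename the value columns afterwards. This is only a sketch; the `res` name and the paste0-based column names are my own choices, not something from the question.

```r
# Example data copied from the question: `material_diff` is a 10 x 3 matrix column.
df <- structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90,
90), material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 10, 20, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L,
3L))), class = "data.frame", row.names = c(NA, -10L))

# Naming the element of `by` gives a `hex` column instead of `Group.1`.
res <- aggregate(df$material_diff, by = list(hex = df$hex), FUN = sum)

# aggregate() labels the unnamed matrix columns V1, V2, ...;
# rebuild the names that print(df) shows (assumed naming scheme).
names(res)[-1] <- paste0("material_diff.", seq_len(ncol(df$material_diff)))
res
```

Unlike the original data, `res` holds three ordinary numeric columns rather than one matrix column.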

This is a bit non-standard and out of my comfort zone, but it works. I'm not certain whether there's a better way.

out <- do.call(rbind,
  lapply(split(df, df$hex),
    function(z) transform(z[1,,drop=FALSE], 
      material_diff = matrix(colSums(z$material_diff), nrow = 1))))
out
#     hex material_diff.1 material_diff.2 material_diff.3
# 90   90              12               9              20
# 250 250               0               0               0
# 400 400               0               0              10
str(out)
# 'data.frame': 3 obs. of  2 variables:
#  $ hex          : num  90 250 400
#  $ material_diff: num [1:3, 1:3] 12 0 0 9 0 0 20 0 10
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : NULL
#   .. ..$ : NULL
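As the str output shows, the result still has only two variables, with material_diff kept as a matrix column. If plain columns are easier to work with downstream, one way to flatten such a frame is do.call(data.frame, ...). The sketch below rebuilds a small frame of the same shape instead of reusing `out`; the names `nested` and `flat` are just for illustration.

```r
# A small data frame with the same shape as `out`: one matrix column.
nested <- data.frame(hex = c(90, 250, 400))
nested$material_diff <- matrix(c(12, 0, 0, 9, 0, 0, 20, 0, 10), nrow = 3)

# Passing the columns to data.frame() splits the matrix into separate
# columns, named after the matrix column plus a numeric suffix.
flat <- do.call(data.frame, nested)
names(flat)
str(flat)
```

After this, `flat` has four numeric columns (hex plus three material_diff columns) instead of two variables.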

A tidyverse solution:

library(dplyr)
library(tidyr)  # provides unpack()

df %>%
  mutate(material_diff = data.frame(material_diff)) %>%  # matrix -> nested data frame
  unpack(material_diff, names_sep = '.') %>%
  group_by(hex) %>%
  summarize(across(everything(), ~sum(.)))

# A tibble: 3 x 4
    hex material_diff.X1 material_diff.X2 material_diff.X3
  <dbl>            <dbl>            <dbl>            <dbl>
1    90               12                9               20
2   250                0                0                0
3   400                0                0               10
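The column names here come out as material_diff.X1 and so on, because data.frame() names the columns of an unnamed matrix X1, X2, X3. If the names from the question (material_diff.1, ...) are wanted, a rename step can be appended to the pipeline. This is a sketch that assumes dplyr and tidyr are installed; `res` and the sub() pattern are my additions.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question.
df <- structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90,
90), material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 10, 20, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L,
3L))), class = "data.frame", row.names = c(NA, -10L))

res <- df %>%
  mutate(material_diff = data.frame(material_diff)) %>%
  unpack(material_diff, names_sep = ".") %>%
  group_by(hex) %>%
  summarize(across(everything(), sum)) %>%
  # Drop the "X" that data.frame() put in front of each column number.
  rename_with(~ sub(".X", ".", .x, fixed = TRUE))

names(res)
```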
