R: aggregating a nested column in a data frame and preserving the original column names
I have a data frame that looks like the following:
df<-structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90,
90), material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 10, 20, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L,
3L))), class = "data.frame", row.names = c(NA, -10L))
   hex material_diff.1 material_diff.2 material_diff.3
1   90              12               0               0
2  400               0               0              10
3   90               0               0              20
4  400               0               0               0
5  250               0               0               0
6  250               0               0               0
7  400               0               0               0
8   90               0               0               0
9   90               0               0               0
10  90               0               9               0
I want to sum the nested column material_diff, grouped by hex. The result should look like the following:
  hex material_diff.1 material_diff.2 material_diff.3
1  90              12               9              20
2 400               0               0              10
3 250               0               0               0
I have been able to do this using the aggregate function as follows:
aggregate(df$material_diff, by=list(df$hex),FUN=sum)
This returns the desired sums but doesn't preserve the column names:
  Group.1 V1 V2 V3
1      90 12  9 20
2     250  0  0  0
3     400  0  0 10
How might I do this whilst still preserving the original column names?
Here is an idea based on the split/apply/combine concept, i.e.
do.call(rbind, lapply(split(df, df$hex), colSums))
#     hex material_diff.1 material_diff.2 material_diff.3
#90   450              12               9              20
#250  500               0               0               0
#400 1200               0               0              10
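Note that colSums() also sums hex itself in the output above (450, 500, 1200), while the actual group values only survive as row names. A minimal sketch of one way to repair that, assuming the same df as in the question; the names sums and res are only illustrative:

```r
# Example data from the question
df <- structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90, 90),
  material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0, 0, 9,
                              0, 10, 20, 0, 0, 0, 0, 0, 0, 0),
                            .Dim = c(10L, 3L))),
  class = "data.frame", row.names = c(NA, -10L))

# Split/apply/combine exactly as above; hex is still summed at this point
sums <- do.call(rbind, lapply(split(df, df$hex), colSums))

# The group values are the row names, so use them to overwrite the summed hex
sums[, "hex"] <- as.numeric(rownames(sums))

# Convert the matrix to a data frame and reset the row names to 1..n
res <- as.data.frame(sums)
rownames(res) <- NULL
res
#   hex material_diff.1 material_diff.2 material_diff.3
# 1  90              12               9              20
# 2 250               0               0               0
# 3 400               0               0              10
```

The result here is a flat data frame with four ordinary columns, not a data frame with a nested matrix column.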
This is a bit non-standard and out of my comfort zone, but it works; I'm not certain whether there's a better way.
out <- do.call(rbind,
  lapply(split(df, df$hex),
         function(z) transform(z[1,,drop=FALSE],
                               material_diff = matrix(colSums(z$material_diff), nrow = 1))))
out
#     hex material_diff.1 material_diff.2 material_diff.3
# 90   90              12               9              20
# 250 250               0               0               0
# 400 400               0               0              10
str(out)
# 'data.frame': 3 obs. of 2 variables:
# $ hex : num 90 250 400
# $ material_diff: num [1:3, 1:3] 12 0 0 9 0 0 20 0 10
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : NULL
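If a flat result is acceptable (ordinary columns instead of a nested matrix), a possibly simpler base R route is to flatten the matrix column first and then aggregate. This is only a sketch under that assumption; flat and res are illustrative names, not part of the code above:

```r
# Example data from the question
df <- structure(list(hex = c(90, 400, 90, 400, 250, 250, 400, 90, 90, 90),
  material_diff = structure(c(12, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0, 0, 9,
                              0, 10, 20, 0, 0, 0, 0, 0, 0, 0),
                            .Dim = c(10L, 3L))),
  class = "data.frame", row.names = c(NA, -10L))

# Flatten the nested matrix into ordinary columns; data.frame() names them
# material_diff.1, material_diff.2, material_diff.3
flat <- do.call(data.frame, df)

# Sum every remaining column within each hex group
res <- aggregate(. ~ hex, data = flat, FUN = sum)
res
#   hex material_diff.1 material_diff.2 material_diff.3
# 1  90              12               9              20
# 2 250               0               0               0
# 3 400               0               0              10
```

Unlike out above, res no longer contains a matrix column, so str(res) reports four numeric variables.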
A tidyverse solution:
library(dplyr)  # mutate, group_by, summarize, across
library(tidyr)  # unpack
df %>%
  mutate(material_diff = data.frame(material_diff)) %>%
  unpack(material_diff, names_sep = '.') %>%
  group_by(hex) %>%
  summarize(across(everything(), ~sum(.)))
# A tibble: 3 x 4
    hex material_diff.X1 material_diff.X2 material_diff.X3
  <dbl>            <dbl>            <dbl>            <dbl>
1    90               12                9               20
2   250                0                0                0
3   400                0                0               10
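The column names here are material_diff.X1 and so on, not the material_diff.1 form shown in the question. The X most likely comes from data.frame() being called on an unnamed matrix without column names, with unpack() then pasting the outer name in front. A small base R illustration of that naming behavior, using a made-up matrix m:

```r
m <- matrix(0, nrow = 2, ncol = 3)   # hypothetical matrix without column names

# Unnamed argument: data.frame() makes up the names X1, X2, X3
names(data.frame(m))
# "X1" "X2" "X3"

# Named argument: the columns are numbered after the argument name
names(data.frame(material_diff = m))
# "material_diff.1" "material_diff.2" "material_diff.3"
```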