
Spread unique values (in multiple columns) to different columns and paste aggregated values

I have a data frame like the following:

structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L, 
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L, 
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1", 
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

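Using data.table, I want to spread the unique values of each column into separate columns and put the summed value (from the Value column) under each one. For example, col1 has two unique values, A1 and A2: the sum for A1 is 3 and for A2 is 7. Similarly, col2 has two unique values, B1 and B2: the sum for B1 is 5 and for B2 is 5.

This should be done for each of the columns col1, col2 and col3.

The expected output is as follows: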

structure(list(A1 = 3, A2 = 7, B1 = 5, B2 = 5, C1 = 1, C2 = 2, 
    C3 = 3, C4 = 4), class = "data.frame", row.names = c(NA, 
-1L))

How can I achieve this in R?

A data.table version of the melt, aggregate, and cast approach is:

library(data.table)

dcast(melt(setDT(df), 'Value')[, .(Total = sum(Value)), value],
           rowid(value)~value, value.var = 'Total')

#   value A1 A2 B1 B2 C1 C2 C3 C4
#1:     1  3  7  5  5  1  2  3  4

You probably don't need the value column, so you can remove it by appending [, value := NULL][] to the call.
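For readers who want to run the whole thing, here is a minimal end-to-end sketch of the data.table approach with the helper column dropped. It rebuilds the question's data with data.frame() for brevity, and the intermediate names long, totals and res are only illustrative; they do not appear in the original answer.

```r
library(data.table)

# Example data from the question, rebuilt with data.frame() (illustrative)
df <- data.frame(
  Value = c(1, 2, 3, 4),
  col1 = factor(c("A1", "A1", "A2", "A2")),
  col2 = factor(c("B1", "B2", "B2", "B1")),
  col3 = factor(c("C1", "C2", "C3", "C4"))
)

# Melt col1..col3 into one long 'value' column, keeping Value as the id
long <- melt(setDT(df), id.vars = "Value")

# Sum Value for every unique label (A1, A2, B1, ...)
totals <- long[, .(Total = sum(Value)), by = value]

# Cast back to wide: one column per label, a single row of totals
res <- dcast(totals, rowid(value) ~ value, value.var = "Total")

# Drop the helper column produced by rowid(); the trailing [] prints the result
res[, value := NULL][]
```

Splitting the one-liner into named steps makes it easier to inspect the long table and the per-label totals before casting.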

I'm not very comfortable with data.table, but here is a tidyverse solution:

library(dplyr)
library(tidyr)

df %>% 
 pivot_longer(starts_with('col')) %>% 
 group_by(value) %>% 
 summarise(res = sum(Value)) %>% 
 pivot_wider(names_from = value, values_from = res)

which gives,

# A tibble: 1 x 8
     A1    A2    B1    B2    C1    C2    C3    C4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     3     7     5     5     1     2     3     4

A base R version (from another data.table wannabe):

t(unstack(
    with(reshape(df, direction="long", 
             varying=grep("^col", names(df), value=TRUE), sep=""),
     # formula passed unnamed: the name of this argument differs across R versions
     aggregate(Value~col, FUN=sum)), 
  form=Value~col))

    A1 A2 B1 B2 C1 C2 C3 C4
res  3  7  5  5  1  2  3  4

Here is another base R solution:

dfout <- t(do.call(rbind,
                   lapply(seq_along(df)[-1], 
                          function(k) unstack(rev(aggregate(Value~.,df[c(1,k)],sum))))))

such that

> dfout
    A1 A2 B1 B2 C1 C2 C3 C4
res  3  7  5  5  1  2  3  4

Data

df <- structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L, 
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L, 
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1", 
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

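Here is another option:

library(data.table)
setDT(df)  # the df[, j, by] syntax below needs a data.table, not a plain data.frame

# Sum Value within each of col1..col3 and stack the per-column results
x <- rbindlist(lapply(paste0("col", 1:3), function(b) df[, sum(Value), b]), 
    use.names=FALSE)

# Turn the stacked totals into a one-row data.table named by label
setDT(setNames(as.list(x$V1), x$col1))[]

Data: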

df <- structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L, 
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L, 
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1", 
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

You can also solve it as follows:

library(data.table)
melt(setDT(df), "Value")[, .(TOT = sum(Value)), value][, setNames(as.list(TOT), value)]

#       A1    A2    B1    B2    C1    C2    C3    C4
# 1:     3     7     5     5     1     2     3     4
