Spread unique values (in multiple columns) to different columns and paste aggregated values
Similar to, but different from: Split unique values into separate columns for multiple columns
I have a data frame as follows:
structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L,
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L,
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1",
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
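I want to use data.table to spread the unique values in each column into separate columns and paste the summed value (from the Value column) under each one. For example, col1 has 2 unique values, A1 and A2: the sum for A1 is 3 and for A2 is 7. Similarly, col2 has 2 unique values, B1 and B2: the sum for B1 is 5 and for B2 is 5.
This is to be done for each of the columns col1, col2 and col3.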
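The sums quoted above can be checked by hand with a small base R sketch. It rebuilds the example data with `data.frame()` rather than the `dput()` output, and the name `sums_by_col` is only illustrative:

```r
# Example data, same values as the dput() output in the question
df <- data.frame(
  Value = c(1, 2, 3, 4),
  col1 = factor(c("A1", "A1", "A2", "A2")),
  col2 = factor(c("B1", "B2", "B2", "B1")),
  col3 = factor(c("C1", "C2", "C3", "C4"))
)

# For each grouping column, sum Value within each of its levels
sums_by_col <- lapply(df[c("col1", "col2", "col3")],
                      function(g) tapply(df$Value, g, sum))

sums_by_col$col1  # A1 = 3, A2 = 7
sums_by_col$col2  # B1 = 5, B2 = 5
```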
The expected output is as follows:
structure(list(A1 = 3, A2 = 7, B1 = 5, B2 = 5, C1 = 1, C2 = 2,
C3 = 3, C4 = 4), class = "data.frame", row.names = c(NA,
-1L))
How can I achieve this in R?
The data.table version of the answer is:
library(data.table)
dcast(melt(setDT(df), 'Value')[, .(Total = sum(Value)), value],
rowid(value)~value, value.var = 'Total')
#    value A1 A2 B1 B2 C1 C2 C3 C4
# 1:     1  3  7  5  5  1  2  3  4
You probably don't need the value column, so you can remove it by appending [, value := NULL][] to the call.
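As a rough sketch of that last step, the pipeline can be split into named stages with the helper column dropped at the end. The intermediate names `dt`, `long`, `totals` and `wide` are illustrative and not from the original answer, and the data is rebuilt with character columns for brevity:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  Value = c(1, 2, 3, 4),
  col1 = c("A1", "A1", "A2", "A2"),
  col2 = c("B1", "B2", "B2", "B1"),
  col3 = c("C1", "C2", "C3", "C4")
)

# Long format: one row per (Value, original column, level)
long <- melt(dt, id.vars = "Value")

# Total of Value for every level
totals <- long[, .(Total = sum(Value)), by = value]

# One row, one column per level; rowid(value) supplies the row key
wide <- dcast(totals, rowid(value) ~ value, value.var = "Total")

# Drop the helper key column by reference, then print
wide[, value := NULL][]
```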
I'm not very comfortable with data.table, but here is a tidyverse solution:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with('col')) %>%
group_by(value) %>%
summarise(res = sum(Value)) %>%
pivot_wider(names_from = value, values_from = res)
This gives:
# A tibble: 1 x 8
     A1    A2    B1    B2    C1    C2    C3    C4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     3     7     5     5     1     2     3     4
A base R version (although a data.table one was requested):
t(unstack(
  with(reshape(df, direction="long",
               varying=grep("^col", names(df), value=TRUE), sep=""),
       # formula passed unnamed: the argument was renamed in R 4.2.0
       aggregate(Value~col, FUN=sum)),
  form=Value~col))
    A1 A2 B1 B2 C1 C2 C3 C4
res  3  7  5  5  1  2  3  4
Here is another base R solution:
dfout <- t(do.call(rbind,
lapply(seq_along(df)[-1],
function(k) unstack(rev(aggregate(Value~.,df[c(1,k)],sum))))))
such that
> dfout
    A1 A2 B1 B2 C1 C2 C3 C4
res  3  7  5  5  1  2  3  4
Data
df <- structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L,
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L,
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1",
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
Here is another option:
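library(data.table)
setDT(df)  # df must be a data.table for the df[, j, by] syntax below
# Sum Value by each of col1..col3 and stack the results (names taken from the first)
x <- rbindlist(lapply(paste0("col", 1:3), function(b) df[, sum(Value), b]),
    use.names=FALSE)
# Turn the stacked (level, sum) pairs into a one-row data.table
setDT(setNames(as.list(x$V1), x$col1))[]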
Data:
df <- structure(list(Value = c(1, 2, 3, 4), col1 = structure(c(1L,
1L, 2L, 2L), .Label = c("A1", "A2"), class = "factor"), col2 = structure(c(1L,
2L, 2L, 1L), .Label = c("B1", "B2"), class = "factor"), col3 = structure(1:4, .Label = c("C1",
"C2", "C3", "C4"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
You can also solve it as follows:
library(data.table)
melt(setDT(df), "Value")[, .(TOT = sum(Value)), value][, setNames(as.list(TOT), value)]
#    A1 A2 B1 B2 C1 C2 C3 C4
# 1:  3  7  5  5  1  2  3  4