Subtracting a column from the sum by group in R
Here is part of my dataset:
df=structure(list(CustomerName = structure(c(1L, 1L, 1L, 2L, 2L,
2L), .Label = c("x", "y"), class = "factor"), ItemRelation = c(11202L,
11202L, 11202L, 1L, 1L, 1L), SaleCount = c(214L, 88L, 42L, 214L,
88L, 42L), DocumentNum = c(137L, 137L, 137L, 3L, 3L, 3L), DocumentYear = c(2018L,
2018L, 2018L, 2018L, 2018L, 2018L), k = c(114.66667, 114.66667,
114.66667, 114.66667, 114.66667, 114.66667), m0 = c(31.92, 31.92,
31.92, 31.92, 31.92, 31.92), Action_Effect = c(82.74667, 82.74667,
82.74667, 82.74667, 82.74667, 82.74667)), .Names = c("CustomerName",
"ItemRelation", "SaleCount", "DocumentNum", "DocumentYear", "k",
"m0", "Action_Effect"), class = "data.frame", row.names = c(NA,
-6L))
For each group defined by CustomerName + ItemRelation + DocumentNum + DocumentYear, I need to calculate the sum of SaleCount and then subtract the Action_Effect column from that sum.
I.e., the output must be:
df2=structure(list(CustomerName = structure(c(1L, 1L, 1L, 2L, 2L,
2L), .Label = c("x", "y"), class = "factor"), ItemRelation = c(11202L,
11202L, 11202L, 1L, 1L, 1L), SaleCount = c(214L, 88L, 42L, 214L,
88L, 42L), DocumentNum = c(137L, 137L, 137L, 3L, 3L, 3L), DocumentYear = c(2018L,
2018L, 2018L, 2018L, 2018L, 2018L), X. = c(114.66667, 114.66667,
114.66667, 114.66667, 114.66667, 114.66667), m0 = c(31.92, 31.92,
31.92, 31.92, 31.92, 31.92), Action_Effect = c(82.74667, 82.74667,
82.74667, 82.74667, 82.74667, 82.74667), sum = c(344L, 344L,
344L, 344L, 344L, 344L), output = c(261.25333, 261.25333, 261.25333,
261.25333, 261.25333, 261.25333)), .Names = c("CustomerName",
"ItemRelation", "SaleCount", "DocumentNum", "DocumentYear", "X.",
"m0", "Action_Effect", "sum", "output"), class = "data.frame", row.names = c(NA,
-6L))
The table is long, so I decided to show the desired output via dput().
How can I do it?
Your data is a bit odd, since the values are identical in both groups, but this should work:
library(dplyr)
df %>%
group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear) %>%
mutate(test = sum(SaleCount) - Action_Effect)
# A tibble: 6 x 9
# Groups: CustomerName, ItemRelation, DocumentNum, DocumentYear [2]
CustomerName ItemRelation SaleCount DocumentNum DocumentYear k m0 Action_Effect test
<fctr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 x 11202 214 137 2018 114.6667 31.92 82.74667 261.2533
2 x 11202 88 137 2018 114.6667 31.92 82.74667 261.2533
3 x 11202 42 137 2018 114.6667 31.92 82.74667 261.2533
4 y 1 214 3 2018 114.6667 31.92 82.74667 261.2533
5 y 1 88 3 2018 114.6667 31.92 82.74667 261.2533
6 y 1 42 3 2018 114.6667 31.92 82.74667 261.2533
To also add the group sum as its own column, use:
df %>%
group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear) %>%
mutate(sum = sum(SaleCount), output = sum(SaleCount) - Action_Effect)
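Because the sample data has identical values in both groups, it cannot show whether the grouping actually takes effect. Here is a minimal sketch with made-up numbers that differ between groups; the `toy` data frame and all of its values are hypothetical and not taken from the question.

```r
library(dplyr)

# Hypothetical toy data: two groups with different SaleCount and Action_Effect
toy <- data.frame(
  CustomerName  = c("x", "x", "y", "y"),
  ItemRelation  = c(11202L, 11202L, 1L, 1L),
  DocumentNum   = c(137L, 137L, 3L, 3L),
  DocumentYear  = c(2018L, 2018L, 2018L, 2018L),
  SaleCount     = c(10L, 20L, 5L, 7L),
  Action_Effect = c(1.5, 1.5, 2, 2)
)

res <- toy %>%
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear) %>%
  # sum() is evaluated per group; Action_Effect is subtracted row by row
  mutate(sum = sum(SaleCount), output = sum(SaleCount) - Action_Effect) %>%
  ungroup()

res
```

With these numbers, group x has a sum of 30 and group y a sum of 12, so `output` should be 28.5 for the x rows and 10 for the y rows.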
For completeness, adding base and data.table syntax:
base:
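# by() returns one result vector per group, in sorted group-key order;
# unlist() concatenates them, so this lines up with the rows only when
# df is already ordered by group (as it is in the sample data).
df$test <- unlist(by(df,
paste(df$CustomerName, df$ItemRelation, df$DocumentNum, df$DocumentYear),
function(x) sum(x$SaleCount) - x$Action_Effect))
df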
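The by() approach concatenates its results group by group, so the values match the rows only if the data frame is already sorted by group. A possible base R alternative that does not depend on row order is ave(), which returns the group-wise result in the original row positions. The sketch below uses a hypothetical, deliberately shuffled `toy` data frame with made-up values, not the data from the question.

```r
# Hypothetical toy data with the rows of the two groups interleaved
toy <- data.frame(
  CustomerName  = c("y", "x", "y", "x"),
  ItemRelation  = c(1L, 11202L, 1L, 11202L),
  DocumentNum   = c(3L, 137L, 3L, 137L),
  DocumentYear  = c(2018L, 2018L, 2018L, 2018L),
  SaleCount     = c(5L, 10L, 7L, 20L),
  Action_Effect = c(2, 1.5, 2, 1.5)
)

# ave() computes sum(SaleCount) per group and keeps the original row order
group_sum <- ave(toy$SaleCount,
                 toy$CustomerName, toy$ItemRelation,
                 toy$DocumentNum, toy$DocumentYear,
                 FUN = sum)

toy$test <- group_sum - toy$Action_Effect
toy
```

Here the y rows get 12 - 2 = 10 and the x rows get 30 - 1.5 = 28.5, even though the groups are not contiguous.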
data.table:
library(data.table)
setDT(df)
df[, test2:=sum(SaleCount) - Action_Effect,
by=.(CustomerName, ItemRelation, DocumentNum, DocumentYear)][]