data.table R sum delimited rows by group
I currently have the following data.table:
network lead_to_funded_months denominator
1: fb 0 5
2: fb 1 4
3: fb 2 4
4: fb 3 3
5: fb 4 3
6: fb 5 3
7: fb 6 5
8: fb 7 8
9: fb 8 8
10: fb 9 7
11: fb 10 5
12: fb 11 4
13: fb 12 5
14: fb 13 8
For each lead_to_funded_months, I would like to sum denominator over all the following rows, excluding the current lead_to_funded_months row. So the result would be something like the following:
network lead_to_funded_months sum(denominator)
1: fb 0 67
2: fb 1 63
3: fb 2 59
4: fb 3 56
5: fb 4 53
6: fb 5 50
7: fb 6 45
8: fb 7 37
9: fb 8 29
10: fb 9 22
11: fb 10 17
12: fb 11 13
13: fb 12 8
14: fb 13 8
I have tried the following code, but it just returns each row's own value:
dt[
between(lead_to_funded_months, min(lead_to_funded_months + 1 ,13), 13) ,
.(sum_conversion_curve = sum(denominator)),
.(lead_to_funded_months, network)
]
If someone could point out my error and a way to solve it, I would appreciate it.
A data.table option:
dat[, s := sum(denominator) - cumsum(denominator)]
gives
network lead_to_funded_months denominator s
1: fb 0 5 67
2: fb 1 4 63
3: fb 2 4 59
4: fb 3 3 56
5: fb 4 3 53
6: fb 5 3 50
7: fb 6 5 45
8: fb 7 8 37
9: fb 8 8 29
10: fb 9 7 22
11: fb 10 5 17
12: fb 11 4 13
13: fb 12 5 8
14: fb 13 8 0
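As for why the original attempt returns each row's own value, here is a rough diagnostic sketch. It assumes the question's table is rebuilt as `dt`; the names `lower`, `kept`, and `attempt` are only for illustration. The expression in `i` is evaluated once on the whole column, not per row, so `min(lead_to_funded_months + 1, 13)` collapses to a single number, and grouping by `lead_to_funded_months` then leaves one row in each group.

```r
library(data.table)

# The question's table, rebuilt by hand
dt <- data.table(
  network = "fb",
  lead_to_funded_months = 0:13,
  denominator = c(5L, 4L, 4L, 3L, 3L, 3L, 5L, 8L, 8L, 7L, 5L, 4L, 5L, 8L)
)

# min() sees the whole vector 1:14 plus the constant 13, so the bound is one number
lower <- dt[, min(lead_to_funded_months + 1, 13)]

# The filter is therefore identical for every row: it keeps months lower..13
kept <- dt[between(lead_to_funded_months, lower, 13)]

# Each (month, network) group holds a single row, so sum() returns that row's value
attempt <- kept[, .(sum_conversion_curve = sum(denominator)),
                .(lead_to_funded_months, network)]
```

Comparing `attempt$sum_conversion_curve` with `dt$denominator[-1]` shows they match, which is the "same row value" symptom described in the question.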
We can use revcumsum from spatstat.utils:
library(spatstat.utils)
library(data.table)
dt[, s := revcumsum(shift(denominator, type = 'lead', fill = 0))]
-output -输出
dt
network lead_to_funded_months denominator s
1: fb 0 5 67
2: fb 1 4 63
3: fb 2 4 59
4: fb 3 3 56
5: fb 4 3 53
6: fb 5 3 50
7: fb 6 5 45
8: fb 7 8 37
9: fb 8 8 29
10: fb 9 7 22
11: fb 10 5 17
12: fb 11 4 13
13: fb 12 5 8
14: fb 13 8 0
library(data.table)
dat[, s := c(rev(cumsum(rev(denominator[-1]))), 0)]
dat
# network lead_to_funded_months denominator s
# <char> <int> <int> <num>
# 1: fb 0 5 67
# 2: fb 1 4 63
# 3: fb 2 4 59
# 4: fb 3 3 56
# 5: fb 4 3 53
# 6: fb 5 3 50
# 7: fb 6 5 45
# 8: fb 7 8 37
# 9: fb 8 8 29
# 10: fb 9 7 22
# 11: fb 10 5 17
# 12: fb 11 4 13
# 13: fb 12 5 8
# 14: fb 13 8 0
I'm assuming that your row 14 sum of 8 is a mistake, since there are no rows past it to sum up; it should either be 0 or NA. If you really want it to be 8, though, just change to
dat[, s2 := c(rev(cumsum(rev(denominator[-1]))), denominator[.N])]
Data
dat <- setDT(structure(list(network = c("fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb"), lead_to_funded_months = 0:13, denominator = c(5L, 4L, 4L, 3L, 3L, 3L, 5L, 8L, 8L, 7L, 5L, 4L, 5L, 8L)), class = c("data.table", "data.frame"), row.names = c(NA, -14L)))
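The question title mentions grouping, while the sample data has only one network. Below is a minimal sketch of a per-network variant. It assumes rows should be accumulated in month order within each network; the `tw` rows and the name `dat2` are made up for illustration and are not part of the original data.

```r
library(data.table)

# Hypothetical two-network table; the "tw" values are invented for this sketch
dat2 <- data.table(
  network = rep(c("fb", "tw"), each = 4),
  lead_to_funded_months = rep(0:3, times = 2),
  denominator = c(5L, 4L, 4L, 3L, 2L, 6L, 1L, 7L)
)

# Put rows in month order within each network before accumulating
setorder(dat2, network, lead_to_funded_months)

# Group total minus running total = sum of all later rows in the same network
dat2[, s := sum(denominator) - cumsum(denominator), by = network]
```

With `by = network`, both `sum()` and `cumsum()` restart for each network, so the last month of every network gets 0.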
Create example dataset:
df <- data.frame(
lead = 0:13,
denom = c(5, 4, 4, 3, 3, 3, 5, 8, 8, 7, 5, 4, 5, 8)
)
Calculate:
# Reverse sort by `lead`
df <- df[order(df$lead, decreasing = TRUE), ]
# Do the cumulative sum
df$sum_denom <- cumsum(df$denom) - df$denom
# Re-sort by `lead` in ascending order
df <- df[order(df$lead), ]
Result:
#> lead denom sum_denom
#> 1 0 5 67
#> 2 1 4 63
#> 3 2 4 59
#> 4 3 3 56
#> 5 4 3 53
#> 6 5 3 50
#> 7 6 5 45
#> 8 7 8 37
#> 9 8 8 29
#> 10 9 7 22
#> 11 10 5 17
#> 12 11 4 13
#> 13 12 5 8
#> 14 13 8 0 # <-- note the 0, not an 8
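The same column can probably be computed without reordering the data frame at all, by reversing the vector instead of the rows. This is a small base R sketch that assumes the rows are already sorted by `lead`; the free-standing vectors `denom` and `sum_denom` are only for illustration.

```r
# Same values as the example dataset, already in ascending `lead` order
denom <- c(5, 4, 4, 3, 3, 3, 5, 8, 8, 7, 5, 4, 5, 8)

# rev(cumsum(rev(x))) gives the sum from each position to the end (current row included);
# subtracting the current value leaves only the rows that follow it
sum_denom <- rev(cumsum(rev(denom))) - denom
```

This avoids the two `order()` calls, at the cost of relying on the existing row order.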