
data.table R sum delimited rows by group

I currently have the following data.table:

    network lead_to_funded_months denominator
 1:      fb                     0           5
 2:      fb                     1           4
 3:      fb                     2           4
 4:      fb                     3           3
 5:      fb                     4           3
 6:      fb                     5           3
 7:      fb                     6           5
 8:      fb                     7           8
 9:      fb                     8           8
10:      fb                     9           7
11:      fb                    10           5
12:      fb                    11           4
13:      fb                    12           5
14:      fb                    13           8

For each lead_to_funded_months, I would like to sum the denominator of all the rows that follow it, excluding the current row. So the result would be something like the following:

    network lead_to_funded_months sum(denominator)
 1:      fb                     0               67
 2:      fb                     1               63
 3:      fb                     2               59
 4:      fb                     3               56
 5:      fb                     4               53
 6:      fb                     5               50
 7:      fb                     6               45
 8:      fb                     7               37
 9:      fb                     8               29
10:      fb                     9               22
11:      fb                    10               17
12:      fb                    11               13
13:      fb                    12                8
14:      fb                    13                8

I have tried the following code, but it just returns each row's own value:

dt[
     between(lead_to_funded_months, min(lead_to_funded_months + 1 ,13), 13) ,
     .(sum_conversion_curve = sum(denominator)),
     .(lead_to_funded_months, network)
 ]

If someone could point out my error and a way to solve it, I would appreciate it.
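On the error itself, one likely reading: the `i` expression is evaluated once over the whole table, not per row, so `min(lead_to_funded_months + 1, 13)` collapses to a single number (1 for this data). The filter therefore only drops month 0, and because every (lead_to_funded_months, network) group then holds exactly one row, `sum(denominator)` just returns that row's value. A minimal sketch to check this, where `lower` and `res` are names introduced here for illustration:

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  network = "fb",
  lead_to_funded_months = 0:13,
  denominator = c(5L, 4L, 4L, 3L, 3L, 3L, 5L, 8L, 8L, 7L, 5L, 4L, 5L, 8L)
)

# min() sees the whole column plus the constant 13, so it returns one scalar
lower <- dt[, min(lead_to_funded_months + 1, 13)]

# Each (month, network) group contains a single row, so sum() has one value to add
res <- dt[
  between(lead_to_funded_months, lower, 13),
  .(sum_conversion_curve = sum(denominator)),
  by = .(lead_to_funded_months, network)
]
```

Here `res$sum_conversion_curve` should simply equal the original `denominator` for months 1 to 13.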

A data.table option:

dat[, s := sum(denominator) - cumsum(denominator)]

gives

    network lead_to_funded_months denominator  s
 1:      fb                     0           5 67
 2:      fb                     1           4 63
 3:      fb                     2           4 59
 4:      fb                     3           3 56
 5:      fb                     4           3 53
 6:      fb                     5           3 50
 7:      fb                     6           5 45
 8:      fb                     7           8 37
 9:      fb                     8           8 29
10:      fb                     9           7 22
11:      fb                    10           5 17
12:      fb                    11           4 13
13:      fb                    12           5  8
14:      fb                    13           8  0
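If the table holds more than one network, the same idea should carry over per group by adding `by = network`. This is a minimal sketch, assuming the rows are ordered by lead_to_funded_months within each network; the table `dat2`, the second network "gg" and all of its values are made up for illustration.

```r
library(data.table)

# Hypothetical two-network data; "gg" and its numbers are invented for this sketch
dat2 <- data.table(
  network = rep(c("fb", "gg"), each = 3),
  lead_to_funded_months = rep(0:2, times = 2),
  denominator = c(5L, 4L, 4L, 1L, 2L, 3L)
)

# Order by month within each network before accumulating
setorder(dat2, network, lead_to_funded_months)

# Group total minus running total = sum of the rows after the current one
dat2[, s := sum(denominator) - cumsum(denominator), by = network]
```

Each network then gets its own trailing sum, ending in 0 on its last row.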

We can use revcumsum from spatstat.utils:

library(spatstat.utils)
library(data.table)
dt[, s := revcumsum(shift(denominator, type = 'lead', fill = 0))]

-output

 dt
    network lead_to_funded_months denominator  s
 1:      fb                     0           5 67
 2:      fb                     1           4 63
 3:      fb                     2           4 59
 4:      fb                     3           3 56
 5:      fb                     4           3 53
 6:      fb                     5           3 50
 7:      fb                     6           5 45
 8:      fb                     7           8 37
 9:      fb                     8           8 29
10:      fb                     9           7 22
11:      fb                    10           5 17
12:      fb                    11           4 13
13:      fb                    12           5  8
14:      fb                    13           8  0

library(data.table)
dat[, s := c(rev(cumsum(rev(denominator[-1]))), 0)]
dat
#     network lead_to_funded_months denominator     s
#      <char>                 <int>       <int> <num>
#  1:      fb                     0           5    67
#  2:      fb                     1           4    63
#  3:      fb                     2           4    59
#  4:      fb                     3           3    56
#  5:      fb                     4           3    53
#  6:      fb                     5           3    50
#  7:      fb                     6           5    45
#  8:      fb                     7           8    37
#  9:      fb                     8           8    29
# 10:      fb                     9           7    22
# 11:      fb                    10           5    17
# 12:      fb                    11           4    13
# 13:      fb                    12           5     8
# 14:      fb                    13           8     0

I'm assuming that your row 14 sum of 8 is a mistake, since there are no rows past it to sum up; it should either be 0 or NA. If you really want it to be 8, though, just change to

dat[, s2 := c(rev(cumsum(rev(denominator[-1]))), denominator[.N])]

Data

dat <- setDT(structure(list(network = c("fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb", "fb"), lead_to_funded_months = 0:13, denominator = c(5L, 4L, 4L, 3L, 3L, 3L, 5L, 8L, 8L, 7L, 5L, 4L, 5L, 8L)), class = c("data.table", "data.frame"), row.names = c(NA, -14L)))

Create example dataset:

df <- data.frame(
    lead = 0:13,
    denom = c(5, 4, 4, 3, 3, 3, 5, 8, 8, 7, 5, 4, 5, 8)
)

Calculate:

# Sort by `lead` in descending order
df <- df[order(df$lead, decreasing = TRUE), ]

# Do the cumulative sum
df$sum_denom <- cumsum(df$denom) - df$denom

# Sort back into ascending order of `lead`
df <- df[order(df$lead), ]

Result:

#>    lead denom sum_denom
#> 1     0     5        67
#> 2     1     4        63
#> 3     2     4        59
#> 4     3     3        56
#> 5     4     3        53
#> 6     5     3        50
#> 7     6     5        45
#> 8     7     8        37
#> 9     8     8        29
#> 10    9     7        22
#> 11   10     5        17
#> 12   11     4        13
#> 13   12     5         8
#> 14   13     8         0   # <-- note the 0, not an 8
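For several groups in base R, one possible variant is to apply a reversed cumulative sum within each group via `ave()`, which avoids the two sorts. This is a sketch under assumptions: the data frame `df2`, the `network` column and the "gg" rows are hypothetical, and the rows are assumed to be in ascending `lead` order within each network.

```r
# Hypothetical two-network data frame; "gg" and its values are invented
df2 <- data.frame(
  network = rep(c("fb", "gg"), each = 3),
  lead = rep(0:2, times = 2),
  denom = c(5, 4, 4, 1, 2, 3)
)

# Ensure ascending `lead` order inside each network
df2 <- df2[order(df2$network, df2$lead), ]

# Within each network: reverse cumulative sum, minus the current row's own value
df2$sum_denom <- ave(
  df2$denom, df2$network,
  FUN = function(x) rev(cumsum(rev(x))) - x
)
```

As in the single-group result, the last row of each network ends up with 0.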
