R data.table conditional sum

Question

> library(data.table)
> tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"),
+                      colB = c(20,30,40,30,30,40,30,20,10),
+                      group = c(1,1,1,1,2,2,2,2,2),
+                      want = c(NA, 30, 40, 70, NA, 40, 70, 20, 30))
> tempDT
   colA colB group want
1:    E   20     1   NA
2:    E   30     1   30
3:    A   40     1   40
4:    C   30     1   70
5:    E   30     2   NA
6:    C   40     2   40
7:    E   30     2   70
8:    C   20     2   20
9:    E   10     2   30

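I have columns 'colA', 'colB', and 'group'. Within each 'group', I would like to sum up 'colB' from the bottom up until 'colA' is 'E'. In other words, for each row the sum starts at that row and runs upward, stopping just below the nearest earlier row where 'colA' is 'E'; the first row of each group is NA. The expected result is in the 'want' column.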

Answer 1

Based on the expected 'want', we create a run-length ID column 'grp' by checking whether 'colA' is 'E'; 'grp' is 0 on the 'E' rows and a run ID on the other rows. Then, grouping by 'grp' and 'group', we create 'want1' as the cumulative sum of 'colB' for the rows where 'grp' is not 0. Finally, we get the row indices ('i1') of the rows where 'colA' is 'E' and is duplicated within its group (that is, every 'E' after the first one in each group) and assign their 'colB' values to 'want1'.

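library(data.table)

# 'grp' is 0 on "E" rows and a run id on each run of non-"E" rows
tempDT[, grp:= rleid(colA=="E") * (colA != "E")
        ][grp!= 0, want1 := cumsum(colB), .(grp, group)]  # running sum within each run
# row indices of every "E" after the first one in each group
i1 <- tempDT[, .I[colA=="E" & duplicated(colA)], group]$V1
# those rows take their own colB value; then drop the helper column and print
tempDT[i1, want1 := colB][, grp := NULL][]
# Output for a six-row version of the data (the same rows as 'df' in Answer 2):
#    colA colB group want want1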
#1:    E   20     1   NA    NA
#2:    E   30     1   30    30
#3:    A   40     1   40    40
#4:    C   30     1   70    70
#5:    E   30     2   NA    NA
#6:    C   30     2   30    30
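
The printed result above has six rows, while the question's tempDT has nine, including 'E' rows (rows 7 and 9) whose 'want' also counts the non-'E' rows directly above them. The following is a minimal sketch, not the answerer's code, of one way to cover that case with the same cumulative-sum idea: start a new block on the row right after each 'E' and take a running sum inside each block. The column names 'blk' and 'want2' are introduced here for illustration only.

```r
library(data.table)

tempDT <- data.table(
  colA  = c("E","E","A","C","E","C","E","C","E"),
  colB  = c(20,30,40,30,30,40,30,20,10),
  group = c(1,1,1,1,2,2,2,2,2),
  want  = c(NA,30,40,70,NA,40,70,20,30)
)

# Assumption: a new block begins on the row right after each "E", per group.
# shift() lags the "E" indicator by one row; cumsum() turns it into a block id.
tempDT[, blk := cumsum(shift(colA == "E", fill = FALSE)), by = group]

# Running sum of colB inside each (group, blk) block.
tempDT[, want2 := cumsum(colB), by = .(group, blk)]

# The first row of every group has nothing above it, so set it to NA.
tempDT[tempDT[, .I[1L], by = group]$V1, want2 := NA_real_]

# Drop the helper column and print.
tempDT[, blk := NULL][]
```

Because each block ends at an 'E' row, that row's running sum includes the non-'E' rows above it, which is what 'want' expects for rows 7 and 9.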

Answer 2

Hope this helps!

library(dplyr)

# 'df' is the six-row sample data given below
df %>%
  group_by(group) %>%
  mutate(row_num = n():1) %>%   # number the rows from the bottom of each group
  mutate(sum_colB = sum(colB[row_num < row_num[which(colA=='E')]]),
         flag = ifelse(row_num >= row_num[which(colA=='E')], 0, 1)) %>%
  mutate(sum_colB = ifelse(flag==1 & row_num==1, sum_colB, ifelse(flag==0, NA, colB))) %>%
  select(-flag, -row_num) %>%
  data.frame()

Output is:

  colA colB group want sum_colB
1    E   20     1   NA       NA
2    E   30     1   30       NA
3    A   40     1   40       40
4    C   30     1   70       70
5    E   30     2   NA       NA
6    C   30     2   30       30

Sample data:

df <- structure(list(colA = structure(c(3L, 3L, 1L, 2L, 3L, 2L), .Label = c("A", 
"C", "E"), class = "factor"), colB = c(20, 30, 40, 30, 30, 30
), group = c(1, 1, 1, 1, 2, 2), want = c(NA, 30, 40, 70, NA, 
30)), .Names = c("colA", "colB", "group", "want"), row.names = c(NA, 
-6L), class = "data.frame")

Answer 3

Here is one approach: row reference + sums.

library(data.table)

# input data
tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"),
                     colB = c(20,30,40,30,30,40,30,20,10),
                     group = c(1,1,1,1,2,2,2,2,2),
                     want = c(NA, 30, 40, 70, NA, 40, 70, 20, 30))
tempDT

# for each row, find the index of the most recent earlier row where colA is "E"
# (for i = 1, 1:(i-1) is c(1, 0), so row 1 refers to itself; it is set to NA below)
lastEpos <- function(i) tail(which(tempDT$colA[1:(i-1)] == "E"), 1)
tempDT[, rowRef := sapply(.I, lastEpos), by = "group"]

# sum up
sumEpos <- function(i) {
  valTEMP <- tempDT$rowRef[i]
  outputTEMP <- sum(tempDT$colB[(valTEMP+1):i])  # sum
  return(outputTEMP)
}
tempDT[, want1 := sapply(.I, sumEpos), by = "group"]

# the first row in every group has no earlier row to sum from, so set it to NA
tempDT[, want1 := c(NA, want1[-1]), by = "group"]

# clean output
tempDT[, rowRef := NULL]
tempDT
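
The code above looks rows up in the whole table through tempDT$colA and .I, so it relies on every group starting with an 'E' row. As a hedged sketch of the same "row reference + sums" idea confined to one group at a time, the helper below receives only the group's own vectors. The function name sum_since_last_E and the column 'want3' are hypothetical names introduced here, and the choice to sum from the top of the group when no earlier 'E' exists is an assumption, not something the question specifies.

```r
library(data.table)

tempDT <- data.table(
  colA  = c("E","E","A","C","E","C","E","C","E"),
  colB  = c(20,30,40,30,30,40,30,20,10),
  group = c(1,1,1,1,2,2,2,2,2),
  want  = c(NA,30,40,70,NA,40,70,20,30)
)

# Hypothetical helper: for each row of ONE group, sum b from the row just
# after the most recent earlier "E" down to the current row.
sum_since_last_E <- function(a, b) {
  n <- length(a)
  out <- rep(NA_real_, n)              # the first row stays NA
  for (i in seq_len(n)[-1L]) {
    prev_E <- which(a[seq_len(i - 1L)] == "E")
    # Assumption: with no earlier "E", start from the top of the group.
    start <- if (length(prev_E)) max(prev_E) + 1L else 1L
    out[i] <- sum(b[start:i])
  }
  out
}

# Apply the helper within each group.
tempDT[, want3 := sum_since_last_E(colA, colB), by = group]
tempDT
```

The loop is quadratic in the group size, which is fine for small groups; a cumulative-sum formulation would scale better for large ones.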

3 answers

Answer 1   1   2018-03-06 06:43:56
Answer 2   0   2018-03-06 07:19:53
Answer 3   0   2018-03-06 22:45:00
