How to cumulatively sum a variable by unique values and write the result back in

I'm looking to do the following: cumulatively sum the indicator values within each transaction, and remove the rows that come after the first day on which the indicator is 1. Original:

transaction   day indicator
          1     1         0
          1     2         0
          1     3         0
          1     4         1
          1     5         1
          1     6         1
          2     1         0
          2     2         0
          2     3         0
          2     4         0
          2     5         1
          2     6         1
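To experiment with this data, the original table can be rebuilt roughly as follows. This is only a sketch: the name `df` and the integer column types are assumptions, chosen to match what the code further down appears to expect.

```r
# Rebuild the example input. `df` is an assumed name; the question does not give one.
df <- data.frame(
  transaction = rep(1:2, each = 6),
  day         = rep(1:6, times = 2),
  indicator   = c(0L, 0L, 0L, 1L, 1L, 1L,   # transaction 1: indicator is 1 from day 4
                  0L, 0L, 0L, 0L, 1L, 1L)   # transaction 2: indicator is 1 from day 5
)
```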

and make the new table look like this:

transaction   day indicator
          1     1         0
          1     2         0
          1     3         0
          1     4         3
          2     1         0
          2     2         0
          2     3         0
          2     4         0
          2     5         2

Change every day with indicator == 1 to the first day with indicator == 1, then sum the indicator within each transaction and day:

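library(dplyr)

df %>%
  group_by(transaction) %>%
  # Move every indicator == 1 row onto the first day where indicator == 1
  mutate(day = case_when(indicator == 0 ~ day,
                         TRUE ~ head(day[indicator == 1], 1))) %>%
  # Rows that now share a day collapse into one, with their indicators summed
  group_by(transaction, day) %>%
  summarise(indicator = sum(indicator)) %>%
  ungroup()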

  transaction   day indicator
        <int> <int>     <int>
1           1     1         0
2           1     2         0
3           1     3         0
4           1     4         3
5           2     1         0
6           2     2         0
7           2     3         0
8           2     4         0
9           2     5         2

Please try the code below:

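library(dplyr)
library(tidyr)  # provides fill()

df <- bind_rows(df1, df2) %>% group_by(transaction) %>%
  # cumsum2 marks the day on which the running total first reaches 1
  mutate(cumsum = cumsum(indicator), cumsum2 = ifelse(cumsum == 1, day, NA)) %>%
  fill(cumsum2) %>%  # carry that day down to the later rows
  mutate(day = ifelse(!is.na(cumsum2), cumsum2, day)) %>%
  group_by(transaction, day) %>% slice_tail(n = 1) %>% select(-cumsum2)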

Created on 2023-01-19 with reprex v2.0.2

Output

# A tibble: 8 × 4
# Groups:   transaction, day [8]
  transaction   day indicator cumsum
        <dbl> <int>     <dbl>  <dbl>
1           1     1         0      0
2           1     2         0      0
3           1     3         0      0
4           1     4         1      3
5           2     1         0      0
6           2     2         0      0
7           2     3         0      0
8           2     4         1      2

Another approach to try. After grouping by transaction, change indicator to either 0 (unchanged) or the group's sum of indicator. Then filter to keep only the rows for which every previous indicator value is 0, using cumall (cumulative all). Applying lag shifts the test back by one row, so the first row containing the sum is kept as the last row of each group.

library(tidyverse)

df %>%
  group_by(transaction) %>%
  mutate(indicator = ifelse(indicator == 0, 0, sum(indicator))) %>%
  filter(cumall(lag(indicator, default = 0) == 0))

Output

  transaction   day indicator
        <int> <int>     <dbl>
1           1     1         0
2           1     2         0
3           1     3         0
4           1     4         3
5           2     1         0
6           2     2         0
7           2     3         0
8           2     4         0
9           2     5         2
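To see why the lag and cumall combination keeps exactly one row with the sum, it may help to step through a single group by hand. The sketch below uses the indicator values of transaction 1 as they would look after the mutate step; the variable names `ind`, `prev_is_zero` and `keep` are made up for illustration.

```r
library(dplyr)

# indicator for transaction 1 after the mutate step: 0 stays 0, each 1 becomes the group sum
ind <- c(0, 0, 0, 3, 3, 3)

# Is the previous row's value 0? The first row has no predecessor, so default = 0 is used.
prev_is_zero <- dplyr::lag(ind, default = 0) == 0

# cumall() stays TRUE until the first FALSE and is FALSE from then on
keep <- cumall(prev_is_zero)

ind[keep]
```

The first row holding the sum is kept because the row before it is still 0; every later row is dropped because its predecessor is already nonzero.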
