简体   繁体   中英

Cumulative sum of rows in R from one column name until another column name, then divide cells by another value

Hi I have a dataframe in R with this structure called cohort_table3

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       10         100       110     1.5
50              200       10      10          40        60     1.8

what I need first is to do a cumulative sum by row to start from a column index and end at the column we index again

for example in this case will be do a cumulative sum from column 2021-05 (number 3) to column 2022-07 (number 5) so the final result would be

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       10         110       110     1.5
50              200       10      20          60        60     1.8

then the last step would be to divide all the cumulative sum values by the Spend of that row in this case the final value would be

New_Sign_ups   Spend   2021-05  2021-06      2021-07  MRR_sum  time
100             100       0       0.1         1.1       110     1.5
50              200       0.05    0.1         0.3        60     1.8
dat <- structure(list(New_Sign_ups = c(100L, 50L), Spend = c(100L, 200L
), `2021-05` = c(0L, 10L), `2021-06` = c(10L, 10L), `2021-07` = c(100L, 
40L), MRR_sum = c(110L, 60L), time = c(1.5, 1.8)), class = "data.frame", 
row.names = c(NA, -2L))

Without using any packages

dat[3:5] <- do.call("cbind", Reduce(`+`, dat[3:5], accumulate = TRUE)) / dat$Spend
#  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
#1          100   100    0.00     0.1     1.1     110  1.5
#2           50   200    0.05     0.1     0.3      60  1.8

Using package matrixStats

library(matrixStats)
dat[3:5] <- rowCumsums(as.matrix(dat[3:5])) / dat$Spend
#  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
#1          100   100    0.00     0.1     1.1     110  1.5
#2           50   200    0.05     0.1     0.3      60  1.8

Here is also a tidyverse solution with mostly purrr package functions. Thank you Zheyuan Li for the data:

library(dplyr)
library(purrr)

dat %>%
  mutate(pmap_df(dat, ~ {
    accumulate(c(...)[str_detect(names(dat), '\\d+')], `+`) / c(...)['Spend' == names(dat)]}))

  New_Sign_ups Spend 2021-05 2021-06 2021-07 MRR_sum time
1          100   100    0.00     0.1     1.1     110  1.5
2           50   200    0.05     0.1     0.3      60  1.8
  • We can use rowwise paired with c_across from dplyr library, then update the values of the data.frame
library(dplyr)

df <- cohort_table3 |> rowwise() |> 
      mutate(ans = list(cumsum(c_across(3:5))/Spend))

cohort_table3[3:5] <- do.call(rbind , df$ans)
  • output
 New_Sign_ups Spend X2021.05 X2021.06 X2021.07 MRR_sum time
1          100   100     0.00      0.1      1.1     110  1.5
2           50   200     0.05      0.1      0.3      60  1.8

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM