R Dplyr; calculating difference between two columns from previous row but putting result in next row without for loop

I am trying to solve the following problem with dplyr in R, preferably without a loop. I want to calculate the difference between two columns of the previous row and put the result in the next row. In this specific example, I want to calculate r_j - s_j from the previous row and write the result into r_j of the next row, so each new value builds on the one computed before it.

Here is some sample data:

library(tidyverse)
data <- tibble(LM = c(100, 300, 400, 500, 600, 700, 800, 1300),
               s_j = c(2, 2, 2, 1, 2, 2, 1, 1)) %>%
  mutate(r_j = 25)  # starting value of 25 in every row

     LM   s_j   r_j
1   100     2    25
2   300     2    25
3   400     2    25
4   500     1    25
5   600     2    25
6   700     2    25
7   800     1    25
8  1300     1    25

My desired output is:

     LM   s_j   r_j
1   100     2    25
2   300     2    23
3   400     2    21
4   500     1    19
5   600     2    18
6   700     2    16
7   800     1    14
8  1300     1    13

A solution to this problem is:

# Each row's r_j becomes the previous row's r_j minus the previous row's s_j
for (k in 2:nrow(data)) {
  data$r_j[k] <- data$r_j[k - 1] - data$s_j[k - 1]
}

which yields

     LM   s_j   r_j
1   100     2    25
2   300     2    23
3   400     2    21
4   500     1    19
5   600     2    18
6   700     2    16
7   800     1    14
8  1300     1    13

But surely there is a better way to do this in R than a for loop? Thanks for any help.

One way is to compute the cumulative sum of s_j and subtract it from r_j:

data %>% mutate(
    temp = cumsum(s_j),  # running total of s_j, including the current row
    r_j2 = r_j-temp
)
# A tibble: 8 x 5
     LM   s_j   r_j  temp  r_j2
   <dbl> <dbl> <dbl> <dbl> <dbl>
1   100     2    25     2    23
2   300     2    25     4    21
3   400     2    25     6    19
4   500     1    25     7    18
5   600     2    25     9    16
6   700     2    25    11    14
7   800     1    25    12    13
8  1300     1    25    13    12
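Unrolling the loop's recurrence shows why a cumulative sum works, and why it has to stop one row early. Writing r_k and s_k for the values in row k (so r_1 is the starting value, 25 here):

```latex
\begin{aligned}
r_k &= r_{k-1} - s_{k-1}, \qquad k = 2, \dots, n \\
    &= r_1 - \sum_{i=1}^{k-1} s_i
     = r_1 - \Bigl(\sum_{i=1}^{k} s_i - s_k\Bigr)
\end{aligned}
```

The version above subtracts the full sum up to and including row k, which is why its r_j2 column is shifted by one row (23 in row 1 instead of 25).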

EDIT: To get exactly the desired output, subtract each row's own s_j from the cumulative sum, so that only the s_j values of the preceding rows are subtracted from r_j:

data %>% mutate(
     temp = cumsum(s_j)-s_j,  # sum of s_j over the preceding rows only
     r_j2 = r_j-temp
)
# A tibble: 8 x 5
     LM   s_j   r_j  temp  r_j2
  <dbl> <dbl> <dbl> <dbl> <dbl>
1   100     2    25     0    25
2   300     2    25     2    23
3   400     2    25     4    21
4   500     1    25     6    19
5   600     2    25     7    18
6   700     2    25     9    16
7   800     1    25    11    14
8  1300     1    25    12    13
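As a quick sanity check, here is a minimal sketch (assuming only that dplyr is installed) that runs the question's loop on a copy of r_j and compares it with the cumsum(s_j) - s_j version. The two agree here because r_j starts at the same value (25) in every row; the names loop_r and vec_r are just illustrative.

```r
library(dplyr)

# Sample data from the question (r_j constant at 25)
data <- tibble(
  LM  = c(100, 300, 400, 500, 600, 700, 800, 1300),
  s_j = c(2, 2, 2, 1, 2, 2, 1, 1),
  r_j = 25
)

# Reference result: the question's for loop, applied to a copy of r_j
loop_r <- data$r_j
for (k in 2:nrow(data)) {
  loop_r[k] <- loop_r[k - 1] - data$s_j[k - 1]
}

# Vectorised result: subtract the cumulative sum of s_j over the preceding rows
vec_r <- data %>%
  mutate(r_j2 = r_j - (cumsum(s_j) - s_j)) %>%
  pull(r_j2)

stopifnot(identical(loop_r, vec_r))
vec_r
#> [1] 25 23 21 19 18 16 14 13
```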

EDIT2: Here is IceCreamToucan's solution, which doesn't need a temp column:

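data %>% mutate(
     # lag() shifts r_j - cumsum(s_j) down one row; coalesce() replaces the resulting NA in row 1 with r_j
     r_j2 = coalesce(lag(r_j - cumsum(s_j)), r_j)
     )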
# A tibble: 8 x 4
     LM   s_j   r_j  r_j2
  <dbl> <dbl> <dbl> <dbl>
1   100     2    25    25
2   300     2    25    23
3   400     2    25    21
4   500     1    25    19
5   600     2    25    18
6   700     2    25    16
7   800     1    25    14
8  1300     1    25    13
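A few other loop-free formulations, offered as a sketch rather than a definitive answer. dplyr's lag() takes a default argument, so the NA in row 1 can be filled with 0 directly instead of via coalesce(). Note also that the for loop only reads r_j in the first row (every later r_j is overwritten), whereas the cumsum versions use each row's own r_j; with a constant r_j of 25 they agree, but first(r_j) mirrors the loop if r_j could vary. For recurrences with no closed form, base R's Reduce() with accumulate = TRUE carries the running value. The column names r_lag, r_first and r_reduce are hypothetical.

```r
library(dplyr)

data <- tibble(
  LM  = c(100, 300, 400, 500, 600, 700, 800, 1300),
  s_j = c(2, 2, 2, 1, 2, 2, 1, 1),
  r_j = 25
)

out <- data %>%
  mutate(
    # Exclusive cumulative sum: lag() shifts cumsum down one row, default = 0 fills row 1
    r_lag = r_j - lag(cumsum(s_j), default = 0),
    # Start from the first r_j only, as the for loop effectively does
    r_first = first(r_j) - lag(cumsum(s_j), default = 0),
    # General recurrence: init is the starting r_j, and the last s_j is dropped
    # because it would only affect a row after the final one
    r_reduce = Reduce(function(r, s) r - s, head(s_j, -1),
                      init = first(r_j), accumulate = TRUE)
  )

out
```

The Reduce() form is slower and less readable than cumsum() for this simple subtraction, but it generalises to update rules that are not plain sums.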
