Most efficient way to calculate difference between all rows

Question

I have an R data.frame with the following data:

# A tibble: 21 x 57
# Groups:   section [21]
   section `1965` `1966` `1967` `1968` `1969`
   <fct>  <int>  <int>  <int>  <int>  <int>
 1 A          3     63    114    173    257
 2 B          2     88    114    147    169
 3 C         26    708    892   1101   1339
 4 D          1     16     16     20     77

In the complete data.frame the columns range from 1965->2020 and each row is a section A->U.

I would like to add new columns to the right with the difference between successive columns: 1966-1965 data for each Section (Row), then 1967-1966 for each row, 1968-1967 and so on until 2020-2019 as the last new column.

I have tried a few implementations of mutate_all(), but without success.

Any suggestion is highly appreciated!

Cheers

Answer 1

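We can transpose the data, take the diff, and transpose back. diff() on a matrix works down the rows, so the transpose makes it work across the year columns: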

cbind(df, t(diff(t(df[-1]))))
#  section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#1       A    3   63  114  173  257   60   51   59   84
#2       B    2   88  114  147  169   86   26   33   22
#3       C   26  708  892 1101 1339  682  184  209  238
#4       D    1   16   16   20   77   15    0    4   57
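To make the role of the two transposes concrete, here is a minimal sketch that rebuilds the sample data and compares diff() with and without them. The `d_` prefix on the new columns is a hypothetical naming choice, not something from the question.

```r
# Sample data copied from the question (first four sections only).
df <- data.frame(
  section = c("A", "B", "C", "D"),
  `1965` = c(3L, 2L, 26L, 1L),
  `1966` = c(63L, 88L, 708L, 16L),
  `1967` = c(114L, 114L, 892L, 16L),
  `1968` = c(173L, 147L, 1101L, 20L),
  `1969` = c(257L, 169L, 1339L, 77L),
  check.names = FALSE
)

years <- as.matrix(df[-1])        # 4 x 5 integer matrix, one column per year

# Applied directly, diff() subtracts consecutive ROWS (section B minus A, ...).
by_section <- diff(years)         # 3 x 5, not what the question asks for

# Transposing first makes diff() subtract consecutive YEARS instead.
by_year <- t(diff(t(years)))      # 4 x 4, columns named 1966..1969

# Hypothetical prefix so the new columns do not duplicate the year names.
colnames(by_year) <- paste0("d_", colnames(by_year))
out <- cbind(df, by_year)
out
```

Renaming before cbind() avoids ending up with two columns called `1966`, which would make later column selection by name ambiguous.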

data

df <- structure(list(section = c("A", "B", "C", "D"), `1965` = c(3L, 
2L, 26L, 1L), `1966` = c(63L, 88L, 708L, 16L), `1967` = c(114L, 
114L, 892L, 16L), `1968` = c(173L, 147L, 1101L, 20L), `1969` = c(257L, 
169L, 1339L, 77L)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Answer 2

You can use apply() to diff each row, then attach the result on the right with cbind():

result <- cbind(df, t(apply(df[-1], 1, diff)))
result
#>   section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#> 1       A    3   63  114  173  257   60   51   59   84
#> 2       B    2   88  114  147  169   86   26   33   22
#> 3       C   26  708  892 1101 1339  682  184  209  238
#> 4       D    1   16   16   20   77   15    0    4   57

Of course, you'll want to change the names as appropriate afterwards:

names(result)[7:10] <- paste(1965:1968, 1966:1969, sep = "_")

tibble::as_tibble(result)
#> # A tibble: 4 x 10
#>   section `1965` `1966` `1967` `1968` `1969` `1965_1966` `1966_1967` `1967_1968`
#>   <chr>    <int>  <int>  <int>  <int>  <int>       <int>       <int>       <int>
#> 1 A            3     63    114    173    257          60          51          59
#> 2 B            2     88    114    147    169          86          26          33
#> 3 C           26    708    892   1101   1339         682         184         209
#> 4 D            1     16     16     20     77          15           0           4
#> # ... with 1 more variable: `1968_1969` <int>
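For the full 1965 to 2020 data, hard-coding positions such as 7:10 becomes awkward. The sketch below derives the new names from the existing column names instead. It assumes that every column after `section` is a year column and that there are at least three of them, so that apply() returns a matrix.

```r
# Sample data copied from the question (first four sections only).
df <- data.frame(
  section = c("A", "B", "C", "D"),
  `1965` = c(3L, 2L, 26L, 1L),
  `1966` = c(63L, 88L, 708L, 16L),
  `1967` = c(114L, 114L, 892L, 16L),
  `1968` = c(173L, 147L, 1101L, 20L),
  `1969` = c(257L, 169L, 1339L, 77L),
  check.names = FALSE
)

year_cols <- names(df)[-1]              # assumption: all non-section columns are years

# Row-wise differences; apply() returns one column per row, hence the t().
d <- t(apply(df[-1], 1, diff))

# Pair each year with its successor: "1965_1966", "1966_1967", ...
colnames(d) <- paste(head(year_cols, -1), tail(year_cols, -1), sep = "_")

result <- cbind(df, d)
names(result)
```

Because the names are built from `year_cols`, the same code should work unchanged when the data frame has 56 year columns rather than 5.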

Answer 3

You can use c_across() from dplyr and unnest_wider() from tidyr:

library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  mutate(x = list(diff(c_across(`1965`:`1969`)))) %>%
  unnest_wider(x)

# # A tibble: 4 x 10
#   section `1965` `1966` `1967` `1968` `1969`  ...1  ...2  ...3  ...4
#   <chr>    <int>  <int>  <int>  <int>  <int> <int> <int> <int> <int>
# 1 A            3     63    114    173    257    60    51    59    84
# 2 B            2     88    114    147    169    86    26    33    22
# 3 C           26    708    892   1101   1339   682   184   209   238
# 4 D            1     16     16     20     77    15     0     4    57
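The `...1` to `...4` names above come from unnesting an unnamed vector. As far as I know, more recent tidyr releases expect a `names_sep` argument in that situation, so the following variant is a sketch under that assumption. The column name `d` and the separator are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (first four sections only).
df <- data.frame(
  section = c("A", "B", "C", "D"),
  `1965` = c(3L, 2L, 26L, 1L),
  `1966` = c(63L, 88L, 708L, 16L),
  `1967` = c(114L, 114L, 892L, 16L),
  `1968` = c(173L, 147L, 1101L, 20L),
  `1969` = c(257L, 169L, 1339L, 77L),
  check.names = FALSE
)

res <- df %>%
  rowwise() %>%
  # One list element per row holding that row's successive differences.
  mutate(d = list(diff(c_across(`1965`:`1969`)))) %>%
  ungroup() %>%
  # names_sep builds the new column names as d_1, d_2, ...
  unnest_wider(d, names_sep = "_")

res
```

The new columns are numbered by position, so they would still need renaming if year-based names are wanted.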

Answer 4

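Here is another base R option, which uses a matrix product. Each column of m has -1 in one row and 1 in the next, so multiplying by it subtracts each year from the following one: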

m <- -diag(ncol(df)-1)
m[cbind(2:ncol(m),1:(ncol(m)-1))]<-1
dfout <- cbind(df,as.matrix(df[-1])%*%m[,-ncol(m)])

which gives

> dfout
  section `1965` `1966` `1967` `1968` `1969`   1   2   3   4
1       A      3     63    114    173    257  60  51  59  84
2       B      2     88    114    147    169  86  26  33  22
3       C     26    708    892   1101   1339 682 184 209 238
4       D      1     16     16     20     77  15   0   4  57
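As a sanity check on the matrix construction, this sketch builds the same differencing matrix and compares the product with a plain shifted subtraction of the year columns. The shifted subtraction is an alternative I am adding for comparison and is not part of the answer above.

```r
# Sample data copied from the question (first four sections only).
df <- data.frame(
  section = c("A", "B", "C", "D"),
  `1965` = c(3L, 2L, 26L, 1L),
  `1966` = c(63L, 88L, 708L, 16L),
  `1967` = c(114L, 114L, 892L, 16L),
  `1968` = c(173L, 147L, 1101L, 20L),
  `1969` = c(257L, 169L, 1339L, 77L),
  check.names = FALSE
)

x <- as.matrix(df[-1])            # 4 x 5 matrix of yearly values
p <- ncol(x)

# Differencing matrix: -1 on the diagonal, +1 just below it.
m <- -diag(p)
m[cbind(2:p, 1:(p - 1))] <- 1
m <- m[, -p]                      # drop the last column, leaving p x (p - 1)

by_product <- unname(x %*% m)     # column j is x[, j + 1] - x[, j]

# Same result by subtracting the matrix from a copy shifted one column left.
by_shift <- unname(x[, -1] - x[, -p])

all(by_product == by_shift)
```

The shifted subtraction avoids building a p x p matrix, which may matter little at 56 columns but keeps the code shorter.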

4 answers

solution1
3 2020-07-15 17:28:18

solution2
1 ACCEPTED 2020-07-15 16:11:41

solution3
1 2020-07-15 16:34:25

solution4
1 2020-07-15 19:24:45
