Most efficient way to calculate the differences across all rows

Question

I have an R data.frame with the following data:

# A tibble: 21 x 57
# Groups:   section [21]
   section `1965` `1966` `1967` `1968` `1969`
   <fct>  <int>  <int>  <int>  <int>  <int>
 1 A          3     63    114    173    257
 2 B          2     88    114    147    169
 3 C         26    708    892   1101   1339
 4 D          1     16     16     20     77

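In the full data.frame, the columns run from 1965 to 2020, and each row is a section, A through U.

I would like to add new columns on the right holding the differences between consecutive columns: for each section (row), 1966 - 1965, then 1967 - 1966, and so on, with 2020 - 2019 as the last new column.

I have tried a few implementations with mutate_all(), but without success.

Any suggestions are much appreciated!

Cheers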

Answer 1

We can transpose the data, take the diff, transpose back, and cbind the result onto the original:

cbind(df, t(diff(t(df[-1]))))
#  section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#1       A    3   63  114  173  257   60   51   59   84
#2       B    2   88  114  147  169   86   26   33   22
#3       C   26  708  892 1101 1339  682  184  209  238
#4       D    1   16   16   20   77   15    0    4   57

Data

df <- structure(list(section = c("A", "B", "C", "D"), `1965` = c(3L, 
2L, 26L, 1L), `1966` = c(63L, 88L, 708L, 16L), `1967` = c(114L, 
114L, 892L, 16L), `1968` = c(173L, 147L, 1101L, 20L), `1969` = c(257L, 
169L, 1339L, 77L)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))
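
To see why the double transpose is needed, the sketch below breaks the one-liner into steps. diff() applied to a matrix takes differences between successive rows, so the year columns are first turned into rows. The variable names (m, d, years, out) are illustrative, and naming the new columns is an addition here to avoid the duplicate column names seen in the output above.

```r
# Sample data reduced to two sections and three years for brevity.
df <- data.frame(
  section = c("A", "B"),
  `1965` = c(3L, 2L),
  `1966` = c(63L, 88L),
  `1967` = c(114L, 114L),
  check.names = FALSE
)

m <- t(df[-1])    # 3 x 2 matrix: one row per year, one column per section
d <- t(diff(m))   # diff() differences successive rows; transpose back
# d is 2 x 2: one row per section, one column per consecutive year pair

# Label each new column with the pair of years it compares.
years <- names(df)[-1]
colnames(d) <- paste(head(years, -1), tail(years, -1), sep = "_")

out <- cbind(df, d)
print(out)
```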

Answer 2

You can use apply to compute the diff of each row and cbind the result onto the right-hand side:

result <- cbind(df, t(apply(df[-1], 1, diff)))
result
#>   section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#> 1       A    3   63  114  173  257   60   51   59   84
#> 2       B    2   88  114  147  169   86   26   33   22
#> 3       C   26  708  892 1101 1339  682  184  209  238
#> 4       D    1   16   16   20   77   15    0    4   57

Of course, you will then need to rename the new columns as required:

names(result)[7:10] <- paste(1965:1968, 1966:1969, sep = "_")

tibble::as_tibble(result)
#> # A tibble: 4 x 10
#>   section `1965` `1966` `1967` `1968` `1969` `1965_1966` `1966_1967` `1967_1968`
#>   <chr>    <int>  <int>  <int>  <int>  <int>       <int>       <int>       <int>
#> 1 A            3     63    114    173    257          60          51          59
#> 2 B            2     88    114    147    169          86          26          33
#> 3 C           26    708    892   1101   1339         682         184         209
#> 4 D            1     16     16     20     77          15           0           4
#> # ... with 1 more variable: `1968_1969` <int>
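
The rename above hard-codes positions 7:10 and the years 1965 to 1969. For the full table (1965 to 2020), a sketch that derives the names from the data instead might look like the following. The helper name add_diffs is hypothetical, and it assumes the first column is the section label and all remaining columns are numeric years in ascending order.

```r
# Hypothetical helper: append row-wise differences with names built
# from the existing column names.
add_diffs <- function(df) {
  # With fewer than three year columns apply() returns a plain vector,
  # so the transpose below would have the wrong shape.
  stopifnot(ncol(df) >= 4)
  years <- names(df)[-1]
  d <- t(apply(df[-1], 1, diff))
  colnames(d) <- paste(head(years, -1), tail(years, -1), sep = "_")
  cbind(df, d)
}

df <- data.frame(
  section = c("A", "B", "C", "D"),
  `1965` = c(3L, 2L, 26L, 1L),
  `1966` = c(63L, 88L, 708L, 16L),
  `1967` = c(114L, 114L, 892L, 16L),
  `1968` = c(173L, 147L, 1101L, 20L),
  `1969` = c(257L, 169L, 1339L, 77L),
  check.names = FALSE
)

result <- add_diffs(df)
print(names(result))
```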

Answer 3

You can use c_across() from dplyr together with unnest_wider() from tidyr.

library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  mutate(x = list(diff(c_across(`1965`:`1969`)))) %>%
  unnest_wider(x)

# # A tibble: 4 x 10
#   section `1965` `1966` `1967` `1968` `1969`  ...1  ...2  ...3  ...4
#   <chr>    <int>  <int>  <int>  <int>  <int> <int> <int> <int> <int>
# 1 A            3     63    114    173    257    60    51    59    84
# 2 B            2     88    114    147    169    86    26    33    22
# 3 C           26    708    892   1101   1339   682   184   209   238
# 4 D            1     16     16     20     77    15     0     4    57
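
The new columns above get the automatic names ...1 to ...4. One way to get descriptive names, sketched here as an assumption rather than as part of the original answer, is to name each difference vector before widening it. The variable new_names is hypothetical, and the data is cut down to two sections and three years.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  section = c("A", "B"),
  `1965` = c(3L, 2L),
  `1966` = c(63L, 88L),
  `1967` = c(114L, 114L),
  check.names = FALSE
)

# One name per consecutive pair of years.
new_names <- paste(c(1965, 1966), c(1966, 1967), sep = "_")

out <- df %>%
  rowwise() %>%
  # Store a named difference vector per row so that unnest_wider()
  # can use those names for the new columns.
  mutate(x = list(setNames(diff(c_across(`1965`:`1967`)), new_names))) %>%
  ungroup() %>%
  unnest_wider(x)

print(out)
```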

Answer 4

Here is another base R option, using a matrix product:

m <- -diag(ncol(df)-1)
m[cbind(2:ncol(m),1:(ncol(m)-1))]<-1
dfout <- cbind(df,as.matrix(df[-1])%*%m[,-ncol(m)])

This gives

> dfout
  section `1965` `1966` `1967` `1968` `1969`   1   2   3   4
1       A      3     63    114    173    257  60  51  59  84
2       B      2     88    114    147    169  86  26  33  22
3       C     26    708    892   1101   1339 682 184 209 238
4       D      1     16     16     20     77  15   0   4  57
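
To make the matrix product less opaque, here is a small sketch that builds the same kind of differencing matrix for three year columns and checks the result against the transpose-and-diff approach. The variable names (X, k, D, by_product, by_diff) are illustrative only.

```r
# Two sections, three years, as a plain integer matrix.
X <- matrix(c(3L, 2L, 63L, 88L, 114L, 114L), nrow = 2,
            dimnames = list(c("A", "B"), c("1965", "1966", "1967")))

k <- ncol(X)
m <- -diag(k)                   # -1 on the diagonal
m[cbind(2:k, 1:(k - 1))] <- 1   # +1 just below the diagonal
D <- m[, -k]                    # drop the last column, which has no +1 entry
print(D)
#      [,1] [,2]
# [1,]   -1    0
# [2,]    1   -1
# [3,]    0    1

by_product <- X %*% D           # column j is X[, j + 1] - X[, j]
by_diff <- t(diff(t(X)))        # same differences via diff()
stopifnot(all(by_product == by_diff))
```

Each column of D picks out one pair of neighbouring years, so a single matrix multiplication yields every difference for every row at once.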

4 answers

Answer 1: 3, 2020-07-15 17:28:18
Answer 2: 1, accepted, 2020-07-15 16:11:41
Answer 3: 1, 2020-07-15 16:34:25
Answer 4: 1, 2020-07-15 19:24:45
