Most efficient way to calculate difference between all rows
I have an R data.frame with the following data:
# A tibble: 21 x 57
# Groups: section [21]
section `1965` `1966` `1967` `1968` `1969`
<fct> <int> <int> <int> <int> <int>
1 A 3 63 114 173 257
2 B 2 88 114 147 169
3 C 26 708 892 1101 1339
4 D 1 16 16 20 77
In the complete data.frame the columns range from 1965 to 2020 and each row is a section, A to U.
I would like to add new columns to the right holding the difference between successive columns: 1966-1965 for each section (row), then 1967-1966 for each row, 1968-1967, and so on until 2020-2019 as the last new column.
I have tried a few implementations of mutate_all() but without success.
Any suggestion is highly appreciated!
Cheers
We can transpose the data with t(), take the diff() of successive rows, and transpose the result back:
cbind(df, t(diff(t(df[-1]))))
# section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#1 A 3 63 114 173 257 60 51 59 84
#2 B 2 88 114 147 169 86 26 33 22
#3 C 26 708 892 1101 1339 682 184 209 238
#4 D 1 16 16 20 77 15 0 4 57
df <- structure(list(section = c("A", "B", "C", "D"), `1965` = c(3L,
2L, 26L, 1L), `1966` = c(63L, 88L, 708L, 16L), `1967` = c(114L,
114L, 892L, 16L), `1968` = c(173L, 147L, 1101L, 20L), `1969` = c(257L,
169L, 1339L, 77L)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
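The transpose is needed because diff() applied to a matrix takes differences between successive rows, not successive columns. A small sketch of that behaviour, using a made-up 2 x 3 matrix (the name `mat` and its values are illustrative only, taken from the first two sections of the sample data):

```r
# Illustrative matrix: two rows (sections), three columns (years)
mat <- matrix(c(3L, 2L, 63L, 88L, 114L, 114L), nrow = 2,
              dimnames = list(c("A", "B"), c("1965", "1966", "1967")))

# diff() on a matrix works down the rows, so this is row B minus row A
by_rows <- diff(mat)

# Transposing first, then transposing back, gives year-on-year differences
by_cols <- t(diff(t(mat)))
by_cols
#   1966 1967
# A   60   51
# B   86   26
```

Each new column is named after the later of the two years it compares, which is why the output above repeats the names 1966 to 1969.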
You can use apply to diff all the rows, then stick the result on to the right with cbind:
result <- cbind(df, t(apply(df[-1], 1, diff)))
result
#> section 1965 1966 1967 1968 1969 1966 1967 1968 1969
#> 1 A 3 63 114 173 257 60 51 59 84
#> 2 B 2 88 114 147 169 86 26 33 22
#> 3 C 26 708 892 1101 1339 682 184 209 238
#> 4 D 1 16 16 20 77 15 0 4 57
Of course, you'll want to change the names as appropriate afterwards:
names(result)[7:10] <- paste(1965:1968, 1966:1969, sep = "_")
as_tibble(result)
#> # A tibble: 4 x 10
#> section `1965` `1966` `1967` `1968` `1969` `1965_1966` `1966_1967` `1967_1968`
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 A 3 63 114 173 257 60 51 59
#> 2 B 2 88 114 147 169 86 26 33
#> 3 C 26 708 892 1101 1339 682 184 209
#> 4 D 1 16 16 20 77 15 0 4
#> # ... with 1 more variable: `1968_1969` <int>
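The hard-coded positions 7:10 only fit this four-year sample. For the full 1965 to 2020 data, the names could instead be derived from the year columns themselves. A minimal sketch, assuming the first column is `section` and every remaining column is a year in ascending order (the two-row `df` here is a cut-down stand-in for the real data):

```r
# Cut-down stand-in for the real data: two sections, three years
df <- data.frame(section = c("A", "B"),
                 `1965` = c(3L, 2L), `1966` = c(63L, 88L), `1967` = c(114L, 114L),
                 check.names = FALSE)

years <- names(df)[-1]                  # "1965" "1966" "1967"
diffs <- t(apply(df[-1], 1, diff))      # one row of differences per section

# Name each difference column "<earlier year>_<later year>"
colnames(diffs) <- paste(head(years, -1), tail(years, -1), sep = "_")

result <- cbind(df, diffs)
names(result)
```

This relies on there being at least three year columns, since with only two apply() returns a plain vector rather than a matrix.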
You can use c_across() in dplyr and unnest_wider() in tidyr.
library(dplyr)
library(tidyr)
df %>%
rowwise() %>%
mutate(x = list(diff(c_across(`1965`:`1969`)))) %>%
unnest_wider(x)
# # A tibble: 4 x 10
# section `1965` `1966` `1967` `1968` `1969` ...1 ...2 ...3 ...4
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 A 3 63 114 173 257 60 51 59 84
# 2 B 2 88 114 147 169 86 26 33 22
# 3 C 26 708 892 1101 1339 682 184 209 238
# 4 D 1 16 16 20 77 15 0 4 57
Here is another base R option that uses a matrix product:
m <- -diag(ncol(df)-1)
m[cbind(2:ncol(m),1:(ncol(m)-1))]<-1
dfout <- cbind(df,as.matrix(df[-1])%*%m[,-ncol(m)])
which gives
> dfout
section `1965` `1966` `1967` `1968` `1969` 1 2 3 4
1 A 3 63 114 173 257 60 51 59 84
2 B 2 88 114 147 169 86 26 33 22
3 C 26 708 892 1101 1339 682 184 209 238
4 D 1 16 16 20 77 15 0 4 57
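To see why the product works, it may help to print the difference matrix for a small case. A sketch for three year columns, following the same construction as above (the names `k`, `d` and `x` are illustrative only):

```r
k <- 3                               # number of year columns (illustrative)
m <- -diag(k)                        # -1 on the diagonal
m[cbind(2:k, 1:(k - 1))] <- 1        # +1 just below the diagonal
d <- m[, -k]                         # drop the last column: k x (k - 1)
d
#      [,1] [,2]
# [1,]   -1    0
# [2,]    1   -1
# [3,]    0    1

# Section A for 1965 to 1967: each output column is (later year) - (earlier year)
x <- matrix(c(3, 63, 114), nrow = 1)
x %*% d
#      [,1] [,2]
# [1,]   60   51
```

Each column of `d` picks out one pair of adjacent years, so multiplying every row of the data by `d` computes all the successive differences in a single matrix operation.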