
What is the most efficient way to subtract several different columns using dplyr?

I have a dataset like this:

data.frame(x = c(1:5), y = c(0:4), z = c(2:6))

  x y z
1 1 0 2
2 2 1 3
3 3 2 4
4 4 3 5
5 5 4 6

I would like to get a dataset like this:

  x y z y-x z-y
1 1 0 2  -1   2
2 2 1 3  -1   2
3 3 2 4  -1   2
4 4 3 5  -1   2
5 5 4 6  -1   2

When I use:

a <- a %>% mutate(across((x:z), ~. - lag(.)))

I get:

   x  y  z
1 NA NA NA
2  1  1  1
3  1  1  1
4  1  1  1
5  1  1  1

That is, mutate is subtracting within each column (each row minus the previous row), whereas I need to subtract one column from another. How can I resolve this?

I wouldn't use dplyr for this. I would use base R directly:

diff_cols = your_data[-1] - your_data[-ncol(your_data)]
names(diff_cols) = paste0(
  names(your_data)[-1],
  "-",
  names(your_data)[-ncol(your_data)]
)
cbind(your_data, diff_cols)
#   x y z y-x z-y
# 1 1 0 2  -1   2
# 2 2 1 3  -1   2
# 3 3 2 4  -1   2
# 4 4 3 5  -1   2
# 5 5 4 6  -1   2
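The base R snippet above works because `your_data[-1]` (all columns but the first) and `your_data[-ncol(your_data)]` (all columns but the last) have the same shape, so the subtraction pairs each column with its left-hand neighbour by position. As a minimal sketch, the same steps could be wrapped in a reusable function; the name `add_adjacent_diffs` is hypothetical and not part of the original answer:

```r
# Hypothetical helper (not from the original answer): append the difference
# between each column and the column immediately to its left.
add_adjacent_diffs <- function(data) {
  stopifnot(ncol(data) >= 2)
  right <- data[-1]            # every column except the first
  left  <- data[-ncol(data)]   # every column except the last
  diffs <- right - left        # element-wise, paired by column position
  names(diffs) <- paste0(names(right), "-", names(left))
  cbind(data, diffs)           # cbind() on data frames keeps names like "y-x"
}

a <- data.frame(x = 1:5, y = 0:4, z = 2:6)
result <- add_adjacent_diffs(a)
names(result)
```

This assumes every column is numeric; a non-numeric column would need to be excluded before subtracting.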

Using dplyr you could do this:

library(dplyr, warn.conflicts = FALSE)


df1 <- data.frame(x = c(1:5), y = c(0:4), z = c(2:6))

df1 |> 
  mutate(`y-x` = y - x,
         `z-y` = z - y)
#>   x y z y-x z-y
#> 1 1 0 2  -1   2
#> 2 2 1 3  -1   2
#> 3 3 2 4  -1   2
#> 4 4 3 5  -1   2
#> 5 5 4 6  -1   2

Created on 2022-12-27 with reprex v2.0.2

You could use something like

library(dplyr)

df %>% 
  mutate(across(x:y, 
                ~. - df[[names(df)[which(names(df) == cur_column()) + 1]]],
                .names = "{.col}-{names(df)[which(names(df) == .col) + 1]}")
         )

This returns

  x y z x-y y-z
1 1 0 2   1  -2
2 2 1 3   1  -2
3 3 2 4   1  -2
4 4 3 5   1  -2
5 5 4 6   1  -2
Warning message:
Problem while computing `..1 = across(...)`.
ℹ longer object length is not a multiple of shorter object length 

but raises a warning that I can't get rid of.
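The warning appears to come from the `.names` specification rather than from the subtraction itself: inside `.names`, `.col` holds all selected column names at once, so `names(df) == .col` compares a length-3 vector with a length-2 vector. A minimal sketch of a variant that avoids this by using `match()`, the same lookup the last answer in this thread uses, is shown here. It also selects `y:z` and looks one column to the left, so the result matches the `y-x` and `z-y` columns asked for in the question. It assumes `df` is the example data frame.

```r
library(dplyr)

# Example data from the question (assumed to be what `df` refers to).
df <- data.frame(x = 1:5, y = 0:4, z = 2:6)

res <- df %>%
  mutate(across(
    y:z,
    # cur_column() is a single name here, so match() returns one position;
    # subtracting 1 picks the column immediately to the left.
    ~ .x - df[[match(cur_column(), names(df)) - 1]],
    # .col is a vector of all selected names; match() is vectorised over it,
    # so no recycling takes place.
    .names = "{.col}-{names(df)[match(.col, names(df)) - 1]}"
  ))

names(res)
```

Note that the lambda still reads from the original `df` rather than from the data being mutated, so this sketch would not respect grouping or earlier steps in a longer pipeline.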

Here's a tidyr::pivot_longer + dplyr approach. The same code should work for any number of columns.

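library(dplyr)
library(tidyr)

# Reshape to long format, keeping a row id so the data can be widened again
df1 <- data.frame(x = c(1:5), y = c(0:4), z = c(2:6)) %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row)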

bind_rows(df1, 
  df1 %>%
    group_by(row) %>%
    mutate(name = paste0(name, "-", lag(name)), value = value - lag(value)) %>%
    ungroup() %>% filter(!is.na(value))) %>%
  pivot_wider(names_from = name, values_from = value)

Result

# A tibble: 5 × 6
    row     x     y     z `y-x` `z-y`
  <int> <int> <int> <int> <int> <int>
1     1     1     0     2    -1     2
2     2     2     1     3    -1     2
3     3     3     2     4    -1     2
4     4     4     3     5    -1     2
5     5     5     4     6    -1     2
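The result above still carries the helper `row` column that was only needed for reshaping. As a small sketch of the same pipeline with that column dropped at the end (the intermediate names `long`, `diffs` and `wide` are illustrative, not from the original answer):

```r
library(dplyr)
library(tidyr)

# Long format: one row per (row id, column name, value).
long <- data.frame(x = 1:5, y = 0:4, z = 2:6) %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row)

# Within each original row, subtract the previous column's value.
diffs <- long %>%
  group_by(row) %>%
  mutate(name = paste0(name, "-", lag(name)), value = value - lag(value)) %>%
  ungroup() %>%
  filter(!is.na(value))   # drops the first column, which has no left neighbour

# Stack originals and differences, widen again, and drop the helper id.
wide <- bind_rows(long, diffs) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-row)

names(wide)
```

One caveat with this approach: `filter(!is.na(value))` also removes any difference that is `NA` because the underlying data had missing values, so those cells would come back as `NA` only through `pivot_wider()` filling the gaps.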

We may use across2 from the dplyover package:

library(dplyover)
a %>% 
  mutate(across2(y:z, x:y, `-`))
  x y z y_x z_y
1 1 0 2  -1   2
2 2 1 3  -1   2
3 3 2 4  -1   2
4 4 3 5  -1   2
5 5 4 6  -1   2

If the column names should use - instead of _:

a %>% 
  mutate(across2(y:z, x:y, `-`, .names = "{xcol}-{ycol}"))
  x y z y-x z-y
1 1 0 2  -1   2
2 2 1 3  -1   2
3 3 2 4  -1   2
4 4 3 5  -1   2
5 5 4 6  -1   2

Or with dplyr, subtracting one across() from another:

library(dplyr)
a %>%
  mutate(across(y:z, .names = "{.col}-{names(a)[match(.col, names(a)) - 1]}") -
         across(x:y))

Output:

  x y z y-x z-y
1 1 0 2  -1   2
2 2 1 3  -1   2
3 3 2 4  -1   2
4 4 3 5  -1   2
5 5 4 6  -1   2
