
dplyr: mutate new column based on multiple columns selected by variable string

Given this data:

df=data.frame(
  x1=c(2,0,0,NA,0,1,1,NA,0,1),
  x2=c(3,2,NA,5,3,2,NA,NA,4,5),
  x3=c(0,1,0,1,3,0,NA,NA,0,1),
  x4=c(1,0,NA,3,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,3,4,NA,3,3,1))

I want to create an extra column min holding the rowwise minimum of selected columns, using dplyr. That's easy using the column names:

df <- df %>% rowwise() %>% mutate(min = min(x2,x5))

But I have a large df with varying column names, so I need to match them from a vector of strings, mycols. Other threads tell me to use the select helper functions, but I must be missing something. Here's matches:

mycols <- c("x2","x5")
df <- df %>% rowwise() %>%
  mutate(min = min(select(matches(mycols))))
Error: is.string(match) is not TRUE

And one_of:

mycols <- c("x2","x5")
 df <- df %>%
 rowwise() %>%
 mutate(min = min(select(one_of(mycols))))
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
In addition: Warning message:
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`

What am I overlooking? Should select_ work? It doesn't in the following:

df <- df %>%
   rowwise() %>%
   mutate(min = min(select_(mycols)))
Error: no applicable method for 'select_' applied to an object of class "character"

And likewise:

df <- df %>%
  rowwise() %>%
  mutate(min = min(select_(matches(mycols))))
Error: is.string(match) is not TRUE

Here's another solution, a bit technical, that uses the purrr package from the tidyverse, which is designed for functional programming.

First, the matches helper from dplyr takes a single regex string as its argument, not a vector. So a good approach is to find a regex that matches all your columns. (In the code below you can use whichever dplyr select helper you wish.)
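If the column names are already in a character vector, one way to get such a regex is to join them with `|` and anchor the result. This is only a sketch: the small `df` and the extra column `x25` are made up to show why the anchors matter, and names containing regex metacharacters (such as `.`) would need escaping first.

```r
library(dplyr)

# Hypothetical data; x25 is there only to show the effect of anchoring
df <- data.frame(x1 = c(2, 0), x2 = c(3, 2), x5 = c(1, 1), x25 = c(9, 9))
mycols <- c("x2", "x5")

# Join the names with "|" and anchor with ^ and $ so that
# "x2" does not also match "x25"
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")

selected <- df %>% select(matches(pattern))
names(selected)
```

Without the `^` and `$` anchors the pattern `x2|x5` would also pick up `x25`.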

Then, purrr functions work well with dplyr once you understand the underlying scheme of functional programming.

Solution to your problem:


df=data.frame(
  x1=c(2,0,0,NA,0,1,1,NA,0,1),
  x2=c(3,2,NA,5,3,2,NA,NA,4,5),
  x3=c(0,1,0,1,3,0,NA,NA,0,1),
  x4=c(1,0,NA,3,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,3,4,NA,3,3,1))


# regex to get only x2 and x5 column
mycols <- "x[25]"

library(dplyr)

df %>%
  mutate(min_x2_x5 =
           # select columns that you want in df
           select(., matches(mycols)) %>% 
           # use pmap on this subset to get a vector with the min of each row.
           # a data frame is a list of columns; pmap walks the columns in parallel,
           # so each call to min receives the values of one row
           purrr::pmap_dbl(min)
         )
#>    x1 x2 x3 x4 x5 min_x2_x5
#> 1   2  3  0  1  1         1
#> 2   0  2  1  0  1         1
#> 3   0 NA  0 NA NA        NA
#> 4  NA  5  1  3  1         1
#> 5   0  3  3  0  3         3
#> 6   1  2  0  0  4         2
#> 7   1 NA NA NA NA        NA
#> 8  NA NA NA  0  3        NA
#> 9   0  4  0  0  3         3
#> 10  1  5  1  1  1         1

I won't explain purrr further here, but it works fine in your case.
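As a rough illustration of what `pmap_dbl` does in the pipeline above, here it is applied to a plain list of two equal-length vectors (the `cols` list is made up for this sketch, standing in for the selected columns). Base R's `pmin` gives the same rowwise minimum without purrr.

```r
library(purrr)

# Hypothetical stand-in for the selected columns: a list of equal-length vectors
cols <- list(x2 = c(3, 2, NA), x5 = c(1, 4, NA))

# pmap_dbl() calls min(x2 = 3, x5 = 1), then min(x2 = 2, x5 = 4), and so on,
# and collects the results in a double vector
row_min_pmap <- pmap_dbl(cols, min)

# Base R alternative: pmin() is the vectorised parallel minimum
row_min_pmin <- do.call(pmin, cols)
```

Since a data frame is itself a list of columns, `do.call(pmin, df[mycols])` would work the same way with a character vector of column names.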

This was a bit trickier. With standard evaluation (SE) you need to pass the operation as a string.

mycols <- '(x2,x5)'
f <- paste0('min',mycols)
# assign the result back, otherwise printing df would not show the new column
df <- df %>% rowwise() %>% mutate_(min = f)
df
# A tibble: 10 × 6
#      x1    x2    x3    x4    x5   min
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1      2     3     0     1     1     1
#2      0     2     1     0     1     1
#3      0    NA     0    NA    NA    NA
#4     NA     5     1     3     1     1
#5      0     3     3     0     3     3
#6      1     2     0     0     4     2
#7      1    NA    NA    NA    NA    NA
#8     NA    NA    NA     0     3    NA
#9      0     4     0     0     3     3
#10     1     5     1     1     1     1
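The underscore verbs such as `mutate_` have been deprecated since dplyr 0.7.0. Assuming dplyr 1.0.0 or later is available, a sketch of the same rowwise minimum using `c_across()` with `all_of()` and a character vector of names could look like this (the shortened `df` is illustrative only):

```r
library(dplyr)

# Shortened, illustrative version of the data
df <- data.frame(x2 = c(3, 2, NA, 5),
                 x3 = c(0, 1, 0, 1),
                 x5 = c(1, 1, NA, 1))
mycols <- c("x2", "x5")

result <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(min = min(c_across(all_of(mycols)))) %>%
  ungroup()

result
```

`all_of()` errors if any name in `mycols` is missing from the data, which is usually what you want when the names come from a variable.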
