dplyr: mutate new column based on multiple columns selected by variable string
Given this data:
df=data.frame(
x1=c(2,0,0,NA,0,1,1,NA,0,1),
x2=c(3,2,NA,5,3,2,NA,NA,4,5),
x3=c(0,1,0,1,3,0,NA,NA,0,1),
x4=c(1,0,NA,3,0,0,NA,0,0,1),
x5=c(1,1,NA,1,3,4,NA,3,3,1))
I want to create an extra column min for the rowwise minimal value of selected columns using dplyr. That's easy using the column names:
df <- df %>% rowwise() %>% mutate(min = min(x2,x5))
But I have a large df with varying column names, so I need to match them from a vector of string values, mycols. Other threads tell me to use the select helper functions, but I must be missing something. Here's matches:
mycols <- c("x2","x5")
df <- df %>% rowwise() %>%
mutate(min = min(select(matches(mycols))))
Error: is.string(match) is not TRUE
And one_of:
mycols <- c("x2","x5")
df <- df %>%
rowwise() %>%
mutate(min = min(select(one_of(mycols))))
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
In addition: Warning message:
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`
What am I overlooking? Should select_ work? It doesn't in the following:
df <- df %>%
rowwise() %>%
mutate(min = min(select_(mycols)))
Error: no applicable method for 'select_' applied to an object of class "character"
And likewise:
df <- df %>%
rowwise() %>%
mutate(min = min(select_(matches(mycols))))
Error: is.string(match) is not TRUE
Here's another solution, a bit technical, that uses the purrr package from the tidyverse, which is designed for functional programming.
First, the matches helper from dplyr takes a single regex string as its argument, not a vector, which is why you get the is.string(match) error. So a good approach is to find a regex that matches all your columns. (In the code below you can use whichever dplyr select helper you wish.)
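If the column names arrive as a character vector, one way to get such a regex is to join the names with `|` and anchor the result. This is only a minimal base R sketch: `mycols_vec`, `all_names`, `pattern` and `matched` are illustrative names that do not appear in the question, and it assumes the column names contain no regex metacharacters.

```r
# Column names supplied as a character vector (illustrative name)
mycols_vec <- c("x2", "x5")

# Join the names with "|" and anchor the pattern,
# so that a column such as "x25" would not match by accident
pattern <- paste0("^(", paste(mycols_vec, collapse = "|"), ")$")

# Check which names the pattern picks out of the data frame's columns
all_names <- c("x1", "x2", "x3", "x4", "x5")
matched <- grep(pattern, all_names, value = TRUE)
print(matched)
```

A pattern built this way can be passed to matches() in place of a hand-written regex.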
Then, purrr functions work well with dplyr once you understand the underlying scheme of functional programming.
Solution to your problem:
df=data.frame(
x1=c(2,0,0,NA,0,1,1,NA,0,1),
x2=c(3,2,NA,5,3,2,NA,NA,4,5),
x3=c(0,1,0,1,3,0,NA,NA,0,1),
x4=c(1,0,NA,3,0,0,NA,0,0,1),
x5=c(1,1,NA,1,3,4,NA,3,3,1))
# regex that matches only the x2 and x5 columns
mycols <- "x[25]"
library(dplyr)
df %>%
  mutate(min_x2_x5 =
           # select the columns you want from df
           select(., matches(mycols)) %>%
           # pmap_dbl() calls min() once per row and returns a numeric vector.
           # A data frame is a list of columns, and pmap iterates over those
           # columns in parallel, that is to say row by row.
           purrr::pmap_dbl(min)
  )
#> x1 x2 x3 x4 x5 min_x2_x5
#> 1 2 3 0 1 1 1
#> 2 0 2 1 0 1 1
#> 3 0 NA 0 NA NA NA
#> 4 NA 5 1 3 1 1
#> 5 0 3 3 0 3 3
#> 6 1 2 0 0 4 2
#> 7 1 NA NA NA NA NA
#> 8 NA NA NA 0 3 NA
#> 9 0 4 0 0 3 3
#> 10 1 5 1 1 1 1
I won't explain purrr any further here, but it works fine in your case.
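To see what the pmap step is doing without purrr, here is a rough base R sketch of the same idea. It swaps in `mapply()` for `pmap_dbl()`, so it is an equivalent written for illustration, not the code from this answer. `mycols_vec` and `row_min` are illustrative names, and only the two columns of interest are recreated.

```r
# Only the two columns of interest from the question's data
df <- data.frame(
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1))

# Column names as a character vector (illustrative name)
mycols_vec <- c("x2", "x5")

# df[mycols_vec] is a list of columns; do.call() hands each column to
# mapply() as a separate argument, and mapply() then calls min() on the
# first elements, the second elements, and so on (one call per row)
row_min <- do.call(mapply, c(list(FUN = min), unname(as.list(df[mycols_vec]))))
print(row_min)
```

As in the purrr version, a row with an NA in either column yields NA, because min() is called without na.rm = TRUE.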
This was a bit trickier. With standard evaluation (SE), you need to pass the operation as a string.
mycols <- '(x2,x5)'
f <- paste0('min',mycols)
df <- df %>% rowwise() %>% mutate_(min = f)
df
# A tibble: 10 × 6
# x1 x2 x3 x4 x5 min
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2 3 0 1 1 1
#2 0 2 1 0 1 1
#3 0 NA 0 NA NA NA
#4 NA 5 1 3 1 1
#5 0 3 3 0 3 3
#6 1 2 0 0 4 2
#7 1 NA NA NA NA NA
#8 NA NA NA 0 3 NA
#9 0 4 0 0 3 3
#10 1 5 1 1 1 1
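If the column names come as a character vector rather than as a pre-formatted string, the expression string can be assembled with paste(). The sketch below only builds the string and evaluates it once with base R's eval(parse()) on hand-written values for the first row, to show what the string computes. `mycols_vec`, `first_row` and `first_row_min` are illustrative names, not part of the answer.

```r
# Column names as a character vector (illustrative name)
mycols_vec <- c("x2", "x5")

# Build the expression string "min(x2,x5)" from the vector of names
f <- paste0("min(", paste(mycols_vec, collapse = ","), ")")
print(f)

# Evaluate the string against the first row's values (x2 = 3, x5 = 1)
first_row <- list(x2 = 3, x5 = 1)
first_row_min <- eval(parse(text = f), first_row)
print(first_row_min)
```

The resulting `f` has the same form as the string passed to mutate_ above.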