dplyr: mutate new column based on multiple columns selected by variable string
Given this data:
df=data.frame(
x1=c(2,0,0,NA,0,1,1,NA,0,1),
x2=c(3,2,NA,5,3,2,NA,NA,4,5),
x3=c(0,1,0,1,3,0,NA,NA,0,1),
x4=c(1,0,NA,3,0,0,NA,0,0,1),
x5=c(1,1,NA,1,3,4,NA,3,3,1))
I want to use dplyr to create an additional column min that holds the row-wise min of selected columns. This is easy using the column names:
df <- df %>% rowwise() %>% mutate(min = min(x2,x5))
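But I have a large df with varying column names, so I need to match them from a vector of strings, mycols. Other threads told me to use the select helper functions, but I must be missing something. Here is matches: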
mycols <- c("x2","x5")
df <- df %>% rowwise() %>%
mutate(min = min(select(matches(mycols))))
Error: is.string(match) is not TRUE
And one_of:
mycols <- c("x2","x5")
df <- df %>%
rowwise() %>%
mutate(min = min(select(one_of(mycols))))
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
In addition: Warning message:
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`
What am I overlooking? Should select_ work? It does not in the following:
df <- df %>%
rowwise() %>%
mutate(min = min(select_(mycols)))
Error: no applicable method for 'select_' applied to an object of class "character"
And likewise:
df <- df %>%
rowwise() %>%
mutate(min = min(select_(matches(mycols))))
Error: is.string(match) is not TRUE
Here is another, somewhat technical, solution that uses the purrr package from the tidyverse, which is designed for functional programming.
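First, the matches helper from dplyr takes a single regex string as its argument, not a vector. Finding one regex that matches all the columns you want is a good way to use it. (In the code below, you can use whichever dplyr select helper you need.)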
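Since matches expects one regex, a character vector of names such as the mycols from the question has to be collapsed into a single pattern first. A minimal sketch, assuming the names contain no regex metacharacters; the small data frame, its extra column x25, and the variable name pattern are made up for illustration:

```r
library(dplyr)

# Toy data: x25 is a hypothetical column that an unanchored regex would also match
df <- data.frame(x1 = c(2, 0), x2 = c(3, 2), x5 = c(1, 1), x25 = c(9, 9))
mycols <- c("x2", "x5")

# Collapse the names into one anchored alternation: "^(x2|x5)$"
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")

picked <- select(df, matches(pattern))
names(picked)
```

The ^ and $ anchors keep a longer name such as x25 from being selected by accident.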
Then, purrr functions work well together with dplyr once you understand the basic scheme of functional programming.
A solution to your problem:
df=data.frame(
x1=c(2,0,0,NA,0,1,1,NA,0,1),
x2=c(3,2,NA,5,3,2,NA,NA,4,5),
x3=c(0,1,0,1,3,0,NA,NA,0,1),
x4=c(1,0,NA,3,0,0,NA,0,0,1),
x5=c(1,1,NA,1,3,4,NA,3,3,1))
# regex to get only x2 and x5 column
mycols <- "x[25]"
library(dplyr)
df %>%
mutate(min_x2_x5 =
# select columns that you want in df
select(., matches(mycols)) %>%
# a data frame is a list of columns; pmap_dbl walks them in parallel,
# calling min() once per row and returning a numeric vector of row minima
purrr::pmap_dbl(min)
)
#> x1 x2 x3 x4 x5 min_x2_x5
#> 1 2 3 0 1 1 1
#> 2 0 2 1 0 1 1
#> 3 0 NA 0 NA NA NA
#> 4 NA 5 1 3 1 1
#> 5 0 3 3 0 3 3
#> 6 1 2 0 0 4 2
#> 7 1 NA NA NA NA NA
#> 8 NA NA NA 0 3 NA
#> 9 0 4 0 0 3 3
#> 10 1 5 1 1 1 1
I won't explain purrr any further here, but it works well in your case.
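For comparison, the same row-wise minimum can be computed directly from a character vector of column names without purrr. This is a sketch in base R rather than part of the answer above, assuming the selected columns are all numeric. The data repeat the question's df, and the column name min_x2_x5 follows the output above:

```r
df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

# df[mycols] is a list of columns; do.call() hands each column to pmin()
# as a separate argument, and pmin() takes the element-wise (row-wise) minimum
df$min_x2_x5 <- do.call(pmin, unname(as.list(df[mycols])))
df$min_x2_x5
```

Because pmin is vectorised, this avoids iterating row by row, which may matter on a large df.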
This is a bit tricky. For standard evaluation (SE), you need to pass the operation as a string.
mycols <- '(x2,x5)'
f <- paste0('min',mycols)
# assign the result, otherwise printing df below would not show the new column
df <- df %>% rowwise() %>% mutate_(min = f)
df
# A tibble: 10 × 6
# x1 x2 x3 x4 x5 min
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2 3 0 1 1 1
#2 0 2 1 0 1 1
#3 0 NA 0 NA NA NA
#4 NA 5 1 3 1 1
#5 0 3 3 0 3 3
#6 1 2 0 0 4 2
#7 1 NA NA NA NA NA
#8 NA NA NA 0 3 NA
#9 0 4 0 0 3 3
#10 1 5 1 1 1 1
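Note that mutate_ has been deprecated since dplyr 0.7. A sketch of the same idea with tidy evaluation, assuming a recent dplyr and the rlang package (a dependency of dplyr): syms turns the strings into symbols, and !!! splices them in as the arguments of min. The result name res is made up here, and only the two relevant columns of the question's data are repeated:

```r
library(dplyr)
library(rlang)

df <- data.frame(
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

res <- df %>%
  rowwise() %>%
  # expands to min(x2, x5), evaluated once per row
  mutate(min = min(!!!syms(mycols))) %>%
  ungroup()

res$min
```

This keeps mycols as a plain character vector, so no call string has to be pasted together by hand.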