Apply function over 2 character vectors return list - purrr
I'm doing some work using purrr and hoping for a complete piped solution for this problem. I am using sapply but think this isn't the optimal solution. It works for this small demo, but in the real data ch1 has length >50,000 and ch2 has length >100.
library(stringr)
library(purrr)
ch1 <- c("something very interesting or perhaps it is not", "lions, tigers and elephants are safari animals", "once upon a time there was a big castle",
"I have not seen anything as a big as elephants")
ch2 <- c("big", "not")
For each element of ch2, we want to see whether it occurs in each element of ch1.
str_detect(ch1, ch2[1]) # FALSE FALSE TRUE TRUE
str_detect(ch1, ch2[2]) # TRUE FALSE FALSE TRUE
Trying to use purrr to apply the function over all of ch1:
ch1 %>% map_lgl(~ str_detect(.x, ch2[2])) # TRUE FALSE FALSE TRUE
I can do this for the entirety of ch2 using sapply:
sapply(ch2, function(x) ch1 %>% map_lgl(~ str_detect(.x, x)))
big not
[1,] FALSE TRUE
[2,] FALSE FALSE
[3,] TRUE FALSE
[4,] TRUE TRUE
However, with the real dataset I think there must be a full purrr solution, something like map2, i.e. working on two lists, but obviously it can't be that particular function as it requires lists of equal length.
The following, which would probably be a bit faster on large data sets than the code in your post, returns a list of vectors.
library(stringr)
library(purrr)
lst <- ch2 %>% split(ch2) %>%
map( ~ str_detect(ch1, .x))
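To see what that list looks like on the demo data, here is a minimal sketch. It assumes stringr and purrr are installed, and it uses `set_names()` rather than `split(ch2)` to name the list, which is my substitution and not part of the code above.

```r
library(stringr)
library(purrr)

ch1 <- c("something very interesting or perhaps it is not",
         "lions, tigers and elephants are safari animals",
         "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")
ch2 <- c("big", "not")

# set_names() with no second argument names each pattern after itself,
# so the resulting list carries one named logical vector per pattern
lst <- ch2 %>%
  set_names() %>%
  map(~ str_detect(ch1, .x))

str(lst)
```

Each list element has the same length as ch1, so `lst$big[i]` says whether the i-th string contains "big".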
To return a matrix, you could use the following:
mat <- ch2 %>% split(ch2) %>%
  map( ~ str_detect(ch1, .x)) %>%
  do.call(cbind, .) # map_call() is not available in current purrr; this is the do.call it wrapped
However, since map_call is just a thin wrapper for do.call, it may be a bit slow. If you can use dplyr and work with a data.frame as the result, the following may be a little faster:
library(dplyr)
df <- ch2 %>% split(ch2) %>%
  map( ~ str_detect(ch1, .x)) %>%
  as_tibble() # current name for the deprecated as_data_frame()
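One reason the data frame form can be convenient: per-pattern summaries become a single call. A rough sketch, assuming dplyr (which re-exports `as_tibble()`), purrr and stringr are installed. The `set_names()` step and the `colSums()` summary are my own illustration, not part of the answer.

```r
library(stringr)
library(purrr)
library(dplyr)

ch1 <- c("something very interesting or perhaps it is not",
         "lions, tigers and elephants are safari animals",
         "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")
ch2 <- c("big", "not")

# One logical column per pattern, one row per element of ch1
df <- ch2 %>%
  set_names() %>%
  map(~ str_detect(ch1, .x)) %>%
  as_tibble()

# Number of ch1 elements that contain each pattern
hits <- colSums(df)
hits
```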
Added
The following is a solution which produces a matrix with named columns using map2:
# solution using map2
mat2 <- ch1 %>% list %>%
  map2(ch2, ~ str_detect(.x, .y)) %>% # the length-1 list is recycled against ch2
  do.call(cbind, .) # map_call() is not available in current purrr
colnames(mat2) <- ch2
Perhaps the most straightforward way to produce a matrix with column names is:
names(ch2) <- ch2
mat3 <- ch2 %>% map( ~ str_detect(ch1, .x)) %>%
  do.call(cbind, .) # map_call() is not available in current purrr
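For comparison, the same named matrix can also be built without purrr. The sketch below swaps in base R's `vapply()` and stringr's `fixed()`. It assumes the patterns in ch2 are literal strings rather than regular expressions, which holds for the demo but may not for the real data. `fixed()` may be faster for literal matching, though I have not benchmarked it on data of the size described in the question.

```r
library(stringr)

ch1 <- c("something very interesting or perhaps it is not",
         "lions, tigers and elephants are safari animals",
         "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")
ch2 <- c("big", "not")

# vapply() checks that every result is a logical vector of length(ch1)
# and binds the results column-wise; because ch2 is an unnamed character
# vector, its values become the column names
mat4 <- vapply(
  ch2,
  function(pattern) str_detect(ch1, fixed(pattern)),
  FUN.VALUE = logical(length(ch1))
)

mat4
```

Unlike sapply, vapply fails loudly if any pattern returns something of unexpected type or length, which can help when ch2 has 100+ entries.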