
Apply function over 2 character vectors return list - purrr

I'm doing some work using purrr and hoping for a completely piped solution to this problem. I am using sapply but think this isn't the optimal solution. It works for this small demo, but in the real data ch1 has length > 50,000 and ch2 has length > 100.

library(stringr)
library(purrr)

ch1 <- c("something very interesting or perhaps it is not", "lions, tigers and elephants are safari animals", "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")

ch2 <- c("big", "not")

For each element of ch2, we want to see whether it occurs in each element of ch1.

str_detect(ch1, ch2[1]) # FALSE FALSE  TRUE  TRUE
str_detect(ch1, ch2[2]) # TRUE FALSE FALSE  TRUE

Trying to use purrr to apply the function over all of ch1:

ch1 %>% map_lgl(~ str_detect(., ch2[2]))  # TRUE FALSE FALSE  TRUE

I can do this for the entirety of ch2 using sapply:

sapply(ch2, function(x) ch1 %>% map_lgl(~ str_detect(., x)))

       big   not
[1,] FALSE  TRUE
[2,] FALSE FALSE
[3,]  TRUE FALSE
[4,]  TRUE  TRUE

However, with the real dataset I think there must be a full purrr solution, something like map2 (i.e. working on two lists), but obviously it can't be that particular function as it requires lists of equal length.

The following, which would probably be a bit faster on large data sets than the code in your post, returns a list of vectors.

library(stringr)
library(purrr)
# split(ch2) turns ch2 into a list named after each pattern
lst <- ch2 %>% split(ch2) %>% 
  map( ~ str_detect(ch1, .x))
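If the search terms are plain literals rather than regular expressions (an assumption; the post does not say), a variant of the same idea is sketched below. It names the list with set_names() instead of split() and wraps each pattern in stringr's fixed(), which compares literal bytes and skips the regex engine. Whether this is actually faster on the real data would need to be measured.

```r
library(stringr)
library(purrr)

ch1 <- c("something very interesting or perhaps it is not",
         "lions, tigers and elephants are safari animals",
         "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")
ch2 <- c("big", "not")

# set_names() names the vector after its own values, so the result is a named list
# fixed() requests literal matching (assumes the patterns are not regexes)
lst_fixed <- ch2 %>%
  set_names() %>%
  map(~ str_detect(ch1, fixed(.x)))

lst_fixed$big  # FALSE FALSE  TRUE  TRUE
lst_fixed$not  #  TRUE FALSE FALSE  TRUE
```

Unlike split(), set_names() keeps the patterns in their original order rather than sorting them.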

To return a matrix, you could use the following:

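# do.call(cbind, .) binds the list elements as columns; it stands in for
# map_call(cbind), which later purrr releases no longer provide
mat <- ch2 %>% split(ch2) %>% 
      map( ~ str_detect(ch1, .x)) %>%
      do.call(cbind, .)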

However, since map_call is just a thin wrapper for do.call, it may be a bit slow. (map_call was dropped from later purrr releases; do.call(cbind, .) does the same job.) If you can use dplyr and work with a data.frame as the result, the following may be a little faster:

library(dplyr)
# as_tibble() supersedes the deprecated as_data_frame()
df <- ch2 %>% split(ch2) %>% 
      map( ~ str_detect(ch1, .x)) %>%
      as_tibble()

Added

The following is a solution using map2 that produces a matrix with named columns:

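# solution using map2: list(ch1) has length 1, so it is reused for every pattern
mat2 <- ch1 %>% list %>%
        map2(ch2,  ~ str_detect(.x, .y)) %>%
        do.call(cbind, .)  # stands in for map_call(cbind) from older purrr
colnames(mat2) <- ch2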
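The map2 version works even though ch1 and ch2 differ in length because wrapping ch1 in list() gives a length-1 input, and map2 recycles length-1 inputs against the other argument. A minimal sketch of that recycling, using short made-up strings as stand-ins for the real data:

```r
library(purrr)

ch1 <- c("a big castle", "it is not", "big or not")  # hypothetical stand-in data
ch2 <- c("big", "not")

# list(ch1) is a list of length 1 holding the whole vector;
# map2() reuses that single element once per element of ch2
seen <- map2(list(ch1), ch2, ~ length(.x))

length(seen)  # one result per pattern in ch2
unlist(seen)  # each call received all of ch1
```

Passing ch1 directly instead of list(ch1) would make map2 pair individual strings with individual patterns, which fails when the lengths differ.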

Perhaps the most straightforward way to produce a matrix with column names is:

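# naming ch2 after itself carries the names through map() to the matrix columns
names(ch2) <- ch2
mat3 <- ch2 %>% map( ~ str_detect(ch1, .x)) %>% 
        do.call(cbind, .)  # stands in for map_call(cbind) from older purrr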
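For comparison, the same named matrix can be built without purrr. The sketch below uses base R's vapply, which is not part of the answer above. It checks that every pattern returns a logical vector of length(ch1) and takes the column names from ch2 directly. The name mat4 is only illustrative.

```r
library(stringr)

ch1 <- c("something very interesting or perhaps it is not",
         "lions, tigers and elephants are safari animals",
         "once upon a time there was a big castle",
         "I have not seen anything as a big as elephants")
ch2 <- c("big", "not")

# one column per pattern; FUN.VALUE fixes the type and length of each column
mat4 <- vapply(ch2,
               function(pattern) str_detect(ch1, pattern),
               FUN.VALUE = logical(length(ch1)))

dim(mat4)       # 4 rows (strings) by 2 columns (patterns)
colnames(mat4)  # "big" "not"
```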
