R: Identifying the first, second, third, and fourth match between text strings from two different data frames

Question

Is there an R package that identifies the position (row index) of the 1st, 2nd, 3rd, and 4th match between the text string columns of two different data frames?

For instance, I have the following data frames:

dataframe: simpletext

row text
1   does he go to that bar or for shopping?
2   where was that bar that I wanted?
3   I would like to go to the opera instead for shopping


dataframe: keywords

row  word
1    shopping
2    opera
3    bar

What I want is to find that the first match of simpletext$text[1] is keywords$word[3], the second match of simpletext$text[1] is keywords$word[1], and so on for every row of simpletext.

Answer 1

You might start with something like this:

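library(tidyverse)

# For one keyword, locate its first occurrence in every text.
# str_locate() returns a matrix with `start` and `end` columns (NA when there is no match).
find_locations <- function(word, text) {
  bind_cols(
    tibble(
      word = word,
      text = text
    ),
    as_tibble(str_locate(text, word))
  )
}

# Stack the results for all keywords into one data frame.
map_df(keywords$word, find_locations, text = simpletext$text)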
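The table produced above holds one row per keyword and text pair, so the match order still has to be derived from the `start` column. The following is a minimal sketch of one way to do that, not part of the original answer. The column names `word_row`, `text_row`, and `match_rank` are illustrative, and `fixed()` assumes the keywords should be matched literally rather than as regular expressions.

```r
library(dplyr)
library(stringr)

simpletext <- tibble(text = c(
  "does he go to that bar or for shopping?",
  "where was that bar that I wanted?",
  "I would like to go to the opera instead for shopping"
))
keywords <- tibble(word = c("shopping", "opera", "bar"))

# One row per (keyword, text) pair with the start of the first occurrence.
# word_row and text_row are illustrative names for the two row indices.
locations <- bind_rows(lapply(keywords$word, function(w) {
  tibble(
    word     = w,
    word_row = match(w, keywords$word),
    text_row = seq_along(simpletext$text),
    start    = str_locate(simpletext$text, fixed(w))[, "start"]
  )
}))

# Drop non-matches, then rank the remaining keywords by position within each text.
ranked <- locations %>%
  filter(!is.na(start)) %>%
  group_by(text_row) %>%
  arrange(start, .by_group = TRUE) %>%
  mutate(match_rank = row_number()) %>%
  ungroup()

print(ranked)
```

In `ranked`, the row with `text_row == 1` and `match_rank == 1` carries `word_row == 3`, which corresponds to the "first match of simpletext$text[1] is keywords$word[3]" from the question.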

Answer 2

You can use the regexpr function (part of the grep family):

keywords <- c("shopping", "opera", "bar")
simpletext <- c("does he go to that bar or for shopping?",
                "where was that bar that I wanted?",
                "I would like to go to the opera instead for shopping")

text_match <- function(text, keywords) {
  # position of the first occurrence of each keyword in text (-1 if absent)
  matches <- vapply(keywords, function(x) regexpr(x, text)[1], FUN.VALUE = 1)
  # names of the matched keywords, sorted in order of appearance
  sorted_matches <- names(sort(matches[matches > 0]))
  # indices of the sorted matches within keywords
  indices <- vapply(sorted_matches, function(x) which(keywords == x), FUN.VALUE = 1)
  return(indices)
}

where regexpr(x, text)[1] returns the position of the first match of x in text, or -1 if there is none.
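As a small illustration of that return value, here is a sketch using the first text from the question. Note that `regexpr` treats its pattern as a regular expression by default. Passing `fixed = TRUE` for literal matching is an addition here, not something the original answer uses.

```r
text <- "does he go to that bar or for shopping?"

pos_bar      <- regexpr("bar", text)[1]       # 20: "bar" starts at character 20
pos_shopping <- regexpr("shopping", text)[1]  # 31
pos_opera    <- regexpr("opera", text)[1]     # -1: no match

# A keyword containing regex metacharacters behaves differently unless fixed = TRUE:
pos_dot_regex <- regexpr(".", text)[1]                # 1: "." matches any character
pos_dot_fixed <- regexpr(".", text, fixed = TRUE)[1]  # -1: no literal period in the text
```

Because `bar` (20) comes before `shopping` (31), sorting the positive positions yields the order of appearance that `text_match` relies on.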

Calling text_match on each text gives the following results:

text_match(simpletext[1], keywords)
#      bar shopping
#        3        1
text_match(simpletext[2], keywords)
# bar
#   3
text_match(simpletext[3], keywords)
#    opera shopping
#        2        1
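The question asks for this result for every row of simpletext, held in data frames rather than plain vectors. The sketch below shows one possible way to do that in base R. The helper `match_order` and the `match_1`, `match_2`, ... column names are hypothetical, and `fixed = TRUE` assumes the keywords are literal strings.

```r
simpletext <- data.frame(
  text = c("does he go to that bar or for shopping?",
           "where was that bar that I wanted?",
           "I would like to go to the opera instead for shopping"),
  stringsAsFactors = FALSE
)
keywords <- data.frame(word = c("shopping", "opera", "bar"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: row indices of the keywords found in one text,
# ordered by where each keyword first appears.
match_order <- function(text, words) {
  pos <- vapply(words, function(w) regexpr(w, text, fixed = TRUE)[1],
                FUN.VALUE = integer(1))
  hits <- which(pos > 0)
  unname(hits[order(pos[hits])])
}

orders <- lapply(simpletext$text, match_order, words = keywords$word)

# Pad each result with NA so that column k holds the k-th match for that row.
n_max <- nrow(keywords)
ranked <- t(vapply(orders, function(idx) idx[seq_len(n_max)],
                   FUN.VALUE = integer(n_max)))
colnames(ranked) <- paste0("match_", seq_len(n_max))

print(ranked)
```

Each row of `ranked` corresponds to a row of simpletext. For the first row, `match_1` is 3 (bar) and `match_2` is 1 (shopping), with NA where there is no further match.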

2 solutions

Solution 1
0 2018-04-22 18:07:42

Solution 2
0 Accepted 2018-04-22 18:51:41
