Using R to extract all words in a sentence that end with an expression

Question

Suppose I have the following string:

"palavras a serem encontradas fazer-se encontrar-se, enganar-se"

How can I extract the words "fazer-se", "encontrar-se", and "enganar-se"?

I'm trying to use stringr, like this:

library(stringr)    
sentence <- "palavras a serem encontradas fazer-se encontrar-se, enganar-se"
str_extract_all(sentence, "se$")

I'd like this output:

[1] "fazer-se" "encontrar-se" "enganar-se"

Answer 1

We can specify a word boundary (\\b) instead of the end of the string ($), because $ matches only once, at the very end of the string. We also need the non-whitespace characters that come before the "se" substring, so use \\S+, i.e. one or more non-whitespace characters:

library(stringr)
str_extract_all(sentence, "\\S+se\\b")[[1]]
#[1] "fazer-se"     "encontrar-se" "enganar-se"
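
To make the difference between the two anchors concrete, here is a small side-by-side sketch. It assumes stringr is installed; the variable names end_only and word_final are only illustrative and do not come from the question.

```r
library(stringr)

sentence <- "palavras a serem encontradas fazer-se encontrar-se, enganar-se"

# "$" anchors at the end of the whole string, so only the final "se" matches
end_only <- str_extract_all(sentence, "se$")[[1]]
end_only
#[1] "se"

# "\\S+se\\b" takes the run of non-whitespace characters before each "se"
# that is followed by a word boundary (a space, a comma, or the end of the string)
word_final <- str_extract_all(sentence, "\\S+se\\b")[[1]]
word_final
#[1] "fazer-se"     "encontrar-se" "enganar-se"
```

Note that "serem" is not returned: its "se" is followed by a letter, so there is no word boundary after it, and \\S+ needs at least one character before "se".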

Answer 2

In base R, we can use gregexpr and regmatches:

regmatches(sentence, gregexpr('\\w+-se', sentence))[[1]]
#[1] "fazer-se"     "encontrar-se" "enganar-se"
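
One possible caveat: the pattern '\\w+-se' has no boundary after "se", so it would also pick up the start of a longer word. The sketch below illustrates this with a made-up input (the string x and the word "fazer-sem" are assumptions for demonstration only) and adds \\b with perl = TRUE as one way to tighten the match.

```r
# Hypothetical input: "fazer-sem" is included only to show the edge case
x <- "fazer-se fazer-sem enganar-se"

# Without a trailing boundary, the prefix of "fazer-sem" is matched too
loose <- regmatches(x, gregexpr("\\w+-se", x))[[1]]
loose
#[1] "fazer-se"   "fazer-se"   "enganar-se"

# With "\\b" after "se", only words that actually end in "-se" are kept
strict <- regmatches(x, gregexpr("\\w+-se\\b", x, perl = TRUE))[[1]]
strict
#[1] "fazer-se"   "enganar-se"
```

For the sentence in the question, both patterns should give the same three words, so the stricter version only matters if such longer words can occur.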

Solution 1
2 2020-09-18 22:01:39

Solution 2
0 2020-09-19 02:42:53
