
Extract all words from a sentence ending in an expression using R

Suppose I have the following string:

"palavras a serem encontradas fazer-se encontrar-se, enganar-se"

How can I extract the words "fazer-se", "encontrar-se" and "enganar-se"?

I'm trying to use stringr, like this:

library(stringr)    
sentence <- "palavras a serem encontradas fazer-se encontrar-se, enganar-se"
str_extract_all(sentence, "se$")

I'd like this output:

[1] "fazer-se" "encontrar-se" "enganar-se"

We can use a word boundary (\\b) instead of the end-of-string anchor ($). The $ anchor can match only once, at the very end of the string, which is why the original attempt finds a single match. We also need to capture the characters that come before the se substring and are not whitespace, so we prepend \\S+, i.e. one or more non-whitespace characters:

library(stringr)
str_extract_all(sentence, "\\S+se\\b")[[1]]
#[1] "fazer-se"     "encontrar-se" "enganar-se"  
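To make the difference between the two anchors concrete, here is a small side-by-side comparison on the same sentence. It is a sketch that assumes only the stringr package already used above; the variable names are illustrative.

```r
library(stringr)

sentence <- "palavras a serem encontradas fazer-se encontrar-se, enganar-se"

# "$" anchors to the end of the whole string, so at most one match is possible
end_anchor <- str_extract_all(sentence, "se$")[[1]]
end_anchor
#[1] "se"

# "\\b" is a word boundary, so "se" can match at the end of every word;
# "\\S+" pulls in the non-whitespace characters in front of it
word_boundary <- str_extract_all(sentence, "\\S+se\\b")[[1]]
word_boundary
#[1] "fazer-se"     "encontrar-se" "enganar-se"
```

Note that the comma after "encontrar-se" is excluded: the regex engine backtracks until "se" is followed by a word boundary, which sits between the final "e" and the comma.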

In base R, we can use gregexpr and regmatches:

regmatches(sentence, gregexpr('\\w+-se', sentence))[[1]]
#[1] "fazer-se"     "encontrar-se" "enganar-se"  
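The two patterns above are not strictly equivalent: \\S+se\\b accepts any word that ends in "se", while \\w+-se requires the hyphen. The base R sketch below shows the difference on a made-up sentence (the sentence and the word "classe" are illustrative assumptions, not from the question). perl = TRUE is used here to get PCRE semantics for \\S, \\w and \\b.

```r
# Hypothetical sentence: "classe" ends in "se" but is not a "-se" verb form
example_sentence <- "uma classe deve lembrar-se"

# Any run of non-whitespace characters ending in "se" at a word boundary
any_se <- regmatches(example_sentence,
                     gregexpr("\\S+se\\b", example_sentence, perl = TRUE))[[1]]
any_se
#[1] "classe"     "lembrar-se"

# Word characters, then a literal hyphen, then "se" at a word boundary
hyphen_se <- regmatches(example_sentence,
                        gregexpr("\\w+-se\\b", example_sentence, perl = TRUE))[[1]]
hyphen_se
#[1] "lembrar-se"
```

If only hyphenated "-se" forms are wanted, the hyphen-based pattern is the stricter choice.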
