
Complete word matching using grepl in R

Consider the following example:

> testLines <- c("I don't want to match this","This is what I want to match")
> grepl('is',testLines)
[1] TRUE TRUE

What I want, though, is to match 'is' only when it stands alone as a single word. From reading a bit of Perl documentation, it seemed that the way to do this is with \b, an anchor that identifies what comes before and after the pattern, i.e. \bword\b matches 'word' but not 'sword'. So I tried the following example, with the perl argument set to TRUE:

> grepl('\bis\b',testLines,perl=TRUE)
[1] FALSE FALSE

The output I'm looking for is FALSE TRUE.

"\<" is another escape sequence for the beginning of a word, and "\>" is the end. In R strings you need to double the backslashes, so:

> grepl("\\<is\\>", c("this", "who is it?", "is it?", "it is!", "iso"))
[1] FALSE  TRUE  TRUE  TRUE FALSE

Note that this matches the "is" in "it is!" but not "iso".
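
As a minimal sketch of how the word-boundary escapes above could be reused for other words, the snippet below wraps them in a small helper. The name has_word is hypothetical (it is not part of base R), and the helper assumes the word contains no regex metacharacters.

```r
# has_word() is a hypothetical helper, not a base R function.
# Assumption: `word` contains no regex metacharacters.
has_word <- function(word, x) {
  # "\\<" and "\\>" reach the regex engine as \< and \>
  pattern <- paste0("\\<", word, "\\>")
  grepl(pattern, x)
}

x <- c("this", "who is it?", "is it?", "it is!", "iso")
print(has_word("is", x))
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```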

You need to double the backslash so that the escape sequence reaches the regex engine:

> grepl("\\bis\\b",testLines)
[1] FALSE  TRUE
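
To see why the single-backslash version fails, it can help to inspect the strings themselves. In an R string literal, "\b" is the backspace character, so the regex engine never receives a word-boundary escape. The variable names single and double below are just for illustration.

```r
testLines <- c("I don't want to match this", "This is what I want to match")

single <- "\bis\b"    # R turns each \b into a backspace character
double <- "\\bis\\b"  # each \\ becomes one literal backslash

print(nchar(single))                    # 4: backspace, i, s, backspace
print(nchar(double))                    # 6: \, b, i, s, \, b
print(utf8ToInt(substr(single, 1, 1)))  # 8, the code point of backspace

print(grepl(single, testLines, perl = TRUE))  # FALSE FALSE
print(grepl(double, testLines, perl = TRUE))  # FALSE  TRUE
```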

Very simplistically, match on a leading space:

testLines <- c("I don't want to match this","This is what I want to match")
grepl(' is',testLines)
[1] FALSE  TRUE

There's a whole lot more than this to regular expressions, but essentially the pattern needs to be more specific. What you will need in more general cases is a huge topic. See ?regex.

Other possibilities that will work for this example:

grepl(' is ',testLines)
[1] FALSE  TRUE
grepl('\\sis',testLines)
[1] FALSE  TRUE
grepl('\\sis\\s',testLines)
[1] FALSE  TRUE
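
The space-based patterns work for this particular example, but a rough comparison on a slightly larger vector shows where they can differ from a word-boundary pattern. The test vector x and the variable names are made up for illustration; "an island" is added as a case where a leading space alone gives a false positive.

```r
x <- c("this", "who is it?", "is it?", "it is!", "iso", "an island")

both_spaces   <- grepl(" is ", x, fixed = TRUE)  # misses "is" at the start or before punctuation
leading_space <- grepl(" is", x, fixed = TRUE)   # misses "is it?", wrongly accepts "an island"
boundary      <- grepl("\\bis\\b", x)            # whole-word match

print(both_spaces)
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE
print(leading_space)
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
print(boundary)
# [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE
```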
