
Regex optional character preceded by negative lookbehind in R

Suppose I have a set of strings:

test <- c('MTB', 'NOT MTB', 'TB', 'NOT TB')

I want to write a regular expression that matches either 'TB' or 'MTB' (e.g., the expression "M?TB"), but strictly only when the match is NOT preceded by the phrase "NOT " (space included).

My intended result, therefore, is:

TRUE FALSE TRUE FALSE

So far I have tried a couple of variations of:

grepl("(?<!NOT )M?TB", test, perl = TRUE)

TRUE TRUE TRUE FALSE

Unsuccessfully. As you can see, the string 'NOT MTB' still matches my regular expression.

It seems that including the optional character "M?" makes R treat the negative lookbehind as optional too. I have been looking into using parentheses to group the patterns, such as:

grepl("(?<!NOT )(M?TB)", test, perl = TRUE)

TRUE TRUE TRUE FALSE

This also fails to exclude 'NOT MTB'. Admittedly, I am unclear on how parentheses work in regex, or even what "grouping" means in this context. I have had trouble finding a question about how to group, require, and "optionalize" different parts of a regex so that I can match a phrase that begins with an optional character and is guarded by a negative lookbehind. What is the proper way to write an expression like this?
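A quick way to see why the lookbehind does not exclude 'NOT MTB' is to look at *what* the pattern matched, not just *whether* it matched. The diagnostic sketch below uses base R's `regexpr()` and `regmatches()`. In 'NOT MTB' the engine cannot start at the `M` (it is preceded by "NOT "), so it moves on and matches just the `TB` starting at character 6. The four characters before that `T` are "OT M", which is not "NOT ", so the lookbehind passes. The lookbehind is not being made optional; it is simply being checked at a later starting position.

```r
test <- c('MTB', 'NOT MTB', 'TB', 'NOT TB')

# regexpr() returns the start of the first match in each element (-1 if none)
m <- regexpr("(?<!NOT )M?TB", test, perl = TRUE)

# regmatches() extracts the matched text and drops elements with no match;
# here matched is c("MTB", "TB", "TB"): 'NOT MTB' contributes only "TB"
matched <- regmatches(test, m)
print(matched)

# Match start positions: the match in 'NOT MTB' begins at character 6 (the T)
print(as.integer(m))
```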

We could use the start (^) and end ($) anchors so that only those exact strings match:

grepl("^M?TB$", test)
#[1]  TRUE FALSE  TRUE FALSE

If there are other strings, as @Wiktor Stribiżew mentioned in the comments, then one option would be:

test1 <- c(test, "THIS MTB")
!grepl("\\bNOT M?TB\\b", test1) & grepl("\\bM?TB\\b", test1)
#[1]  TRUE FALSE  TRUE FALSE  TRUE

test = c("MTB", "NOT MTB", "TB", "NOT TB", "THIS TB", "THIS NOT TB")

grepl("\\b(?<!NOT\\s)M?TB\\b", test, perl = TRUE)

[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
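The leading `\\b` is what makes this version work. A match can no longer start between the `M` and the `T` of 'MTB', because both are word characters and there is no word boundary between them. The comparison below is a small illustration using the same extended `test` vector. It runs the pattern with and without that leading `\\b`; only the 'NOT MTB' element changes.

```r
test <- c("MTB", "NOT MTB", "TB", "NOT TB", "THIS TB", "THIS NOT TB")

# Without the leading \\b, a match may start at the T inside "MTB";
# the four characters before it are "OT M", so the lookbehind passes
without_b <- grepl("(?<!NOT\\s)M?TB\\b", test, perl = TRUE)

# With the leading \\b, a match must start at a word boundary,
# so the inner "TB" of "NOT MTB" can no longer be matched on its own
with_b <- grepl("\\b(?<!NOT\\s)M?TB\\b", test, perl = TRUE)

# without_b: TRUE TRUE TRUE FALSE TRUE FALSE
# with_b:    TRUE FALSE TRUE FALSE TRUE FALSE
print(data.frame(test, without_b, with_b))
```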

It is not entirely clear what the question intends, but here is some code to try depending on what is wanted.

Added: The poster clarified that #2 and #3 are along the lines of what they were looking for.

1) This can be done without regular expressions, like this:

test %in% c("TB", "MTB")
## [1]  TRUE FALSE  TRUE FALSE

2) If the problem is not about exact matches, then return the strings that match M?TB but do not also match NOT M?TB:

grepl("M?TB", test) & !grepl("NOT M?TB",test)
## [1]  TRUE FALSE  TRUE FALSE

3) Another alternative is to replace NOT M?TB with X and then run grepl on M?TB:

grepl("M?TB", sub("NOT M?TB", "X", test))
## [1]  TRUE FALSE  TRUE FALSE

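Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original source. For any questions, contact: yoyou2525@163.com.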

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM