R-regex: match strings not beginning with a pattern
I'd like to use regex to see if a string does not begin with a certain pattern. While I can use [^ to blacklist certain characters, I can't figure out how to blacklist a pattern.
> grepl("^[^abc].+$", "foo")
[1] TRUE
> grepl("^[^abc].+$", "afoo")
[1] FALSE
I'd like to do something like grepl("^[^(abc)].+$", "afoo") and get TRUE, i.e. to match if the string does not start with the abc sequence.
Note that I'm aware of this post, and I also tried using perl = TRUE, but with no success:
> grepl("^((?!hede).)*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^((?!hede).)*$", "foohede", perl = TRUE)
[1] FALSE
Any ideas?
Yeah. Put the zero-width lookahead outside the other parens. That should give you this:
> grepl("^(?!hede).*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^(?!hede).*$", "foohede", perl = TRUE)
[1] TRUE
which I think is what you want.
Alternately, if you want to capture the entire string, ^(?!hede)(.*)$ and ^((?!hede).*)$ are both equivalent and acceptable.
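Applied to the abc prefix from the question, the same lookahead can be sketched as follows. The vector x is made-up test data, and the two alternatives (negating a plain match, and startsWith(), which is in base R from version 3.3.0 and only handles literal prefixes) are not part of the answer above:

```r
# Hypothetical inputs for illustration
x <- c("foo", "afoo", "abcfoo", "abc")

# Negative lookahead anchored at the start:
# TRUE when the string does not begin with "abc"
no_abc_regex <- grepl("^(?!abc)", x, perl = TRUE)

# Negating a plain positive match gives the same answer without lookahead
no_abc_negated <- !grepl("^abc", x)

# For a literal (non-regex) prefix, startsWith() also works
no_abc_literal <- !startsWith(x, "abc")

print(no_abc_regex)
```

All three vectors agree here: TRUE for "foo" and "afoo", FALSE for "abcfoo" and "abc". The lookahead form is mainly useful when the condition has to be embedded in a larger pattern.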
I got stuck on the following special case, so I thought I would share...
Apparently you can turn off the implicit greediness of the search by adding a modifier to the quantifier in a perl-style regular expression.
Suppose the string I wanted to process was
myExampleString = paste0(c(letters[1:13], "_", letters[14:26], "__",
LETTERS[1:13], "_", LETTERS[14:26], "__",
"laksjdl", "_", "lakdjlfalsjdf"),
collapse = "")
myExampleString
"abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ__laksjdl_lakdjlfalsjdf"
and that I wanted only the first segment before the first "__". I cannot simply search on "_", because a single underscore is an allowable non-delimiter in this example string.
The following doesn't work. It instead gives me the first and second segments because of the default greediness (but not the third, because of the lookahead).
gsub("^(.+(?=__)).*$", "\\1", myExampleString, perl = TRUE)
"abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ"
But this does work:
gsub("^(.+?(?=__)).*$", "\\1", myExampleString, perl = TRUE)
"abcdefghijklm_nopqrstuvwxyz"
The difference is the "?" placed after the ".+" in the (perl) regular expression. It makes the "+" quantifier non-greedy (lazy), so ".+?" matches as few characters as possible and stops at the first "__" instead of the last one.
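For comparison, here is a small sketch of the lazy match next to two alternatives that avoid the lookahead altogether. The string s is a shortened, made-up stand-in for myExampleString, and the sub() and strsplit() variants are assumptions of mine rather than part of the answer above:

```r
# Hypothetical shortened stand-in for myExampleString
s <- "abc_def__GHI_JKL__mno_pqr"

# Lazy quantifier: .+? stops at the first position that is followed by "__"
first_lazy <- sub("^(.+?)(?=__).*$", "\\1", s, perl = TRUE)

# Delete everything from the first "__" onward
first_sub <- sub("__.*$", "", s)

# Split on the literal delimiter and keep the first piece
first_split <- strsplit(s, "__", fixed = TRUE)[[1]][1]

print(first_lazy)
```

On this input all three return "abc_def". Which one reads best is a matter of taste; the lazy lookahead is the closest to the pattern discussed above.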
There is now (years later) another possibility with the stringr package.
library(stringr)
str_detect("dsadsf", "^abc", negate = TRUE)
#> [1] TRUE
str_detect("abcff", "^abc", negate = TRUE)
#> [1] FALSE
Created on 2020-01-13 by the reprex package (v0.3.0)