Matching certain numbers at the end of a string
I have a vector of strings:
s <- c('abc1', 'abc2', 'abc3', 'abc11', 'abc12',
'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
'nonsense')
I would like a regular expression to match only the strings that begin with abc and end with 3, 11, or 12. In other words, the regex has to exclude abc1 but not abc11, abc2 but not abc12, and so on.
I thought that this would be easy to do with lookahead assertions, but I haven't found a way. Is there one?
EDIT: Thanks to the posters below for pointing out a serious ambiguity in the original post.
In reality, I have many strings. They all end in digits: some in 0, some in 9, some in the digits in between. I am looking for a regex that will match all strings except those that end with a letter followed by a 1 or a 2. (The regex should also match only those strings that start with abc, but that's an easy problem.)
I tried to use negative lookahead assertions to create such a regex, but I didn't have any success.
Thanks to all who replied and commented. Inspired by several of you, I ended up using this combination:
grepl('^abc', s) & !grepl('[[:lower:]][12]$', s)
Is this what you want?
s[grepl("abc.*(3|11|12)", s)]
[1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
And the excluded strings are:
s[!grepl("abc.*(3|11|12)", s)]
[1] "abc1" "abc2" "abcde1" "abcde2" "nonsense"
Edit: As the comments indicate, there is some ambiguity in your requirements. A more comprehensive regex will test for the string start ^ and string end $ and possibly only allow alphabet characters [[:alpha:]] before the final digits:
s[grepl("^abc[[:alpha:]]*(3|11|12)$", s)]
[1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
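To illustrate why the anchors and the letters-only restriction matter, here is a minimal sketch comparing the unanchored pattern with one strict variant. The strings in extra are hypothetical and are not part of the original post.

```r
# Hypothetical test strings (not from the original post)
extra <- c('abc3', 'abc31', 'xabc12y', 'abc_12')

# Unanchored: "abc" followed anywhere later by 3, 11 or 12
loose <- grepl("abc.*(3|11|12)", extra)

# Anchored: starts with abc, then only letters, then 3, 11 or 12 at the very end
strict <- grepl("^abc[[:alpha:]]*(3|11|12)$", extra)

loose   # TRUE  TRUE  TRUE  TRUE
strict  # TRUE FALSE FALSE FALSE
```

The unanchored pattern accepts every one of these strings, while the strict variant keeps only abc3.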
You can also get grep to return the values directly by passing the argument value=TRUE, thus saving a bit of duplication in the code:
grep("^abc[[:alpha:]]*(3|11|12)$", s, value=TRUE)
[1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
Instead of one complicated regular expression, in this case I think it's easier to use two simple regular expressions:
s <- c('abc1', 'abc2', 'abc3', 'abc11', 'abc12',
'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
'nonsense')
s[grepl("^abc", s) & grepl("(3|11|12)$", s)]
You could use substring in this case too:
z <- nchar(s)
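# & binds tighter than |, so the suffix tests are grouped in parentheses;
# otherwise the "abc" prefix check would not apply to the 11/12 case
s[substring(s, 1, 3) == "abc" &
  (substring(s, z) == "3" | substring(s, z-1) %in% c("12", "11"))]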
Looking specifically for the requested numbers gives this:
n <- c(3,11,12)
s[sub('abc[^[:digit:]]*([[:digit:]]+)$',s, replacement='\\1') %in% n]
[1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
This doesn't confuse 11 with 1:
n <- c(3,1,12)
s[sub('abc[^[:digit:]]*([[:digit:]]+)$',s, replacement='\\1') %in% n]
[1] "abc1" "abc3" "abc12" "abcde1" "abcde3" "abcde12"
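To see why 11 is not mistaken for 1, it may help to look at what the sub call extracts before the %in% comparison. A small sketch using the same s; the name trailing is only illustrative:

```r
s <- c('abc1', 'abc2', 'abc3', 'abc11', 'abc12',
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
       'nonsense')

# Replace each full match with just the captured run of trailing digits;
# strings that do not match the pattern are returned unchanged
trailing <- sub('abc[^[:digit:]]*([[:digit:]]+)$', '\\1', s)
trailing
# [1] "1" "2" "3" "11" "12" "1" "2" "3" "11" "12" "nonsense"
```

Because the whole digit run is compared as one value, "11" only matches an 11 in n, never a 1.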
For your edit, not ending in 1 or 2 (and using two regular expressions):
s[grepl('^abc',s) & !(sub('.*[^[:digit:]]([[:digit:]]+)$',s, replacement='\\1') %in% c(1,2))]
[1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
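If a single expression is preferred, one possible sketch uses a negative lookbehind rather than a lookahead. This assumes perl = TRUE, so that the PCRE engine and its fixed-length lookbehind are available. The names single and two_step are only illustrative.

```r
s <- c('abc1', 'abc2', 'abc3', 'abc11', 'abc12',
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
       'nonsense')

# Starts with abc, and the last two characters are NOT
# a lowercase letter followed by 1 or 2
single <- grepl('^abc.*(?<![[:lower:]][12])$', s, perl = TRUE)

# The two-expression combination from the question, for comparison
two_step <- grepl('^abc', s) & !grepl('[[:lower:]][12]$', s)

s[single]
# [1] "abc3" "abc11" "abc12" "abcde3" "abcde11" "abcde12"
identical(single, two_step)  # TRUE
```

The lookbehind is checked at the end of the string, so it inspects the final two characters directly. A lookahead would have to be positioned earlier in the string, which is probably why that approach is awkward here.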