Matching certain numbers at the end of a string

I have a vector of strings:

s <- c('abc1',   'abc2',   'abc3',   'abc11',   'abc12', 
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12', 
       'nonsense')

I would like a regular expression to match only the strings that begin with abc and end with 3, 11, or 12. In other words, the regex has to exclude abc1 but not abc11, abc2 but not abc12, and so on.

I thought that this would be easy to do with lookahead assertions, but I haven't found a way. Is there one?


EDIT: Thanks to the posters below for pointing out a serious ambiguity in the original post.

In reality, I have many strings. They all end in digits: some in 0, some in 9, some in the digits in between. I am looking for a regex that will match all strings except those that end with a letter followed by a 1 or a 2. (The regex should also match only those strings that start with abc, but that's an easy problem.)

I tried to use negative lookahead assertions to create such a regex, but I didn't have any success.


Thanks to all who replied and commented. Inspired by several of you, I ended up using this combination: grepl('^abc', s) & !grepl('[[:lower:]][12]$', s).
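As a quick check of that combination, the sketch below runs it on the sample vector plus a few hypothetical strings ending in other digits (`abc0`, `abc9` and `abc27` are made up for illustration and are not from the original post). It also shows one possible way to fold both conditions into a single pattern with a negative lookahead, which needs `perl = TRUE`; the lookahead is placed right after `^` so that it can still see the letter preceding the final digit.

```r
s <- c('abc1',   'abc2',   'abc3',   'abc11',   'abc12',
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
       'nonsense')
# Hypothetical extra strings, not part of the original data
x <- c(s, 'abc0', 'abc9', 'abc27')

# Two simple patterns: starts with "abc", and does not end in <letter>1 or <letter>2
two_step <- grepl('^abc', x) & !grepl('[[:lower:]][12]$', x)

# Single pattern: negative lookahead evaluated at the start of the string (PCRE)
one_step <- grepl('^(?!.*[[:lower:]][12]$)abc', x, perl = TRUE)

x[two_step]
# [1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12" "abc0"
# [8] "abc9"    "abc27"
identical(two_step, one_step)
# [1] TRUE
```

Putting the lookahead after `abc` instead would not work for strings such as `abc1`, because the letter `c` would already have been consumed by the time the assertion is checked.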

Is this what you want?

s[grepl("abc.*(3|11|12)", s)]
[1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12"

And the excluded strings are:

s[!grepl("abc.*(3|11|12)", s)]
[1] "abc1"     "abc2"     "abcde1"   "abcde2"   "nonsense"

Edit: As the comments indicate, there is some ambiguity in your requirements. A more comprehensive regex will test for the string start ^ and string end $ and possibly only allow alphabet characters [[:alpha:]] before the final digits:

s[grepl("^abc[[:alpha:]]*(3|11|12)$", s)]
[1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12"

You can also get grep to return the values directly by passing the argument value=TRUE, thus saving a bit of duplication in the code:

grep("^abc[[:alpha:]]*(3|11|12)$", s, value=TRUE)
[1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12"
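To see what the anchors and the `[[:alpha:]]` restriction change, here is a small sketch on a few hypothetical strings that are not in the original vector. The `strict` pattern allows only letters between `abc` and the final number, which is an assumption about the intended rule rather than something the question states.

```r
# Hypothetical test strings, chosen to show the difference
extra <- c('abc3x', 'xyzabc12', 'abc13', 'abcde11')

loose  <- "abc.*(3|11|12)"              # no anchors, anything may precede the number
strict <- "^abc[[:alpha:]]*(3|11|12)$"  # anchored, letters only before the number

data.frame(string = extra,
           loose  = grepl(loose,  extra),
           strict = grepl(strict, extra))
#     string loose strict
# 1    abc3x  TRUE  FALSE
# 2 xyzabc12  TRUE  FALSE
# 3    abc13  TRUE  FALSE
# 4  abcde11  TRUE   TRUE
```

Whether `abc13` should count as "ending in 3" depends on how the requirement is read, which is exactly the ambiguity raised in the comments.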

Instead of one complicated regular expression, in this case I think it's easier to use two simple regular expressions:

s <- c('abc1',   'abc2',   'abc3',   'abc11',   'abc12', 
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12', 
       'nonsense')

s[grepl("^abc", s) & grepl("(3|11|12)$", s)]

You could use substring in this case too:

z <- nchar(s)
# & binds more tightly than |, so the suffix tests are grouped in parentheses
s[substring(s, 1, 3) == "abc" & 
    (substring(s, z) == "3" | substring(s, z-1) %in% c("12", "11"))]
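One thing to watch with the `substring` version is operator precedence: in R, `&` binds more tightly than `|`, so the suffix tests need their own parentheses if the `abc` prefix test is meant to apply to all of them. A minimal sketch, using a hypothetical string `xyz12` that is not in the original data:

```r
x <- c('abc3', 'abc12', 'xyz12')   # 'xyz12' is a made-up counterexample
z <- nchar(x)

starts_abc <- substring(x, 1, 3) == "abc"
ends_3     <- substring(x, z) == "3"
ends_11_12 <- substring(x, z - 1) %in% c("11", "12")

# Parsed as (starts_abc & ends_3) | ends_11_12, so 'xyz12' slips through
x[starts_abc & ends_3 | ends_11_12]
# [1] "abc3"  "abc12" "xyz12"

# With explicit grouping the prefix test applies to every suffix
x[starts_abc & (ends_3 | ends_11_12)]
# [1] "abc3"  "abc12"
```

On the sample vector from the question both forms return the same strings, because the only string that does not start with `abc` also does not end in 11 or 12.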

Looking specifically for the requested numbers gives this:

n <-  c(3,11,12)

s[sub('abc[^[:digit:]]*([[:digit:]]+)$',s, replacement='\\1') %in% n]
 [1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12"

This doesn't confuse 11 for 1:

n <-  c(3,1,12)

s[sub('abc[^[:digit:]]*([[:digit:]]+)$',s, replacement='\\1') %in% n]
 [1] "abc1"    "abc3"    "abc12"   "abcde1"  "abcde3"  "abcde12"
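It may help to look at what `sub` returns before the `%in%` test is applied. In the sketch below, strings that match the pattern are reduced to their trailing number, while strings that do not match (here `nonsense`) come back unchanged and therefore never equal any of the requested numbers. The variable name `suffix` is only for illustration.

```r
s <- c('abc1',   'abc2',   'abc3',   'abc11',   'abc12',
       'abcde1', 'abcde2', 'abcde3', 'abcde11', 'abcde12',
       'nonsense')

# Replace the whole match with the captured run of trailing digits
suffix <- sub('abc[^[:digit:]]*([[:digit:]]+)$', '\\1', s)
suffix
#  [1] "1"        "2"        "3"        "11"       "12"       "1"
#  [7] "2"        "3"        "11"       "12"       "nonsense"

# %in% compares whole values (as character here), so "11" never matches 1
suffix %in% c(3, 11, 12)
#  [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE
```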

For your edit, not ending in 1 or 2 (and using two regular expressions):

s[grepl('^abc',s) & !(sub('.*[^[:digit:]]([[:digit:]]+)$',s, replacement='\\1') %in% c(1,2))]
[1] "abc3"    "abc11"   "abc12"   "abcde3"  "abcde11" "abcde12"
