
Regex to match specific pattern of string followed by digits

Sample input:

___file___name___2000___ed2___1___2___3
DIFFERENT+FILENAME+(2000)+1+2+3+ed10

Desired output (e.g., all runs of letters, all 4-digit numbers, and the literal 'ed' followed immediately by a run of digits of arbitrary length):

file name 2000 ed2
DIFFERENT FILENAME 2000 ed10

I am using: [A-Za-z]+|[\d]{4}|ed\d+ which only returns: file name 2000 ed DIFFERENT FILENAME 2000 ed

I see that there is a related Q+A here: Regular Expression to match specific string followed by number?

E.g., using ed[0-9]* would match ed# there, but I am unsure why ed\d+ does not match in the regex above.

The individual alternatives in your regex are fine; the problem is their order. Remember that a backtracking regex engine tries the alternatives of an alternation from left to right and takes the first one that succeeds. Your ed\d+ is never going to match, because the ed has already been consumed by your [A-Za-z]+ alternative. Reorder your regex and it'll work just fine:

ed\d+|[a-zA-Z]+|\d{4}

Demo
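To make the ordering effect concrete, here is a minimal Python sketch that runs both patterns over the two sample inputs from the question. The helper name `tokens` and the variable names are illustrative only, and Python's `re` module is an assumption, since the question does not name a language.

```python
import re

samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# Pattern from the question: the letters-only alternative comes first.
original = re.compile(r"[A-Za-z]+|[\d]{4}|ed\d+")
# Reordered pattern from the answer: ed\d+ is tried before the letters-only alternative.
reordered = re.compile(r"ed\d+|[a-zA-Z]+|\d{4}")

def tokens(pattern, text):
    """Join all non-overlapping matches with single spaces."""
    return " ".join(pattern.findall(text))

for s in samples:
    print("original: ", tokens(original, s))
    print("reordered:", tokens(reordered, s))
```

With the original order, "ed" is matched by the letters-only alternative and the digits after it are left behind; with ed\d+ first, "ed2" and "ed10" come out whole.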

Nick's answer is right, but because order-dependent matching can be a hard-to-spot "gotcha", the most robust (order-insensitive) ways to do this kind of search are 1) to use specified delimiters, and 2) to make each search term unique.

Jan's answer handles #1 well. But you would have to specify each specific delimiter, including its length (e.g. ___ ). It sounds like you may have some unusual delimiters, so this may not be ideal.

For #2, then, you can make each search term unique. (That is, you want the thing matching "file" and "name" to be distinct from the thing matching "2000", and both to be distinct from the thing matching "ed2".)

One way to do this is [A-Za-z]+(?![0-9a-zA-Z])|[\d]{4}|ed\d+ . This says that for the first type of search term, you want a string of letters that is not followed by an alphanumeric character (so it must end at a delimiter or at the end of the input). This keeps it distinct from the third search term, which is the letters "ed" followed by one or more digits. You can also adjust the character class inside that negative lookahead to control which characters count as delimiters.

Demo
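As a rough check of the order-insensitivity claim, the sketch below tries every ordering of the three alternatives on the sample inputs and collects the distinct results. It assumes Python's `re` module; the names `alternatives`, `tokens`, and `results` are made up for illustration.

```python
import re
from itertools import permutations

samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# The three alternatives from the lookahead-based pattern above.
alternatives = [
    r"[A-Za-z]+(?![0-9a-zA-Z])",  # letters NOT followed by a letter or digit
    r"[\d]{4}",                   # exactly four digits
    r"ed\d+",                     # literal "ed" followed by one or more digits
]

def tokens(alts, text):
    """Join the alternatives with | in the given order and return all matches."""
    return re.findall("|".join(alts), text)

# Collect the distinct outcomes over all six orderings of the alternatives.
results = {
    tuple(tuple(tokens(order, s)) for s in samples)
    for order in permutations(alternatives)
}
print(len(results))  # 1 if every ordering gives the same matches
print(results)
```

On these samples all six orderings give the same matches, because the lookahead makes the letters-only alternative fail wherever ed\d+ could match. One side effect to be aware of: a run of letters directly followed by a digit that is not of the form ed plus digits (for example "abc1") is dropped entirely rather than partially matched.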

You might very well use the following, written in free-spacing (verbose) mode so that whitespace and # comments inside the pattern are ignored (just grab the first capturing group):

(?:^|___|[+(])    # delimiter before
([a-zA-Z0-9]{2,}) # the actual content
(?=$|___|[+)])    # delimiter afterwards

See a demo on regex101.com
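For readers without access to the demo, here is a minimal sketch of the delimiter-based pattern in Python, assuming the `re` module and its `re.VERBOSE` flag for the free-spacing layout. The names `delimited` and `tokens` are hypothetical.

```python
import re

# The delimiter-based pattern, compiled in verbose mode so the layout and
# comments from the answer can be kept as written.
delimited = re.compile(
    r"""
    (?:^|___|[+(])      # delimiter before (consumed, not captured)
    ([a-zA-Z0-9]{2,})   # the actual content (capturing group 1)
    (?=$|___|[+)])      # delimiter afterwards (lookahead, not consumed)
    """,
    re.VERBOSE,
)

def tokens(text):
    """Return group 1 of every match; findall yields the group when there is exactly one."""
    return delimited.findall(text)

print(tokens("___file___name___2000___ed2___1___2___3"))
print(tokens("DIFFERENT+FILENAME+(2000)+1+2+3+ed10"))
```

Because the trailing delimiter is only checked with a lookahead, it stays available as the leading delimiter of the next token. Note that this pattern accepts any alphanumeric run of two or more characters, so it drops the trailing 1, 2, 3 only because they are single characters. Each sample is passed as its own single-line string here; a multi-line string would need `re.MULTILINE` for ^ and $ to match at every line.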
