
regex to match specific pattern of string followed by digits

Sample input:

___file___name___2000___ed2___1___2___3
DIFFERENT+FILENAME+(2000)+1+2+3+ed10

Desired output (e.g., all runs of letters, 4-digit numbers, and the literal 'ed' followed immediately by one or more digits):

file name 2000 ed2
DIFFERENT FILENAME 2000 ed10

I am using: [A-Za-z]+|[\d]{4}|ed\d+ which only returns: file name 2000 ed DIFFERENT FILENAME 2000 ed

I see that there is a related Q+A here: Regular Expression to match specific string followed by number?

For example, ed[0-9]* would match ed followed by digits, but I'm unsure why ed\d+ does not match in my pattern above.

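Each alternative in your regex is correct on its own. Remember, however, that the regex engine tries the alternatives of an alternation from left to right and takes the first one that succeeds. Your ed\d+ is never going to match, because at the position where it could match, the ed has already been consumed by your [A-Za-z]+ alternative. Put the more specific alternative first and it'll work just fine: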

ed\d+|[a-zA-Z]+|\d{4}
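To see the effect of the ordering, here is a minimal sketch in Python (the question does not name a language, so Python's re module is an assumption, as are the names samples, original, reordered and extract):

```python
import re

# The two sample inputs from the question.
samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# Original order: the letters alternative is tried first and consumes "ed".
original = re.compile(r"[A-Za-z]+|\d{4}|ed\d+")

# Reordered: the more specific ed\d+ alternative is tried first.
reordered = re.compile(r"ed\d+|[A-Za-z]+|\d{4}")


def extract(pattern, text):
    """Join all non-overlapping matches of pattern in text with spaces."""
    return " ".join(pattern.findall(text))


for s in samples:
    print("original: ", extract(original, s))
    print("reordered:", extract(reordered, s))
```

With the original order the trailing digits after ed are dropped; with ed\d+ first, both ed2 and ed10 come back whole.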

Demo

Nick's answer is right, but because in-order matching can be a less readable "gotcha", the best (order-insensitive) ways to do this kind of search are 1) with specified delimiters, and 2) by making each search term unique.

Jan's answer handles #1 well. But you would have to specify each specific delimiter, including its length (eg ___ ). It sounds like you may have some unusual delimiters, so this may not be ideal.

For #2, then, you can make each search term unique. (That is, you want the thing matching "file" and "name" to be distinct from the thing matching "2000", and both to be distinct from the thing matching "ed2".)

One way to do this is [A-Za-z]+(?![0-9a-zA-Z])|[\d]{4}|ed\d+ . This is saying that for the first type of search term, you want a run of letters that is not followed by an alphanumeric character (so it must end at a delimiter or at the end of the string). This keeps it distinct from the third search term, which is the letters ed followed by one or more digits. It also lets you adjust which characters count as "not a delimiter" by changing the character class inside that negative lookahead.
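As a rough check that this pattern does not depend on the order of its alternatives, the sketch below (Python assumed; alternatives, results_for and all_results are illustrative names) runs all six orderings against the sample inputs:

```python
import re
from itertools import permutations

# The two sample inputs from the question.
samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# The three alternatives; the letters alternative carries a negative
# lookahead so it cannot stop in front of a letter or a digit.
alternatives = [r"[A-Za-z]+(?![0-9A-Za-z])", r"\d{4}", r"ed\d+"]


def results_for(order):
    """Build the alternation in the given order and run it on each sample."""
    pattern = re.compile("|".join(order))
    return [" ".join(pattern.findall(s)) for s in samples]


# Collect the results of every possible ordering of the alternatives.
all_results = [results_for(order) for order in permutations(alternatives)]

for line in all_results[0]:
    print(line)
print("all orderings agree:", all(r == all_results[0] for r in all_results))
```

For these inputs every ordering gives the same matches, because no two alternatives can succeed at the same starting position.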

demo

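You might very well use the following (just grab the first capturing group). It is written in free-spacing (verbose) mode, and it needs multiline mode so that ^ and $ match at the start and end of each line: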

(?:^|___|[+(])    # delimiter before
([a-zA-Z0-9]{2,}) # the actual content
(?=$|___|[+)])    # delimiter afterwards

See a demo on regex101.com
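A minimal sketch of applying the delimiter-based pattern in Python, assuming the re.VERBOSE and re.MULTILINE flags stand in for the free-spacing and multiline options (delimited, text and lines are illustrative names):

```python
import re

# re.VERBOSE allows the whitespace and "#" comments inside the pattern;
# re.MULTILINE lets ^ and $ match at the start and end of every line.
delimited = re.compile(
    r"""
    (?:^|___|[+(])      # delimiter before
    ([a-zA-Z0-9]{2,})   # the actual content (first capturing group)
    (?=$|___|[+)])      # delimiter afterwards
    """,
    re.VERBOSE | re.MULTILINE,
)

text = (
    "___file___name___2000___ed2___1___2___3\n"
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10"
)

# With exactly one capturing group, findall returns that group's text.
lines = [" ".join(delimited.findall(line)) for line in text.splitlines()]
for line in lines:
    print(line)
```

Note that {2,} accepts any alphanumeric token of two or more characters, so a two-digit number between delimiters would also be returned; tighten the content group if that matters for your data.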
