Need a regex to exclude certain strings

I'm trying to get a regex that will match:

somefile_1.txt
somefile_2.txt
somefile_{anything}.txt

but not match:

somefile_16.txt

I tried

somefile_[^(16)].txt

with no luck (it includes even the "16" record)

Some regex libraries allow lookahead:

somefile_(?!16\.txt$).*?\.txt
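Here is a minimal check of the lookahead version using Python's `re` module, which supports `(?!...)`. The sample names are made up for illustration. The lookahead only works when it sits directly after `somefile_`. If you place it after `somefile`, it sees `_16.txt` instead of `16.txt` and never blocks anything, as the last line shows.

```python
import re

# Negative lookahead placed right after "somefile_": reject the name only
# when the rest of the line is exactly "16.txt".
LOOKAHEAD = re.compile(r"somefile_(?!16\.txt$).*?\.txt")

names = ["somefile_1.txt", "somefile_2.txt", "somefile_abc.txt",
         "somefile_16.txt", "somefile_160.txt"]

# search() applies the pattern unanchored, the way grep would.
matched = [n for n in names if LOOKAHEAD.search(n)]
print(matched)
# ['somefile_1.txt', 'somefile_2.txt', 'somefile_abc.txt', 'somefile_160.txt']

# Same lookahead placed after "somefile": it checks "_16.txt", so it never fires.
MISPLACED = re.compile(r"somefile(?!16\.txt$).*?\.txt")
print(bool(MISPLACED.search("somefile_16.txt")))  # True
```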

Otherwise, you can still use multiple character classes:

somefile_([^1].|1[^6]|.|.{3,})\.txt

or, to achieve maximum portability:

somefile_([^1].|1[^6]|.|....*)\.txt
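Here is a quick Python sketch that checks both character-class variants against a few made-up names. Both are written with `somefile_` (underscore included), so the alternation starts at the first character after the underscore. Python stands in for whatever engine you actually use.

```python
import re

# Without lookahead: list the allowed shapes of the part after "somefile_".
#   [^1].   two chars, the first is not 1
#   1[^6]   two chars, 1 followed by anything but 6
#   .       exactly one char
#   .{3,}   three or more chars (....* says the same without {m,n})
BRACES = re.compile(r"somefile_([^1].|1[^6]|.|.{3,})\.txt")
PORTABLE = re.compile(r"somefile_([^1].|1[^6]|.|....*)\.txt")

names = ["somefile_1.txt", "somefile_2.txt", "somefile_12.txt",
         "somefile_16.txt", "somefile_166.txt", "somefile_abc.txt"]

braces = [n for n in names if BRACES.search(n)]
portable = [n for n in names if PORTABLE.search(n)]
print(braces)
# ['somefile_1.txt', 'somefile_2.txt', 'somefile_12.txt', 'somefile_166.txt', 'somefile_abc.txt']
print(braces == portable)  # True for these sample names
```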

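[^(16)] means: match any single character except parentheses, 1, and 6. A character class always matches exactly one character; it cannot exclude a whole string such as 16.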
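To see what the question's pattern actually does, here is a small Python check on made-up names. It shows that the class consumes exactly one character, and that the unescaped `.` before `txt` matches any character.

```python
import re

# [^(16)] is ONE character that is not "(", "1", "6" or ")".
# The "." after it is unescaped, so it matches any single character.
QUESTION = re.compile(r"somefile_[^(16)].txt")

results = {name: bool(QUESTION.search(name))
           for name in ["somefile_1.txt", "somefile_2.txt", "somefile_23.txt"]}
print(results)
# {'somefile_1.txt': False, 'somefile_2.txt': True, 'somefile_23.txt': False}
# somefile_1.txt fails because 1 is in the excluded set;
# somefile_23.txt fails because the class matches only one character.
```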

The best solution has already been mentioned:

somefile_(?!16\.txt$).*\.txt

This works, and the greedy .* takes in anything else on the same line. If you know, however, that you want a valid file name, I'd suggest also excluding invalid characters:

somefile_(?!16)[^?%*:|"<>]*\.txt
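The two lookahead patterns differ in more than the character filter, as this Python comparison on made-up names suggests. `(?!16\.txt$)` rejects only the exact name. `(?!16)` rejects every name that starts with 16, and the negated class also rules out names containing `?`, `*`, `"` and similar characters.

```python
import re

# Blocks only the exact name somefile_16.txt.
BEST = re.compile(r"somefile_(?!16\.txt$).*\.txt")
# Blocks every name starting with 16; the negated class keeps
# ? % * : | " < > out of the matched name.
VALID_NAME = re.compile(r'somefile_(?!16)[^?%*:|"<>]*\.txt')

names = ["somefile_1.txt", "somefile_16.txt", "somefile_16_stuff.txt",
         "somefile_a*b.txt"]

best = [n for n in names if BEST.search(n)]
valid = [n for n in names if VALID_NAME.search(n)]
print(best)   # ['somefile_1.txt', 'somefile_16_stuff.txt', 'somefile_a*b.txt']
print(valid)  # ['somefile_1.txt']
```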

If you're working with a regex engine that does not support lookahead, you'll have to find another way to express that (?!16). You can split the names into two groups: those that start with 1 not followed by 6, and those that start with anything else:

somefile_(1[^6]|[^1]).*\.txt
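Here is a quick Python check of the two-group split on made-up names. One thing it shows is that `1[^6]` needs a second character before `.txt`, so a one-character name like `somefile_1.txt` is not matched. That is why the combined patterns further down add a `|1` branch.

```python
import re

# Group 1: starts with 1 and the next char is not 6.
# Group 2: starts with anything other than 1.
SPLIT = re.compile(r"somefile_(1[^6]|[^1]).*\.txt")

names = ["somefile_1.txt", "somefile_12.txt", "somefile_2.txt",
         "somefile_16.txt", "somefile_16_stuff.txt"]
split = [n for n in names if SPLIT.search(n)]
print(split)  # ['somefile_12.txt', 'somefile_2.txt']
```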

If you want to allow somefile_16_stuff.txt but NOT somefile_16.txt, the regexes above are not enough. You'll need to draw the line differently:

somefile_(16.|1[^6]|[^1]).*\.txt
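A short Python check on made-up names: the `16.` branch lets a name starting with 16 through only when at least one more character sits between `16` and the final `.txt`.

```python
import re

# "16." needs one more character after 16, and ".*\.txt" still has to find
# ".txt" after it, so somefile_16.txt itself cannot match.
ALLOW_16_FAMILY = re.compile(r"somefile_(16.|1[^6]|[^1]).*\.txt")

names = ["somefile_16.txt", "somefile_16_stuff.txt",
         "somefile_12.txt", "somefile_2.txt"]
allowed = [n for n in names if ALLOW_16_FAMILY.search(n)]
print(allowed)  # ['somefile_16_stuff.txt', 'somefile_12.txt', 'somefile_2.txt']
```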

Combining all of this, you end up with two possibilities: one that blocks only the single file (somefile_16.txt), and one that blocks the whole family (somefile_16*.txt). I personally think you want the first one:

somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt
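Here is a Python comparison of the two combined patterns on made-up names. The first keeps the 16 family apart from `somefile_16.txt` itself. The second drops every name that starts with 16. Both rely on the `|1` branch to accept `somefile_1.txt`.

```python
import re

# Both patterns keep ? % * : | " < > out of the name;
# the |1 branch covers the one-character name somefile_1.txt.
SINGLE = re.compile(
    r'somefile_((16[^?%*:|"<>]|1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt')
FAMILY = re.compile(
    r'somefile_((1[^6?%*:|"<>]|[^1?%*:|"<>])[^?%*:|"<>]*|1)\.txt')

names = ["somefile_1.txt", "somefile_2.txt", "somefile_16.txt",
         "somefile_16_stuff.txt", "somefile_1666.txt"]

single = [n for n in names if SINGLE.search(n)]
family = [n for n in names if FAMILY.search(n)]
print(single)
# ['somefile_1.txt', 'somefile_2.txt', 'somefile_16_stuff.txt', 'somefile_1666.txt']
print(family)  # ['somefile_1.txt', 'somefile_2.txt']
```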

Here are the same two patterns without the invalid-character exclusions, so they are easier to read:

somefile_((16.|1[^6]|[^1]).*|1)\.txt
somefile_((1[^6]|[^1]).*|1)\.txt
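A Python sketch of the readable pair, checked on the same kind of made-up names. The third branch of the first pattern is written as the closed class `[^1]`; an unterminated `[^1` would make `re.compile` raise `re.error`. On names without special characters, the results agree with the stricter versions.

```python
import re

# Readable pair; the third branch is the closed character class [^1].
SINGLE_READABLE = re.compile(r"somefile_((16.|1[^6]|[^1]).*|1)\.txt")
FAMILY_READABLE = re.compile(r"somefile_((1[^6]|[^1]).*|1)\.txt")

names = ["somefile_1.txt", "somefile_2.txt", "somefile_16.txt",
         "somefile_16_stuff.txt", "somefile_1666.txt"]

single = [n for n in names if SINGLE_READABLE.search(n)]
family = [n for n in names if FAMILY_READABLE.search(n)]
print(single)
# ['somefile_1.txt', 'somefile_2.txt', 'somefile_16_stuff.txt', 'somefile_1666.txt']
print(family)  # ['somefile_1.txt', 'somefile_2.txt']
```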

To follow your specification strictly (and to be picky), you should rather use:

^somefile_(?!16\.txt$).*\.txt$

so that somefile_1666.txt, which is also {anything}, can still be matched ;)
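A small Python comparison of the anchored pattern with a looser lookahead on just `16`, using made-up names. The anchored version accepts `somefile_1666.txt` but rejects a name with a prefix. The looser one behaves the other way around on those two names.

```python
import re

# Anchored at both ends: the lookahead rejects only a line that is exactly
# somefile_16.txt, and the whole line must end in .txt.
STRICT = re.compile(r"^somefile_(?!16\.txt$).*\.txt$")
# A lookahead on "16" alone rejects everything that starts with 16.
LOOSE = re.compile(r"somefile_(?!16).*\.txt")

results = {name: (bool(STRICT.search(name)), bool(LOOSE.search(name)))
           for name in ["somefile_1666.txt", "somefile_16.txt", "my_somefile_1.txt"]}
print(results)
# {'somefile_1666.txt': (True, False), 'somefile_16.txt': (False, False),
#  'my_somefile_1.txt': (False, True)}
```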

But sometimes it is just more readable to use:

ls | grep -e 'somefile_.*\.txt' | grep -v -e 'somefile_16\.txt'

somefile_(?!16).*\.txt

(?!16) means: assert that the regex "16" cannot match starting at this position.
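To illustrate "starting at this position", here is a short Python sketch showing that the lookahead is zero-width. It checks the text after `somefile_` without consuming it, so `.*` still matches the rest of the name. The names are made up.

```python
import re

# (?!16) is zero-width: it looks at the text after "somefile_" without
# consuming it, so ".*" still matches the whole rest of the name.
pattern = re.compile(r"somefile_(?!16).*\.txt")

full = pattern.search("somefile_2.txt").group(0)
print(full)                                  # somefile_2.txt
print(pattern.search("somefile_16.txt"))     # None
print(pattern.search("somefile_160.txt"))    # None: it starts with 16 too
print(re.match(r"(?!16)", "23").span())      # (0, 0): an empty match
```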

Sometimes it's just easier to use two regular expressions: first look for everything you want, then ignore everything you don't. I do this all the time on the command line, piping the output of a regex that matches a superset into another regex that drops the stuff I don't want.

If the goal is to get the job done rather than find the perfect regex, consider that approach. It's often much easier to write and understand than a regex that makes use of exotic features.
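Here is the same two-step idea inside a program, as a Python sketch. One regex keeps a superset, and a second one throws out what you don't want, much like `grep` followed by `grep -v` in the pipeline above. The list of names is made up and stands in for the output of `ls`.

```python
import re

# Made-up names standing in for the output of `ls`.
names = ["somefile_1.txt", "somefile_16.txt", "somefile_2.txt",
         "notes.md", "somefile_16_stuff.txt"]

include = re.compile(r"somefile_.*\.txt")   # superset: everything we might want
exclude = re.compile(r"somefile_16\.txt")   # the one thing we don't want

# search() finds the pattern anywhere in the line, like grep does.
kept = [n for n in names if include.search(n) and not exclude.search(n)]
print(kept)  # ['somefile_1.txt', 'somefile_2.txt', 'somefile_16_stuff.txt']
```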

Without using lookahead

somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt

Read it like: somefile_ followed by either:

  1. nothing.
  2. one character.
  3. any one character except 1, followed by one or more other characters.
  4. three or more characters.
  5. one of 10 .. 19; note that 16 has been left out.

and finally followed by .txt.
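Here is a Python check of the lookahead-free alternation on made-up names. It is written with `\.txt` so the dot is matched literally. Python accepts the empty first alternative, so `somefile_.txt` matches too.

```python
import re

# Alternatives, in order: empty, one char, non-1 char plus one or more chars,
# 10-19 without 16, three or more chars. "\.txt" makes the dot literal.
NO_LOOKAHEAD = re.compile(
    r"somefile_(|.|[^1].+|10|11|12|13|14|15|17|18|19|.{3,})\.txt")

names = ["somefile_.txt", "somefile_1.txt", "somefile_15.txt",
         "somefile_16.txt", "somefile_ab.txt", "somefile_160.txt"]
matched = [n for n in names if NO_LOOKAHEAD.search(n)]
print(matched)
# ['somefile_.txt', 'somefile_1.txt', 'somefile_15.txt', 'somefile_ab.txt', 'somefile_160.txt']
```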
