
Regex to optionally match 3 digits at the end of file name

I can't for the life of me figure out how to get these to match:

File name without 3 digit end.jpg
File name with 3 digit 123.gif
Single 123.jpg
Single.png

But not these:

Single 1.jpg
Single 123b.gif
More words 123b.png

The best I could do so far is this expression:

^[^\s]((?!\s{2})(?!,\S).)*\b(\p{L}+|\d{3})\.\w{3}$

But it fails to match Single.png and still matches Single 123b.gif and More words 123b.png. I think I understand why it doesn't work, but I can't figure out how to get it right, and I have been struggling and Googling for 2 days.

My full rules are: optionally exactly 3 digits at the end before the file extension, a 3-letter file extension, no double spaces in the file name, and a single space after, but not before, a comma.

You can use an alternation group that includes either 3 digits or a sequence of non-digits, preceded by a word boundary assertion:

^.*?\b(?:\d{3}|\D+)\.\w{3}$

Demo: https://regex101.com/r/A9iSVE/3
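A quick way to check this pattern is to run it over the sample names from the question. The sketch below assumes a JavaScript/TypeScript regex engine rather than the regex101 flavor used in the demo; the pattern is unchanged, and `shouldMatch`, `shouldNotMatch` and `report` are illustrative names holding the question's examples.

```typescript
// Sketch: the first answer's pattern, unchanged, run over the question's sample names
// in a JavaScript/TypeScript regex engine.
const digitRule = /^.*?\b(?:\d{3}|\D+)\.\w{3}$/;

const shouldMatch: string[] = [
  "File name without 3 digit end.jpg",
  "File name with 3 digit 123.gif",
  "Single 123.jpg",
  "Single.png",
];
const shouldNotMatch: string[] = ["Single 1.jpg", "Single 123b.gif", "More words 123b.png"];

// Print one line per name showing whether the pattern accepted it.
function report(label: string, names: string[]): void {
  for (const name of names) {
    console.log(`${label}: ${digitRule.test(name) ? "match   " : "no match"}  ${name}`);
  }
}

report("should match", shouldMatch);
report("should not match", shouldNotMatch);
```

This pattern only covers the digit rule. Because `\D+` also matches spaces, a name such as `Two  spaces.jpg` is still accepted; the double-space and comma rules are not checked here.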

To take your requirement about the comma and the double spaces into account, one option is to use 2 negative lookaheads that assert the string contains neither a double space nor a space before a comma.

You could use \s if you want to match any whitespace character instead of a single space.

^(?!.*[ ]{2})(?!.* ,).*\b(?:\p{L}+|\d{3})\.\w{3}$

That will match

  • ^ Start of the string
  • (?!.*[ ]{2}) Assert that the string does not contain 2 consecutive spaces
  • (?!.* ,) Assert that the string does not contain a space followed by a comma
  • .*\b Match any character 0+ times, followed by a word boundary
  • (?:\p{L}+|\d{3}) Match either 1+ letters or exactly 3 digits
  • \.\w{3} Match a literal . followed by 3 word characters
  • $ End of string

Regex demo | C# demo
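The same sample names, plus a few hypothetical ones for the space and comma rules, can be run through this answer's pattern. This sketch assumes JavaScript/TypeScript instead of C#: the pattern is passed to `new RegExp` as a string (so every backslash is doubled) together with the `u` flag, which JavaScript needs before it accepts `\p{L}`. The names `fileNameRule` and `cases` are illustrative.

```typescript
// Sketch: the lookahead-based pattern from this answer, checked against the question's
// samples plus a few hypothetical names for the space and comma rules.
// The pattern is a string for new RegExp (backslashes doubled); the "u" flag is
// required for JavaScript to understand \p{L}.
const fileNameRule = new RegExp(
  "^(?!.*[ ]{2})(?!.* ,).*\\b(?:\\p{L}+|\\d{3})\\.\\w{3}$",
  "u"
);

// [name, expected result]
const cases: Array<[string, boolean]> = [
  ["File name without 3 digit end.jpg", true],
  ["File name with 3 digit 123.gif", true],
  ["Single 123.jpg", true],
  ["Single.png", true],
  ["Single 1.jpg", false],
  ["Single 123b.gif", false],
  ["More words 123b.png", false],
  ["Two  spaces.jpg", false],   // hypothetical: double space
  ["Comma ,before.jpg", false], // hypothetical: space before a comma
  ["Comma, after.jpg", true],   // hypothetical: space after a comma
  ["No,space.jpg", true],       // hypothetical: nothing requires a space after a comma
];

for (const [name, expected] of cases) {
  const actual = fileNameRule.test(name);
  console.log(`${actual === expected ? "ok  " : "FAIL"}  ${actual ? "match   " : "no match"}  ${name}`);
}
```

As written, the lookaheads only forbid a space before a comma; they do not require one after it, which is why `No,space.jpg` is listed as accepted.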

You can meet the specified rules without any lookaheads (which the currently accepted answer relies on). The rules specified are (rephrased for clarity): a filename MUST meet the following conditions:

  • It MUST NOT contain a sequence of multiple space characters.
  • A comma MUST have exactly one space character following it.
  • The filename stem MAY have a 3-digit suffix.
  • The filename extension MUST consist of 3 letters.

To that end:

^(?<prefix>[^, ]+(,? [^, ]+)*)(?<suffix>\d\d\d)?(?<extension>\.\p{L}\p{L}\p{L})$

will do the trick with no lookaheads. Broken out into its pieces (with \x20 standing in for the space, since literal spaces are ignored in free-spacing mode), you get:

^                  # * match start-of-text, followed by
(?<prefix>         # * a named group, consisting of
  [^,\x20]+        #   * 1 or more characters other than comma or space, followed by
  (                #   * a group, consisting of
    ,?             #     * an optional comma, followed by
    \x20           #     * a single space character, followed by
    [^,\x20]+      #     * 1 or more characters other than comma or space
  )*               #     with the whole group repeated zero or more times
)                  #   followed by
(?<suffix>         # * an optional named group (the suffix), consisting of
  \d\d\d           #   * 3 decimal digits
)?                 #   followed by
(?<extension>      # * a mandatory named group (the filename extension), consisting of
  \.\p{L}\p{L}\p{L} #   * a literal period followed by 3 letters.
)                  #   followed by
$                  # end-of-text
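Here is a sketch that runs this answer's one-line pattern over the question's samples, assuming JavaScript/TypeScript with the `u` flag so that `\p{L}` is understood. One assumption is made explicitly: the dot before the extension is written as `\.`, because an unescaped `.` matches any character and would let a name like `Singlexpng` through. The names `stemRule` and `sampleNames` are illustrative.

```typescript
// Sketch: the one-line pattern from this answer, run over the question's sample names.
// Assumption: the extension dot is escaped as \. (an unescaped "." matches any character).
// The pattern is a string for new RegExp, so every backslash is doubled; the "u" flag
// is what lets JavaScript understand \p{L}.
const stemRule = new RegExp(
  "^(?<prefix>[^, ]+(,? [^, ]+)*)(?<suffix>\\d\\d\\d)?(?<extension>\\.\\p{L}\\p{L}\\p{L})$",
  "u"
);

const sampleNames: string[] = [
  "File name without 3 digit end.jpg",
  "File name with 3 digit 123.gif",
  "Single 123.jpg",
  "Single.png",
  "Single 1.jpg",
  "Single 123b.gif",
  "More words 123b.png",
];

for (const name of sampleNames) {
  const m = stemRule.exec(name);
  // Capture groups are numbered by opening parenthesis: 1 = prefix, 2 = the
  // unnamed repeated group, 3 = suffix, 4 = extension.
  const suffix = m && m[3] !== undefined ? m[3] : "(none)";
  console.log(`${m ? "match   " : "no match"}  suffix=${suffix}  ${name}`);
}
```

Under the rephrased rules a stem ending in `1` or `123b` is allowed, because `[^, ]+` accepts digits and letters alike, so all seven sample names match, including the three the question wants rejected. For the same reason, in a backtracking engine such as JavaScript's the greedy prefix ends up holding `Single 123` for `Single 123.jpg`, and the `suffix` group stays empty.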

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM