
A pattern matching an expression that doesn't end with a specific sequence

I need a regex pattern that matches strings which do NOT end with the sequence \.[A-z0-9]{2,}, that is, the examined string must not end with a dot followed by two or more alphanumeric characters. For example, /home/patryk/www and /home/patryk/www/ should match the desired pattern, while /home/patryk/images/DSC002.jpg should not. I suppose this has something to do with lookarounds (lookaheads), but I still have no idea how to write it. Any help appreciated.

Old Answer

You can use a negative lookbehind at the end if your regex flavor supports it:

^.*(?<!\.\w{2,})$

This will match a string that has an end anchor not preceded by the icky sequence you don't want.

Note that, as m.buettner has pointed out, this uses a variable-length lookbehind, a feature that few regex engines besides .NET support.


New Answer

After a bit of digging around, however, I've found that variable-length lookaheads are pretty widely supported, so here is a version that uses those:

^(?:(?!\.\w{2,}$).)++$
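A quick way to check the lookahead version against the paths from the question is a small Python sketch. Python's `re` module only accepts possessive quantifiers from version 3.11 on, so this assumes the plain `+` form, which should accept the same strings here because each repetition consumes exactly one character. The helper name `lacks_extension` is made up for illustration.

```python
import re

# Same pattern as above, but with a plain + instead of the possessive ++
# (assumption: keeps the pattern valid on Python versions before 3.11).
NO_EXTENSION = re.compile(r'^(?:(?!\.\w{2,}$).)+$')


def lacks_extension(path: str) -> bool:
    """Return True if path does not end in a dot plus 2+ word characters."""
    return NO_EXTENSION.match(path) is not None


for path in ("/home/patryk/www",
             "/home/patryk/www/",
             "/home/patryk/images/DSC002.jpg"):
    print(path, lacks_extension(path))
```

The first two paths should be reported as matching and the .jpg path as not matching.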

In a comment on an answer, you said you also don't want to match strings that end with a forward slash; that can be done by simply adding a forward slash to the lookahead:

^(?:(?!(\.\w{2,}|/)$).)++$

Note that I am using \w for succinctness, but it lets underscores through. If this is important, you could replace it with [^\W_].
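To see what the trailing-slash variant and the [^\W_] substitution change in practice, here is a rough Python comparison. It again assumes a plain `+` in place of `++` so that it runs on older Python versions; the constant names and the sample path ending in a.b_c are invented for the demonstration.

```python
import re

# Plain + instead of ++ (assumption: keeps the pattern valid on Python < 3.11).
WITH_WORD = re.compile(r'^(?:(?!(\.\w{2,}|/)$).)+$')
# Same pattern with \w swapped for [^\W_], which excludes the underscore.
WITHOUT_UNDERSCORE = re.compile(r'^(?:(?!(\.[^\W_]{2,}|/)$).)+$')

samples = ["/home/patryk/www", "/home/patryk/www/", "/home/patryk/a.b_c"]
for path in samples:
    print(path,
          bool(WITH_WORD.match(path)),
          bool(WITHOUT_UNDERSCORE.match(path)))
```

Both patterns should reject the path with the trailing slash. They should differ only on the last sample: \w treats b_c as an extension, while [^\W_] stops at the underscore and lets the string through.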

Asad's version is very convenient, but few regex engines other than .NET's support variable-length lookbehinds (which is one of the many reasons why every regex question should include the language or tool used).

We can reduce this to fixed-length lookbehinds (which are supported in most engines, though not in JavaScript before ES2018) if we think about the possible cases which should match. That would be either one or zero letters/digits at the end (whether preceded by . or not), or two or more letters/digits that are not preceded by a dot.

^.*(?:(?<![a-zA-Z0-9])[a-zA-Z0-9]?|(?<![a-zA-Z0-9.])[a-zA-Z0-9]{2,})$
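Since each lookbehind in this pattern is exactly one character wide, it should also work in engines that insist on fixed-length lookbehinds. As a sanity check, here is a sketch using Python's `re` module, which is one such engine; the constant name and the extra sample notes.a are illustrative only.

```python
import re

# Each lookbehind is one character wide, so Python's re accepts the pattern.
FIXED_LOOKBEHIND = re.compile(
    r'^.*(?:(?<![a-zA-Z0-9])[a-zA-Z0-9]?|(?<![a-zA-Z0-9.])[a-zA-Z0-9]{2,})$'
)

samples = [
    "/home/patryk/www",                # 2+ alphanumerics, preceded by a slash
    "/home/patryk/www/",               # zero alphanumerics at the end
    "/home/patryk/images/DSC002.jpg",  # dot plus 3 alphanumerics
    "notes.a",                         # dot plus a single alphanumeric
]
for path in samples:
    print(path, bool(FIXED_LOOKBEHIND.match(path)))
```

Only the .jpg path should fail to match; notes.a goes through the first alternative because a single trailing character is allowed regardless of what precedes it.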

This should do it:

^(?:[^.]+|\.(?![A-Za-z0-9]{2,}$))+$

It alternates between matching one or more of anything except a dot, or a dot if it's not followed by two or more alphanumeric characters and the end of the string.
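Here is a small Python sketch of that alternation at work. The sample strings are made up, and they are deliberately short: because [^.]+ sits inside a repeated group, a backtracking engine without possessive quantifiers may slow down sharply on long inputs that fail to match.

```python
import re

DOT_ALTERNATION = re.compile(r'^(?:[^.]+|\.(?![A-Za-z0-9]{2,}$))+$')

samples = ["www", "www/", "v1.2/readme", "notes.a",
           "DSC002.jpg", "archive.tar.gz"]
# Map each sample to whether the whole string is accepted.
results = {s: bool(DOT_ALTERNATION.match(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```

The dot in v1.2/readme is accepted because what follows it does not run to the end of the string as pure alphanumerics, while DSC002.jpg and archive.tar.gz are rejected by the final dot.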

EDIT: Upgrading it to meet the new requirement is just more of the same:

^(?:[^./]+|/(?=.)|\.(?![A-Za-z0-9]{2,}$))+$

Breaking that down, we have:

  • [^./]+ # one or more of any characters except . or /

  • /(?=.) # a slash, as long as there's at least one character following it

  • \.(?![A-Za-z0-9]{2,}$) # a dot, unless it's followed by two or more alphanumeric characters followed by the end of the string
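The same breakdown can be kept next to the pattern itself in engines that offer a free-spacing mode. A minimal sketch with Python's `re.VERBOSE` flag follows; the constant name and the shortened sample paths are assumptions for illustration.

```python
import re

NO_EXT_NO_TRAILING_SLASH = re.compile(r'''
    ^
    (?:
        [^./]+                    # one or more characters other than . or /
      | /(?=.)                    # a slash, if at least one character follows
      | \.(?![A-Za-z0-9]{2,}$)    # a dot, unless 2+ alphanumerics and the end follow
    )+
    $
''', re.VERBOSE)

for path in ("/home/patryk/www",
             "/home/patryk/www/",
             "/img/DSC002.jpg",
             "/srv/v1.2/www"):
    print(path, bool(NO_EXT_NO_TRAILING_SLASH.match(path)))
```

With the updated requirement, the path ending in a slash should now be rejected along with the .jpg path, while the other two should match.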


On another note: [A-z] is an error. It matches all the uppercase and lowercase ASCII letters, but it also matches the characters [, ], ^, _, backslash and backtick, whose code points happen to lie between Z and a.
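This is easy to confirm by brute force over the ASCII range. The sketch below compares the range from the question with an explicit letters-only class; the names `LOOSE`, `STRICT` and `extras` are illustrative.

```python
import re

LOOSE = re.compile(r'[A-z]')       # the range used in the question
STRICT = re.compile(r'[A-Za-z]')   # ASCII letters only

# ASCII characters that the loose range accepts but the strict one rejects.
extras = [chr(c) for c in range(128)
          if LOOSE.fullmatch(chr(c)) and not STRICT.fullmatch(chr(c))]
print(extras)
```

The list should contain exactly the six non-letter characters named above, so writing [A-Za-z0-9] in place of [A-z0-9] avoids letting them through.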

Variable-length lookbehinds are rarely supported, but you don't need one:

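^.*(?<!\.[A-z0-9][A-z0-9]?)$

Note that this lookbehind is still not fixed-length (it spans two or three characters), so engines that require fixed-length lookbehinds reject it. It also excludes only endings made of a dot plus one or two such characters, so a name like DSC002.jpg still matches.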
