Regex matching a pattern that doesn't include another pattern

I understand how to write a regex that matches only numbers, but how would I add another condition so that the input cannot contain a certain substring? For example, a regex that matches input containing only digits, but not the substring 456.

Given this input (where <empty> is the empty string ""):

0
1456
<empty>
12345689
1010101
abc

These and only these should match:

0
<empty>
1010101

Could somebody explain the regex for this?

You can use this regex using a negative lookahead:

^(?![0-9]*456)[0-9]*$

  • (?![0-9]*456) is a negative lookahead: checked from the start of the string, it rejects the input if 456 appears anywhere in the run of digits.
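As a quick check, the pattern can be run against the inputs from the question. This is only a minimal sketch: the class name LookaheadDemo and the helper accepts are made up for the illustration.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Pattern from the answer: the lookahead fails if 456 follows any run of digits.
    static final Pattern DIGITS_WITHOUT_456 = Pattern.compile("^(?![0-9]*456)[0-9]*$");

    // Hypothetical helper: true if the whole input is digits and never contains 456.
    static boolean accepts(String input) {
        return DIGITS_WITHOUT_456.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Inputs from the question; "" stands for <empty>.
        String[] inputs = {"0", "1456", "", "12345689", "1010101", "abc"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + accepts(input));
        }
    }
}
```

Only 0, the empty string and 1010101 are accepted, which is the set the question asks for.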

I think this is what you are looking for:

public class RegexTest {
    public static void main(String[] args) {
        // Each digit is consumed only if "456" does not start at that position.
        String regex = "^((?!456)\\d)*$";
        String test = "123";
        String test2 = "456";
        String test3 = "asdf123";
        String test4 = "test456asdf";

        // String.matches must match the whole input.
        System.out.println(test.matches(regex));  // true
        System.out.println(test2.matches(regex)); // false
        System.out.println(test3.matches(regex)); // false
        System.out.println(test4.matches(regex)); // false
    }
}

That is:

  • start of string
  • zero or more times
    • look at the three chars starting here, don't match if it's "456"
    • match one digit
  • end of string

Here's a link to a fiddle where you can test the empty string (epsilon) as well.

Use a negative lookahead anchored at the start, and match "numbers":

^(?!.*456)\p{N}*$
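Note that \p{N} is broader than [0-9]: it covers every character in the Unicode "number" categories. A small sketch of the difference follows; the class and method names are invented for the example, and the ASCII pattern is the one from the earlier answer.

```java
import java.util.regex.Pattern;

class UnicodeNumberDemo {
    // \p{N} accepts any Unicode number character, not just ASCII 0-9.
    static final Pattern ANY_NUMBER = Pattern.compile("^(?!.*456)\\p{N}*$");
    // ASCII-only variant for comparison.
    static final Pattern ASCII_DIGITS = Pattern.compile("^(?![0-9]*456)[0-9]*$");

    static boolean matchesAnyNumber(String s) {
        return ANY_NUMBER.matcher(s).matches();
    }

    static boolean matchesAsciiDigits(String s) {
        return ASCII_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0663\u0664"; // Arabic-Indic digits three and four
        String[] inputs = {"1010101", "1456", arabicIndic};
        for (String s : inputs) {
            System.out.println(s + " -> \\p{N}: " + matchesAnyNumber(s)
                    + ", [0-9]: " + matchesAsciiDigits(s));
        }
    }
}
```

If only ASCII digits should be accepted, [0-9] (or \d, which is ASCII-only in Java by default) is probably the safer choice.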

I think this works without any "fancy" regex features such as negative lookahead.

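^([0-35-9]|4(4|54)*([0-36-9]|5[0-357-9]))*(4(4|54)*5?)?$

That is:

  • start
    • any number of:
      • a digit other than 4
      • or a 4, then any run of "4" or "54" pieces (each of which leaves us just after a 4 again), then either a digit other than 4 or 5, or a 5 followed by a digit other than 4 or 6
    • optionally, at the very end: a 4, the same kind of run of "4" or "54" pieces, and an optional final 5
  • end

Every 4 has to be treated as a possible start of 456, so a pair such as 44 cannot be consumed as one unit; otherwise 4456 would slip through.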
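Hand-written alternations like this are easy to get subtly wrong, so a brute-force comparison against a plain substring test is a cheap sanity check. The sketch below is only illustrative: the class and method names are made up, and the two candidate patterns are written out for the demonstration (a simple alternation that consumes a 4 together with the character after it, and a variant that keeps track of runs of 4s).

```java
import java.util.regex.Pattern;

class No456BruteForce {
    // Simple alternation: a 4 is consumed together with whatever follows it.
    static final Pattern SIMPLE =
            Pattern.compile("^([0-35-9]*|4[0-46-9]|45[0-57-9]|4$|45$)*$");
    // Variant that stays "just after a 4" across runs such as 44 or 454.
    static final Pattern TRACKING =
            Pattern.compile("^([0-35-9]|4(4|54)*([0-36-9]|5[0-357-9]))*(4(4|54)*5?)?$");

    // Renders n as a zero-padded digit string of the given length.
    static String digits(int n, int len) {
        char[] c = new char[len];
        for (int i = len - 1; i >= 0; i--) {
            c[i] = (char) ('0' + n % 10);
            n /= 10;
        }
        return new String(c);
    }

    // Returns a digit string (length <= maxLen) on which the pattern disagrees
    // with the plain check "does not contain 456", or null if there is none.
    static String firstMismatch(Pattern p, int maxLen) {
        int count = 1; // number of digit strings of the current length
        for (int len = 0; len <= maxLen; len++) {
            for (int n = 0; n < count; n++) {
                String s = digits(n, len);
                boolean expected = !s.contains("456");
                if (p.matcher(s).matches() != expected) {
                    return s;
                }
            }
            count *= 10;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("simple:   " + firstMismatch(SIMPLE, 5));
        System.out.println("tracking: " + firstMismatch(TRACKING, 5));
    }
}
```

The simple alternation accepts 4456, because the first 4 swallows the second one; the run-tracking variant agrees with the substring test on every digit string up to the tested length.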

This is in keeping with the fact that a regular expression describes a finite state machine. We have explicitly dealt with three states: "Not seen a 4", "Seen a 4" and "Seen a 45". If we wanted our 'not matching' string to be "4567", we'd have to explicitly add another state, making the pattern longer and the state machine bigger.
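The three states can also be written out by hand, which makes the machine behind the pattern explicit. This is a sketch under my own naming (No456Automaton, accepts), not code from any of the answers.

```java
class No456Automaton {
    // Hand-rolled three-state machine: accepts digit-only input without "456".
    static boolean accepts(String input) {
        int state = 0; // 0: not seen a 4, 1: seen a 4, 2: seen a 45
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                return false;          // not a digit at all
            }
            if (c == '4') {
                state = 1;             // any 4 may start a new 456
            } else if (c == '5' && state == 1) {
                state = 2;             // 4 followed by 5
            } else if (c == '6' && state == 2) {
                return false;          // 456 completed
            } else {
                state = 0;             // progress towards 456 is lost
            }
        }
        return true;                   // every state is accepting
    }

    public static void main(String[] args) {
        String[] inputs = {"0", "1456", "", "12345689", "1010101", "abc", "4456"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + accepts(input));
        }
    }
}
```

Forbidding "4567" instead would need a fourth state ("seen a 456"), which mirrors the extra alternatives the pattern would need.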

Whether this meets your needs depends on what the test is looking for -- familiarity with advanced features of Java's regex dialect, or the ability to apply regular expressions universally (e.g. basic grep, bash).

Negative lookaheads allow you to express this more tersely.

^((?!456)\d)*$

That is (with start and end anchors around it), zero or more repetitions of a one-char pattern: (?!456)\d which means "Not the start of 456 (looking ahead without actually consuming) and matches a numeric character."

To process this, the regex engine only ever needs to look 3 chars ahead of the current character, making this an efficient one-pass way of meeting the requirement.
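One practical consequence is that changing the forbidden substring only changes the text inside the lookahead. A minimal sketch, assuming a hypothetical helper digitsWithout that builds the pattern for any forbidden string:

```java
import java.util.regex.Pattern;

class ForbiddenSubstringDemo {
    // Hypothetical helper: digits only, and "forbidden" must not start at any position.
    // Pattern.quote keeps the forbidden text literal inside the lookahead.
    static Pattern digitsWithout(String forbidden) {
        return Pattern.compile("^((?!" + Pattern.quote(forbidden) + ")\\d)*$");
    }

    public static void main(String[] args) {
        Pattern no456 = digitsWithout("456");
        Pattern no4567 = digitsWithout("4567");
        String[] inputs = {"1010101", "1456", "14567", ""};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> no 456: " + no456.matcher(s).matches()
                    + ", no 4567: " + no4567.matcher(s).matches());
        }
    }
}
```

With "4567" as the forbidden string, 1456 is accepted and 14567 is rejected, without adding any states by hand.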
