Java regex for a word not followed by another word

I am writing a Java program that has to sort a String into one of three categories.

  • Category 1 - String with both 'AND' and 'AND NOT'
  • Category 2 - String with 'AND NOT' only
  • Category 3 - String with 'AND' only

I need a regex that matches a string only when every AND is followed by NOT; any other string should be skipped.

A AND B AND NOT C - Fail
A AND B AND C - Fail
A AND NOT B AND NOT C - Pass

Below is a sample code snippet:

public static void main(String[] args) {
    String X = "A AND B AND C AND D AND NOT E";
    String Y = "A AND NOT C ";
    String Z = "A AND B AND D";
    ArrayList<String> sampleString=new ArrayList<String>(Arrays.asList(X,Y,Z));

    //Category 1 - String with both 'AND' and 'AND NOT'
    //Category 2 - String with 'AND NOT' only
    //Category 3 - String with 'AND' only

    for(String s:sampleString){
        if(s.contains("AND") && s.contains("NOT")){
            System.out.println("Category 1 -"+s);
        }
        // The next condition is invalid: a string that contains "AND NOT" always contains "AND" too.
        // I need a regex here that only accepts an AND when it is followed by NOT, and skips the string otherwise.

        if(s.contains("AND NOT") && !s.contains("AND")){
            System.out.println("Category 2 - "+s);
        }
        if(s.contains("AND") && !s.contains("NOT")){
            System.out.println("Category 3 - "+s);
        }
    }
}

OUTPUT -

Category 1 -A AND B AND C AND D AND NOT E
Category 1 -A AND NOT C 
Category 3 - A AND B AND D

I tried the regexes from some similar questions, but none of them solved my problem. This is what I tried:

String regex="AND(?!\\s+NOT)";

public static void main(String[] args) {
    String x = "A AND B AND C AND NOT D";
    String regex = "AND(?!\\s+NOT)";
    if (Pattern.compile(regex).matcher(x).find()) {
        System.out.println("X MATCHED");
    }
}
// Prints: X MATCHED

Any help would be much appreciated!

The following regex find() loop will determine the category, returning 0 if the input didn't match any of the listed categories.

private static int categorize(String input) {
    Matcher m = Pattern.compile("(?i)\\bAND(\\s+NOT)?\\b").matcher(input);
    boolean foundAndNot = false, foundAnd = false;
    while ((! foundAndNot || ! foundAnd) && m.find())
        if (m.start(1) != -1)
            foundAndNot = true;
        else
            foundAnd = true;
    // 1 = both, 2 = only AND NOT, 3 = only AND, 0 = neither
    return (foundAndNot ? (foundAnd ? 1 : 2)
                        : (foundAnd ? 3 : 0));
}

The left side of the && condition in the while loop is just a short-circuit, to exit the loop early if both are found.

The (?i) in the regex is for making it case-insensitive, which is where regex outshines any contains() implementation.

The m.start(1) != -1 check is to see if the capture group matched, i.e. to see if the match included the word NOT.
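
To see the capture-group check in isolation, here is a minimal sketch that labels every match as AND or AND NOT. The class name GroupStartDemo and the helper operators() are made up for illustration; the pattern is the same one used in categorize().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupStartDemo {
    // Same pattern as in categorize(): AND, optionally followed by whitespace and NOT.
    private static final Pattern AND_OR_AND_NOT =
            Pattern.compile("(?i)\\bAND(\\s+NOT)?\\b");

    // Hypothetical helper: returns one label per match, in order of appearance.
    static List<String> operators(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = AND_OR_AND_NOT.matcher(input);
        while (m.find()) {
            // start(1) is -1 when the optional group did not take part in the match.
            found.add(m.start(1) != -1 ? "AND NOT" : "AND");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(operators("A AND B AND NOT C")); // prints [AND, AND NOT]
        System.out.println(operators("a and   not b"));     // prints [AND NOT]
        System.out.println(operators("A OR B OR NOT C"));   // prints []
    }
}
```

The second input also shows the effect of (?i) and \s+: lower-case words and repeated spaces still match.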

TEST

System.out.println(categorize("A AND B AND NOT C"));     // prints 1
System.out.println(categorize("A AND B AND C"));         // prints 3
System.out.println(categorize("A AND NOT B AND NOT C")); // prints 2
System.out.println(categorize("A OR B OR NOT C"));       // prints 0
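
For comparison, the pattern tried in the question, AND(?!\s+NOT), reports a match for X because find() succeeds as soon as any single AND is not followed by NOT; it says nothing about the other occurrences. A small sketch, using a hypothetical helper plainAndOffsets(), that lists where that pattern matches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    // Hypothetical helper: start offsets of every AND that is not followed by NOT.
    static List<Integer> plainAndOffsets(String input) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("AND(?!\\s+NOT)").matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The first two ANDs are plain, so find() alone returns true for this input.
        System.out.println(plainAndOffsets("A AND B AND C AND NOT D")); // prints [2, 8]
        // Every AND is followed by NOT, so nothing matches.
        System.out.println(plainAndOffsets("A AND NOT B AND NOT C"));   // prints []
    }
}
```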

Try this:

boolean hasBoth = x.matches("(?=.*AND NOT).*AND(?! NOT).*");
boolean onlyAnd = x.matches("(?!.*AND NOT).*AND.*");
boolean onlyAndNot = x.matches("(?!.*AND(?! NOT)).*AND NOT.*");
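
A rough sketch of how these three patterns could be combined into one method, assuming the category numbers from the question. The class LookaheadCategories and its categorize() method are illustrative names, not part of the answer above. Note that matches() must consume the whole input, which is why each pattern is wrapped in .*, and that the patterns as written are case-sensitive, expect exactly one space between AND and NOT, and use no word boundaries.

```java
class LookaheadCategories {
    // Hypothetical wrapper around the three patterns; returns 0 when none applies.
    static int categorize(String x) {
        // Category 1: some "AND NOT" exists, and some AND is not followed by " NOT".
        if (x.matches("(?=.*AND NOT).*AND(?! NOT).*")) {
            return 1;
        }
        // Category 2: no AND lacks a following " NOT", and at least one "AND NOT" exists.
        if (x.matches("(?!.*AND(?! NOT)).*AND NOT.*")) {
            return 2;
        }
        // Category 3: no "AND NOT" anywhere, but at least one AND.
        if (x.matches("(?!.*AND NOT).*AND.*")) {
            return 3;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(categorize("A AND B AND C AND D AND NOT E")); // prints 1
        System.out.println(categorize("A AND NOT C "));                  // prints 2
        System.out.println(categorize("A AND B AND D"));                 // prints 3
        System.out.println(categorize("A OR B"));                        // prints 0
    }
}
```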
