
Regular expression for the method Pattern.compile()

I have to create a regular expression to pass to the method Pattern.compile(regex). My regex has to accept an integer (without a leading 0), a sequence of digits and letters (A-Za-z), and a string. The problem is the third point: the string must begin and end with the char '"', must not contain a backslash (unless it is followed by another backslash), and must not contain the char '"' (unless it is preceded by a backslash).

I don't understand how to write this point (I have run into a lot of errors). This is my Java code:

public static void main(String[] args) {
    if (args.length == 0)
        throw new IllegalArgumentException();
    Matcher matcher = Pattern.compile("([a-zA-Z]+[0-9a-zA-Z_]*)|"
            + "(0(?![0-9])|([1-9]+)([0-9]*))|" // "(?!...)" is a negative lookahead: the zero matches only if it is not followed by a digit
            + "([\"]{1}(([\\\\][^\"\\][\\\"])*)[\"]{1})" 
            + "|(\\s+)").matcher(args[0]);// \s = [ \t\n\x0B\f\r]
    System.out.println("Input: " + args[0]); // println ends the line after printing
    while (matcher.lookingAt()) {
        System.out.print("Lexeme '" + matcher.group() + "'"); // print does not end the line
        System.out.println(" group " + ExampleLexer.getGroup(matcher));
        matcher.region(matcher.end(), matcher.regionEnd());
    }

    // note: matcher.hitEnd() returns true if the matcher reaches the end of
    // the input even if the last match did not succeed, so it only works
    // for "simple" regular expressions
    if (matcher.regionStart() == matcher.regionEnd())
        System.out.println("All lexemes successfully matched");
    else {
        System.err.print("Unmatched lexeme ");
        matcher.usePattern(Pattern.compile(".*"));
        matcher.lookingAt();
        System.err.println(matcher.group());
    }
}

For the string (the third alternative), either of the following two patterns can be used, written here as Java string literals:

"\"(\\\\.|[^\"\\\\])*\""            // (1)
"\"(\\\\[\"\\\\]|[^\"\\\\])*\""     // (2)
  • quote
  • zero or more times ( ... )*
    • either a backslash (regex \\, written \\\\ in the Java string) followed by (1) any char . / (2) a quote or a backslash [ ... ]
    • or | a char that is neither a quote nor a backslash [^ ... ]
  • quote
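
To see how the two variants differ, here is a minimal sketch that runs both patterns against a few sample inputs. The class name StringLiteralDemo, the helper isLiteral and the sample strings are assumptions made for illustration; only the two pattern strings come from the answer.

```java
import java.util.regex.Pattern;

public class StringLiteralDemo {
    // Pattern (1): after a backslash, any character is accepted.
    static final Pattern ANY_ESCAPE =
            Pattern.compile("\"(\\\\.|[^\"\\\\])*\"");
    // Pattern (2): after a backslash, only a quote or a backslash is accepted.
    static final Pattern STRICT_ESCAPE =
            Pattern.compile("\"(\\\\[\"\\\\]|[^\"\\\\])*\"");

    // Hypothetical helper: true if the whole input is one string literal.
    static boolean isLiteral(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"abc\"",      // "abc"
            "\"a\\\"b\"",   // "a\"b"  escaped quote inside
            "\"a\\\\\"",    // "a\\"   escaped backslash before the closing quote
            "\"a\\nb\"",    // "a\nb"  backslash followed by the letter n
            "\"a\"b\"",     // "a"b"   unescaped quote inside
            "\"a\\\""       // "a\"    the only closing quote is escaped
        };
        for (String s : samples) {
            System.out.println(s + "  (1): " + isLiteral(ANY_ESCAPE, s)
                    + "  (2): " + isLiteral(STRICT_ESCAPE, s));
        }
    }
}
```

Only the fourth sample separates the two variants: (1) accepts a backslash followed by n, while (2) rejects it because n is neither a quote nor a backslash. The last two samples are rejected by both.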

On the escapes:

  1. \" is a String escape representing one char, the double quote.
  2. \\ is the String escape representing a backslash, one char.
  3. \\\\ is the regex escape \\ written as a Java String; it represents the backslash itself instead of starting a regex escape sequence.
  4. If you ever want to replace backslashes, first attempt it without regex: s = s.replace("\\", "\\\\"); doubles every backslash. Guess how that is written in regex with replaceAll.
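
The escaping levels listed above can be checked with a short sketch. The class name BackslashDemo, its method names and the sample path are assumptions for illustration; String.replace, String.replaceAll and Matcher.quoteReplacement are standard JDK methods.

```java
import java.util.regex.Matcher;

public class BackslashDemo {
    // Literal replacement: no regex involved, so only Java String escaping applies.
    static String doubleWithReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // Regex replacement: the pattern "\\\\" is the regex \\ (one literal backslash).
    // A backslash is also special in the replacement string, so each output
    // backslash is escaped once more: "\\\\\\\\" produces two backslashes.
    static String doubleWithReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same result, letting Matcher.quoteReplacement escape the replacement.
    static String doubleWithQuoteReplacement(String s) {
        return s.replaceAll("\\\\", Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt"; // the chars C:\temp\file.txt
        System.out.println(doubleWithReplace(path));
        System.out.println(doubleWithReplaceAll(path));
        System.out.println(doubleWithQuoteReplacement(path));
    }
}
```

All three methods should print the same text, with every backslash of the input doubled. The literal replace is the easiest to read, which is why it is worth trying first.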
