JavaCC quote with escape character

What is the usual way of tokenizing quoted strings that can contain an escape character? Here are some examples:

1) "this is good"
2) "this is\"good\""
3) "this \is good"
4) "this is bad\"
5) "this is \\"bad"
6) "this is bad
7)  this is bad"
8)  this is bad

Below is a sample parser that doesn't work quite right: it gives the expected result for every example except 4 and 5, which should fail but parse successfully.

options
{
  LOOKAHEAD = 3;
  CHOICE_AMBIGUITY_CHECK = 2;
  OTHER_AMBIGUITY_CHECK = 1;
  STATIC = false;
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = true;
  ERROR_REPORTING = true;
  JAVA_UNICODE_ESCAPE = false;
  UNICODE_INPUT = false;
  IGNORE_CASE = false;
  USER_TOKEN_MANAGER = false;
  USER_CHAR_STREAM = false;
  BUILD_PARSER = true;
  BUILD_TOKEN_MANAGER = true;
  SANITY_CHECK = true;
  FORCE_LA_CHECK = true;
}

PARSER_BEGIN(MyParser)
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
public class MyParser {
    public static void main(String[] args) throws UnsupportedEncodingException, ParseException{
        //note that this conversion to an input stream is only good for small strings
        MyParser parser = new MyParser(new ByteArrayInputStream(args[0].getBytes("UTF-8")));
        parser.enable_tracing();
        parser.myProduction();
        System.out.println("Must have worked!");
    }
}
PARSER_END(MyParser)

TOKEN:
{
<QUOTED: 
    "\"" 
    (
        "\\" ~[]    //any escaped character
        |           //or
        ~["\""]      //any non-quote character
    )* 
    "\""
>
}


void myProduction() :
{}
{
    <QUOTED>
    <EOF>
}

You can run MyParser from the command line, passing the input to parse as the first argument. It will print "Must have worked!" if parsing succeeded, or throw an error if it didn't.

How do I change this parser to correctly fail on examples 4 and 5?

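To fix your regular expression, exclude the backslash from the second alternative. In your version, ~["\""] also matches a backslash, so a backslash can be consumed as an ordinary character instead of starting an escape, and that is what lets examples 4 and 5 match. With the change below, a backslash can only be matched together with the character it escapes: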

TOKEN: {
<QUOTED: 
    "\"" 
    (
         "\\" ~[]     //any escaped character
    |                 //or
        ~["\"","\\"]  //any character except quote or backslash
    )* 
    "\"" > 
}
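
As a quick sanity check outside JavaCC, the two token definitions can be approximated with java.util.regex and run against the eight examples from the question. This is only a sketch: the class name QuotedCheck, the pattern names ORIGINAL and FIXED, and the accepts helper are made up for illustration, and the regexes are hand translations of the JavaCC tokens (with DOTALL, since ~[] matches any character), not something JavaCC generates. Using matches() stands in for the <QUOTED> <EOF> production, which requires the token to cover the whole input.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the generated parser.
class QuotedCheck {
    // Hand translation of the original token: "\"" ( "\\" ~[] | ~["\""] )* "\""
    static final Pattern ORIGINAL =
        Pattern.compile("\"(?:\\\\.|[^\"])*\"", Pattern.DOTALL);

    // Hand translation of the fixed token: "\"" ( "\\" ~[] | ~["\"","\\"] )* "\""
    static final Pattern FIXED =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"", Pattern.DOTALL);

    // True if the whole input is a single quoted string under the given pattern.
    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The eight examples from the question, written as Java string literals.
        String[] examples = {
            "\"this is good\"",
            "\"this is\\\"good\\\"\"",
            "\"this \\is good\"",
            "\"this is bad\\\"",
            "\"this is \\\\\"bad\"",
            "\"this is bad",
            " this is bad\"",
            " this is bad"
        };
        for (int i = 0; i < examples.length; i++) {
            System.out.println((i + 1) + ") " + examples[i]
                + "  original=" + accepts(ORIGINAL, examples[i])
                + "  fixed=" + accepts(FIXED, examples[i]));
        }
    }
}
```

Under these assumptions, the original pattern accepts examples 1 through 5, while the fixed pattern accepts only examples 1 through 3, which matches the behaviour described in the question and the result the fix is meant to produce.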
