
Parsing block comments with JavaCC

I'm trying to write a JavaCC grammar to parse a file that contains multi-line comments. For example, the following are all valid:

/**/
/* */
/* This is a comment */
/* This
   is
   a
   multiline
   comment
*/

I would like parsing to fail if there is a /* that is not closed by a */, or a closing */ without an opening /*.

I'm not trying to skip the comments; I want them available as tokens.

So far I have tried the following, which works but does not fail on an unclosed /*:

options {
  STATIC = false;
}

PARSER_BEGIN(BlockComments)

package com.company;

public class BlockComments {}

PARSER_END(BlockComments)

TOKEN : { < START_BLOCK_COMMENT : "/*" >  : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { < BLOCK_COMMENT: (~["*", "/"] | "*" ~["/"])+ > }
<WITHIN_BLOCK_COMMENT> TOKEN: { < END_BLOCK_COMMENT: "*/" > : DEFAULT }

SKIP : {
  "\n"
}

The other option I have tried is this. It has the same problem, with the slight difference that /* and */ are skipped instead of being read as tokens:

options {
  STATIC = false;
}

PARSER_BEGIN(BlockComments)

package com.company;

public class BlockComments {}

PARSER_END(BlockComments)

SKIP : { "/*" : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { <BLOCK_COMMENT: (~["*", "/"] | "*" ~["/"])+ > }
<WITHIN_BLOCK_COMMENT> SKIP : { "*/" : DEFAULT }

SKIP : {
  "\n"
}

I also tried using MORE : { "/*" : WITHIN_BLOCK_COMMENT } in the second option. That makes parsing fail for an unclosed /*, but it also makes every BLOCK_COMMENT token start with /*, which I don't want.

I'm not sure what the rest of your file looks like, so I'll assume that a file is expected to be a sequence of comments preceded, followed, and separated by zero or more spaces and newlines.

What I would do is this:

TOKEN : { < START_BLOCK_COMMENT : "/*" >  : WITHIN_BLOCK_COMMENT }
<WITHIN_BLOCK_COMMENT> TOKEN: { <CHAR_IN_COMMENT: ~[] > }
<WITHIN_BLOCK_COMMENT> TOKEN: { < END_BLOCK_COMMENT: "*/" > : DEFAULT }

SKIP : {
  "\n" | " " 
}

Now in the parser we have

void start() : {String s ; } {
    (
        s = comment()  {System.out.println(s); }
    )*
}

String comment() :
{
    Token t;
    // Collects the comment body one character at a time.
    StringBuilder b = new StringBuilder();
}
{
    <START_BLOCK_COMMENT>
    (
        t = <CHAR_IN_COMMENT> { b.append(t.image); }
    )*
    <END_BLOCK_COMMENT>
    { return b.toString(); }
}

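With this grammar, a missing */ is not reported as a lexical error. Instead, the parser reaches the end of the input while still expecting <END_BLOCK_COMMENT> and throws a ParseException. A stray */ with no opening /* does fail in the lexer, because nothing in the DEFAULT state matches the character *.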
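To make the two failure modes concrete, here is a minimal hand-written Java sketch that mimics the grammar above without needing JavaCC. It is only an illustration, not the generated code: the class BlockCommentSketch and its methods tokenize and parse are hypothetical names, IllegalStateException stands in for JavaCC's lexical error, and IllegalArgumentException stands in for ParseException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated token manager and parser.
class BlockCommentSketch {
    enum Kind { START_BLOCK_COMMENT, CHAR_IN_COMMENT, END_BLOCK_COMMENT, EOF }

    static final class Token {
        final Kind kind;
        final String image;
        Token(Kind kind, String image) { this.kind = kind; this.image = image; }
    }

    // Two lexical states: DEFAULT (within == false) and WITHIN_BLOCK_COMMENT.
    static List<Token> tokenize(String in) {
        List<Token> out = new ArrayList<>();
        boolean within = false;
        int i = 0;
        while (i < in.length()) {
            if (within) {
                if (in.startsWith("*/", i)) {
                    // Longest match: the two-character "*/" beats the one-character ~[].
                    out.add(new Token(Kind.END_BLOCK_COMMENT, "*/"));
                    within = false;
                    i += 2;
                } else {
                    out.add(new Token(Kind.CHAR_IN_COMMENT, String.valueOf(in.charAt(i))));
                    i++;
                }
            } else if (in.startsWith("/*", i)) {
                out.add(new Token(Kind.START_BLOCK_COMMENT, "/*"));
                within = true;
                i += 2;
            } else if (in.charAt(i) == ' ' || in.charAt(i) == '\n') {
                i++; // SKIP applies only in the DEFAULT state
            } else {
                // Stand-in for a lexical error, e.g. a stray "*/".
                throw new IllegalStateException("Lexical error at offset " + i);
            }
        }
        // End of input is never a lexical error here, even inside a comment.
        out.add(new Token(Kind.EOF, ""));
        return out;
    }

    // Mirrors start() and comment(): returns the body of each comment.
    static List<String> parse(String in) {
        List<Token> tokens = tokenize(in);
        List<String> comments = new ArrayList<>();
        int pos = 0;
        while (tokens.get(pos).kind == Kind.START_BLOCK_COMMENT) {
            pos++;
            StringBuilder b = new StringBuilder();
            while (tokens.get(pos).kind == Kind.CHAR_IN_COMMENT) {
                b.append(tokens.get(pos).image);
                pos++;
            }
            if (tokens.get(pos).kind != Kind.END_BLOCK_COMMENT) {
                // Stand-in for ParseException: an unclosed "/*" ends up here.
                throw new IllegalArgumentException(
                    "Parse error: expected */ but found " + tokens.get(pos).kind);
            }
            pos++;
            comments.add(b.toString());
        }
        return comments;
    }
}
```

Calling BlockCommentSketch.parse on "/**/ /* a */" returns the two comment bodies, on "/* open" it fails in the parser, and on "*/" it fails in the tokenizer, matching the behavior described above for the real grammar.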
