简体   繁体   中英

Set a rule based on the value of a global variable

In my lexer & parser by ocamllex and ocamlyacc, I have a .mly as follows:

%{
  open Params
  open Syntax
%}

main:
| expr EOF { $1 }

expr:
| INTEGER { EE_integer $1 }
| LBRACKET expr_separators RBRACKET { EE_brackets (List.rev $2) }

expr_separators:
  /* empty */  { [] }
| expr         { [$1] }
| expr_separators ...... expr_separators { $3 :: $1 }

In params.ml , a variable separator is defined. Its value is either ; or , and set by the upstream system.

In the .mly , I want the rule of expr_separators to be defined based on the value of Params.separator . For example, when params.separtor is ; , only [1;2;3] is considered as expr , whereas [1,2,3] is not. When params.separtor is , , only [1,2,3] is considered as expr , whereas [1;2;3] is not.

Does anyone know how to amend the lexer and parser to realize this?

PS:

The value of Params.separator is set before the parsing, it will not change during the parsing.

At the moment, in the lexer, , returns a token COMMA and ; returns SEMICOLON . In the parser, there are other rules where COMMA or SEMICOLON are involved.

I just want to set a rule expr_separators such that it considers ; and ignores , (which may be parsed by other rules), when Params.separator is ; ; and it considers , and ignore ; (which may be parsed by other rules), when Params.separator is , .

In some ways, this request is essentially the same as asking a macro preprocessor to alter its substitution at runtime, or a compiler to alter the type of a variable. As with the program itself, once the grammar has been compiled (whether into executable code or a parsing table), it's not possible to go back and modify it. At least, that's the case for most LR(k) parser generators, which produce deterministic parsers.

Moreover, it seems unlikely that the only difference the configuration parameter makes is the selection of a single separator token. If the non-selected separator token "may be parsed by other rules", then it may be parsed by those other rules when it is the selected separator token, unless the configuration setting also causes those other rules to be suppressed. So at a minimum, it seems like you'd be looking at something like:

expr : general_expr
expr_list : expr
%if separator is comma
expr : expr_using_semicolon
expr_list : expr_list ',' expr
%else
expr : expr_using_comma
expr_list : expr_list ';' expr
%endif

Without a more specific idea of what you're trying to achieve, the best suggestion I can provide is that you write two grammars and select which one to use at runtime, based on the configuration setting. Presumably the two grammars will be mostly similar, so you can probably use your own custom-written preprocessor to generate both of them from the same input text, which might look a bit like the above example. (You can use m4 , which is a general-purpose macro processor, but you might feel the learning curve is too steep for such a simple application.)

Parser generators which produce general parsers have an easier time with run-time dynamic modifications; many such parser generators have mechanisms which can do that (although they are not necessarily efficient mechanisms). For example, the Bison tool can produce GLR parsers, in which case you can select or deselect specific rules using a predicate action. The OCAML GLR generator Dypgen allows sets of rules to be dynamically added to the grammar during the parse. (I've never used dypgen, but I keep on meaning to try it; it looks interesting.) And there are many others.

Having played around with dynamic parsing features in some GLR parsers, I can only say that my personal experience has been a bit mixed. Modifying grammars at run-time is a brittle technique; grammars tend not to be very easy to split into independent pieces, so modifying a grammar rule can have unexpected consequences in places you don't expect to be affected. You don't always know exactly what language your parsing accepts, because the dynamic modifications can be hard to predict. And so on. My suggest, if you try this technique, is to start with the simplest modification possible and put a lot more effort into grammar tests (which is always a good idea, anyway).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM