
Lexing strings in ocamllex

I've been having trouble finding a good example of how to handle strings in ocamllex. The desktop calculator example is somewhat useful, but I haven't found a way to extend it in a similar fashion so that it handles strings as well. Here is the example I'm referencing:

        {
        open Parser        (* The type token is defined in parser.mli *)
        exception Eof
        }
        rule token = parse
            [' ' '\t']     { token lexbuf }     (* skip blanks *)
          | ['\n' ]        { EOL }
          | ['0'-'9']+ as lxm { INT(int_of_string lxm) }
          | '+'            { PLUS }
          | '-'            { MINUS }
          | '*'            { TIMES }
          | '/'            { DIV }
          | '('            { LPAREN }
          | ')'            { RPAREN }
          | eof            { raise Eof }

Any help would be greatly appreciated.

I assume you're talking about double-quoted strings as in OCaml. The difficulty in lexing strings is that they require some escape mechanism to allow representing quotes (and the escape mechanism itself, usually).
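To make the escaping problem concrete, here is a small plain-OCaml illustration (the string itself is just an example I made up). The literal needs one escape to embed a double quote and another to embed the escape character itself, and a lexer has to undo both to recover the actual value:

```ocaml
(* Two escapes are used below: backslash-quote for a literal double quote,
   and backslash-backslash for a literal backslash. *)
let s = "say \"hi\" \\ bye"

let () =
  (* In the resulting value, the quotes and the backslash are ordinary
     characters; the escaping backslashes are gone. *)
  assert (String.length s = 14);
  assert (s.[4] = '"');
  assert (s.[9] = '\\');
  print_endline s
```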

Here is a simplified version of the code for strings from the OCaml lexer itself:

let string_buff = Buffer.create 256

let char_for_backslash = function
  | 'n' -> '\010'
  | 'r' -> '\013'
  | 'b' -> '\008'
  | 't' -> '\009'
  | c   -> c

. . .

let backslash_escapes =
    ['\\' '\'' '"' 'n' 't' 'b' 'r' ' ']

. . .

rule main = parse
. . .
| '"'
    { Buffer.clear string_buff;
      string lexbuf;
      STRING (Buffer.contents string_buff) }
. . .

and string = parse
| '"'
    { () }
| '\\' (backslash_escapes as c)
    { Buffer.add_char string_buff (char_for_backslash c);
      string lexbuf }
| _ as c
    { Buffer.add_char string_buff c;
      string lexbuf }

Edit: The key feature of this code is that it uses a second scanner (named string) to do the lexical analysis within a quoted string. This generally keeps things cleaner than trying to write a single scanner for all tokens; some tokens are quite complicated. A similar technique is often used for scanning comments.
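For what it's worth, here is a rough sketch of how the string scanner might be combined with the calculator lexer from the question. Several things in it are my own assumptions rather than part of either original: the token type is declared in the header so the file stands alone (with a generated parser you would open Parser and add a STRING token there), eof returns an EOF token instead of raising, Lexical_error and tokens_of_string are made-up helper names, and the comment rule handles only non-nested (* ... *) comments. It also adds eof cases so an unterminated string or comment is reported instead of failing with a generic lexing error.

```ocaml
(* lexer.mll: run ocamllex on this file to generate lexer.ml *)
{
  (* Assumption: tokens are declared here so the sketch is self-contained. *)
  type token =
    | INT of int
    | STRING of string
    | PLUS | MINUS | TIMES | DIV
    | LPAREN | RPAREN
    | EOL | EOF

  (* Hypothetical exception used for all lexical errors in this sketch. *)
  exception Lexical_error of string

  let string_buff = Buffer.create 256

  let char_for_backslash = function
    | 'n' -> '\010'
    | 'r' -> '\013'
    | 'b' -> '\008'
    | 't' -> '\009'
    | c   -> c
}

let backslash_escapes = ['\\' '\'' '"' 'n' 't' 'b' 'r' ' ']

rule token = parse
  | [' ' '\t']         { token lexbuf }   (* skip blanks *)
  | '\n'               { EOL }
  | ['0'-'9']+ as lxm  { INT (int_of_string lxm) }
  | '+'                { PLUS }
  | '-'                { MINUS }
  | '*'                { TIMES }
  | '/'                { DIV }
  | '('                { LPAREN }
  | ')'                { RPAREN }
  | '"'                { Buffer.clear string_buff;
                         string lexbuf;   (* hand over to the string scanner *)
                         STRING (Buffer.contents string_buff) }
  | "(*"               { comment lexbuf; token lexbuf }
  | eof                { EOF }
  | _ as c             { raise (Lexical_error
                                  (Printf.sprintf "unexpected character %C" c)) }

(* Second scanner: only active between the opening and closing quote. *)
and string = parse
  | '"'                { () }
  | '\\' (backslash_escapes as c)
                       { Buffer.add_char string_buff (char_for_backslash c);
                         string lexbuf }
  | eof                { raise (Lexical_error "unterminated string") }
  | _ as c             { Buffer.add_char string_buff c;
                         string lexbuf }

(* Third scanner: skips everything up to the end of a non-nested comment. *)
and comment = parse
  | "*)"               { () }
  | eof                { raise (Lexical_error "unterminated comment") }
  | _                  { comment lexbuf }

{
  (* Hypothetical helper: lex a whole string into a token list. *)
  let tokens_of_string s =
    let lexbuf = Lexing.from_string s in
    let rec loop acc =
      match token lexbuf with
      | EOF -> List.rev acc
      | t -> loop (t :: acc)
    in
    loop []
}
```

With this, tokens_of_string applied to the input text 1 + "a\nb" (backslash and n as two separate characters) should give INT 1, PLUS, and a STRING whose middle character is a real newline. Because ocamllex prefers the longest match, the two-character escape rule wins over the catch-all case, and "(*" wins over a lone '('.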


 