Context-free-grammar to represent regular expressions

I'm trying to write a context-free grammar to represent simple regular expressions. The symbols I want are [0-9], [a-z] and [A-Z]; the operators are "|", "()" and "." for concatenation. For repetition I only want "*" for now; later I will add "+", "?", etc. I tried this grammar in JavaCC:

void RE(): {}
{
    FINAL(0) ( "." FINAL(0) | "|" FINAL(0))*
}

void FINAL(int sign): { Token t; }
{
    t = <SYMBOL> {
        if ( sign == 1 )
            jjtThis.val = t.image + "*";
        else
            jjtThis.val = t.image;
    }
    | FINAL(1) "*"
    | "(" RE() ")"
}

The problem is in the FINAL function: the line | FINAL(1) "*" gives me the error Left recursion detected: "FINAL... --> FINAL...". Putting "*" to the left of FINAL(1) resolves the problem, but that is not what I want.

I have already read the Wikipedia article on removing left recursion, but I still don't see how to apply it here. Can someone help?

The following takes care of the left recursion:

RE --> FINAL ("." FINAL | "|" FINAL)*
FINAL --> PRIMARY ( "*" )*
PRIMARY --> <SYMBOL> | "(" RE ")"

However, that grammar does not give "." higher precedence than "|". For that, you can use the following:

RE --> TERM ("|" TERM)*
TERM --> FINAL ("." FINAL)*
FINAL --> PRIMARY ( "*" )*
PRIMARY --> <SYMBOL> | "(" RE ")"
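
To see how the layered grammar produces the precedence, here is a minimal hand-written recursive-descent sketch in plain Java rather than JavaCC. The class name RegexParser, the method names, and the fully parenthesized output format are assumptions made for illustration (FINAL is spelled fin because final is a reserved word in Java). Each method mirrors one production, and each ( ... )* in the grammar becomes a while loop, so no method ever calls itself before consuming input.

```java
/** Sketch of a parser for RE / TERM / FINAL / PRIMARY; all names here are illustrative. */
final class RegexParser {
    private final String input;
    private int pos = 0;

    private RegexParser(String input) { this.input = input; }

    /** Parses the whole input and returns a fully parenthesized form of it. */
    static String parse(String input) {
        RegexParser p = new RegexParser(input);
        String result = p.re();
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("Unexpected '" + p.peek() + "' at index " + p.pos);
        }
        return result;
    }

    // RE --> TERM ("|" TERM)*
    private String re() {
        String left = term();
        while (peek() == '|') {
            pos++;
            left = "(" + left + "|" + term() + ")";
        }
        return left;
    }

    // TERM --> FINAL ("." FINAL)*
    private String term() {
        String left = fin();
        while (peek() == '.') {
            pos++;
            left = "(" + left + "." + fin() + ")";
        }
        return left;
    }

    // FINAL --> PRIMARY ("*")*
    private String fin() {
        String operand = primary();
        while (peek() == '*') {
            pos++;
            operand = "(" + operand + "*)";
        }
        return operand;
    }

    // PRIMARY --> <SYMBOL> | "(" RE ")"
    private String primary() {
        char c = peek();
        if (c == '(') {
            pos++;
            String inner = re();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            pos++;
            return inner;
        }
        if (isSymbol(c)) {
            pos++;
            return String.valueOf(c);
        }
        throw new IllegalArgumentException("Expected a symbol or '(' at index " + pos);
    }

    // <SYMBOL> is one character from [0-9], [a-z] or [A-Z], as in the question.
    private static boolean isSymbol(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns the current character, or '\0' once the input is exhausted.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("a.b|c"));   // ((a.b)|c)
        System.out.println(parse("a|b.c*"));  // (a|(b.(c*)))
    }
}
```

Because TERM sits below RE, a whole "." chain is grouped before "|" is applied, and because FINAL sits below TERM, "*" binds tighter still. That is why a.b|c comes out as ((a.b)|c) and not (a.(b|c)).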

The general rule is

A --> A b | c | d | ...

can be transformed to

A --> B b*
B --> c | d | ...

where B is a new nonterminal.
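
The reason the two forms accept the same strings is that every derivation from A --> A b | c | d must eventually bottom out in c or d, with each use of the recursive alternative adding one b on the right. A small recognizer makes this concrete. It is only a sketch: it treats b, c and d as the literal characters 'b', 'c' and 'd', and the class and method names are made up for the example.

```java
/** Recognizer for A --> B b*, B --> c | d, using literal characters as tokens. */
final class LeftRecursionDemo {
    // B --> c | d
    private static boolean isB(char ch) {
        return ch == 'c' || ch == 'd';
    }

    // A --> B b*   (stands in for the left-recursive A --> A b | c | d)
    static boolean matchesA(String s) {
        if (s.isEmpty() || !isB(s.charAt(0))) {
            return false;
        }
        int i = 1;
        // The recursion on the left becomes a loop over the trailing b's.
        while (i < s.length() && s.charAt(i) == 'b') {
            i++;
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesA("dbbb")); // true
        System.out.println(matchesA("bc"));   // false
    }
}
```

In the regular-expression grammar, FINAL plays the role of A, "*" plays the role of b, and PRIMARY is the new nonterminal B.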
