
“Simple” parser for C++

I have a project (SCC) that is a kind of REPL for C++. At the bash prompt I can run:

scc '2+2'

Or something a little more complex:

scc  'double x = 0.5;  sin(x)'

which is equivalent to:

scc  'double x = 0.5;  cout << sin(x) << endl;'

If the last (and possibly only) expression statement is not terminated by a semicolon, it is sent to std::cout. My question is about parsing the last statement out of a C++ code snippet. I am well aware of how difficult C++ parsing is. Extracting the last statement with a simple sed script that just looks for the last ';' was good enough for me initially. But the project has now outgrown being a small personal project, and I need a better parser.

Below is a mini unit test for my current sed parser. You can see the sed regex that I use for the parsing:

    cat  <<EOF  | sed    's/$//;s/[ \t]*$//;s/\(.*[;}]\)*\([^;}]\+$\)/\0    ==>>  \1   PRINT(\2);/'


    print
    no-print;
    OK;  print
    OK;  no-print;
    OK;  no-print;  print
    FAIL;   while(a){b;}  no-print
    FAIL;   while(a)  no-print
    OK;     for(a;b;c) {no-print}
    FAIL;   for(a;b;c) no-print
    OK;     {}
    OK;     {no-print-code-block;}
    FAIL;  print_rvalue_t{1}
    FAIL;   f(int{1})
    FAIL;   f(";")
    FAIL;   f(';')
    FAIL;   f("}")
    EOF

The first line after the cat line is an empty line. The second is a line containing a single space. The third is a statement not terminated with ';', so it should be printed. The fourth is a statement terminated with ';', so it should not be printed. And so on. A line marked FAIL is one where the parser fails. The output looks like this:

    print   ==>>     PRINT(print);
    no-print;
    OK;  print      ==>>  OK;   PRINT(  print);
    OK;  no-print;
    OK;  no-print;  print   ==>>  OK;  no-print;   PRINT(  print);
    FAIL;     while(a){b;}  print     ==>>  OK;       while(a){b;}   PRINT(  no-print);
    FAIL;   while(a)  no-print      ==>>  FAIL;   PRINT(    while(a)  no-print);
    OK;     for(a;b;c) {no-print}
    FAIL;   for(a;b;c) no-print     ==>>  FAIL;     for(a;b;   PRINT(c) no-print);
    OK;     {}
    OK;     {no-print-code-block;}
    FAIL;  print_rvalue_t{1}
    FAIL;   f(int{1})       ==>>  FAIL;     f(int{1}   PRINT());
    FAIL;   f(";")  ==>>  FAIL;     f(";   PRINT("));
    FAIL;   f(';')  ==>>  FAIL;     f(';   PRINT('));
    FAIL;   f("}")  ==>>  FAIL;     f("}   PRINT("));

Lines without the ==>> marker pass through the parser unmodified. After the marker comes the transformed snippet, in which the last statement is wrapped in PRINT( ). As you can see, the current sed parser is not very good.

So I am looking for something better. I will accept an answer even if it is not 100% correct at parsing. Even a better sed script would be good enough for me. The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor.

I've tried to write a parser with boost/xpressive: http://github.com/lvv/scc/blob/master/sccpp.h . Of course, it's not a real C++ parser. It's a quick hack made for just one thing: parsing out the last statement. It passes all of the unit tests above. Unfortunately, for longer snippets it was intolerably slow.

The question is: how can I make a better parser?

"The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor."

Not too high. The simple fact is that C++ is like HTML: you need a real library to parse it. Unless you want to spend years developing your own, pretty much the only way to go is to use an existing C++ parser, and Clang is the only option in this regard. So however complex you find it, you have no other choice.
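
For the "not 100% correct" improvement the question says it would accept, a hand-written scanner may be a middle ground between the sed regex and a real parser. The sketch below is only an illustration, not SCC's code: the names last_statement_start, starts_with_keyword and wrap_last_statement are hypothetical, and PRINT( ) is borrowed from the unit test in the question. It tracks bracket nesting and skips string and character literals, so a ';' or '}' inside f(";") or f(int{1}) no longer splits the snippet, and it leaves a tail that begins with a control keyword unwrapped. It assumes no comments, raw string literals or digit separators, and it still treats every top-level '}' as the end of a statement, so a case like print_rvalue_t{1} stays unsolved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SCC): index just past the last ';' or '}'
// seen at nesting depth 0 and outside string/character literals.
std::size_t last_statement_start(const std::string& src) {
    std::size_t start = 0;
    int depth = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Skip to the matching quote, stepping over backslash escapes.
            for (++i; i < src.size() && src[i] != c; ++i)
                if (src[i] == '\\') ++i;
        } else if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth > 0) --depth;
            if (c == '}' && depth == 0) start = i + 1;  // a top-level block ended
        } else if (c == ';' && depth == 0) {
            start = i + 1;                              // a top-level statement ended
        }
    }
    return start;
}

// True if the statement begins with a keyword whose value should not be printed.
// The list is an assumption and deliberately short.
bool starts_with_keyword(const std::string& stmt) {
    static const char* const keywords[] = {"for", "while", "if", "do", "switch",
                                           "try", "return", "using", "typedef"};
    std::size_t n = 0;
    while (n < stmt.size() &&
           (std::isalnum(static_cast<unsigned char>(stmt[n])) || stmt[n] == '_'))
        ++n;
    const std::string word = stmt.substr(0, n);
    for (const char* k : keywords)
        if (word == k) return true;
    return false;
}

// Wraps an unterminated last statement in PRINT( ); returns other input as-is
// (apart from trimming trailing whitespace).
std::string wrap_last_statement(const std::string& src) {
    const char* ws = " \t\n";
    const std::size_t end = src.find_last_not_of(ws);
    if (end == std::string::npos) return src;         // blank input
    const std::string code = src.substr(0, end + 1);  // drop trailing whitespace
    const std::size_t start = last_statement_start(code);
    assert(start <= code.size());
    const std::size_t first = code.find_first_not_of(ws, start);
    if (first == std::string::npos) return code;      // snippet ends with ';' or '}'
    const std::string tail = code.substr(first);
    if (starts_with_keyword(tail)) return code;       // e.g. "for(a;b;c) x"
    return code.substr(0, first) + "PRINT(" + tail + ");";
}
```

With these definitions, wrap_last_statement("OK;  print") yields "OK;  PRINT(print);", while "no-print;" and "OK; for(a;b;c) no-print" come back unchanged. The scan is a single linear pass, so longer snippets should not hit the slowdown described for the regex-based attempt, but anything that depends on real C++ grammar still calls for an actual parser such as Clang.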
