I have a project (SCC) which is kind of like a REPL for C++. At the bash prompt I can do:
scc '2+2'
Or something a little more complex:
scc 'double x = 0.5; sin(x)'
which is equivalent to:
scc 'double x = 0.5; cout << sin(x) << endl;'
If the last (and possibly only) statement-expression is not terminated by a semicolon, it is sent to std::cout.
My question is about parsing out the last statement from a C++ code snippet. I am well aware of how difficult C++ parsing is. Parsing out the last statement with a simple sed script that just looks for the last ';' was initially good enough for me. But the project has now outgrown being a small personal project, and I need a better parser.
Below is a mini unit test for my current sed parser. You can see the sed regex that I use to do the parsing:
cat <<EOF | sed 's/$//;s/[ \t]*$//;s/\(.*[;}]\)*\([^;}]\+$\)/\0 ==>> \1 PRINT(\2);/'
print
no-print;
OK; print
OK; no-print;
OK; no-print; print
FAIL; while(a){b;} no-print
FAIL; while(a) no-print
OK; for(a;b;c) {no-print}
FAIL; for(a;b;c) no-print
OK; {}
OK; {no-print-code-block;}
FAIL; print_rvalue_t{1}
FAIL; f(int{1})
FAIL; f(";")
FAIL; f(';')
FAIL; f("}")
EOF
The first line after the cat line is an empty line. The second is a line containing a single space. The third is a statement not terminated with ';', which should be printed. The fourth is a two-statement snippet, and so on. If a line is marked FAIL, the parser fails on that line. The output looks like this:
print ==>> PRINT(print);
no-print;
OK; print ==>> OK; PRINT( print);
OK; no-print;
OK; no-print; print ==>> OK; no-print; PRINT( print);
FAIL; while(a){b;} no-print ==>> FAIL; while(a){b;} PRINT( no-print);
FAIL; while(a) no-print ==>> FAIL; PRINT( while(a) no-print);
OK; for(a;b;c) {no-print}
FAIL; for(a;b;c) no-print ==>> FAIL; for(a;b; PRINT(c) no-print);
OK; {}
OK; {no-print-code-block;}
FAIL; print_rvalue_t{1}
FAIL; f(int{1}) ==>> FAIL; f(int{1} PRINT());
FAIL; f(";") ==>> FAIL; f("; PRINT("));
FAIL; f(';') ==>> FAIL; f('; PRINT('));
FAIL; f("}") ==>> FAIL; f("} PRINT("));
Lines without the ==>> marker are lines that passed through the parser without modification. After the marker is the transformed snippet, where the last statement is wrapped in PRINT( ). As you can see, the current sed parser is not very good.
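Several of the failures above (the quoted ';' and '}' cases, and f(int{1})) happen because a regex has no notion of literals or bracket nesting. The following is a minimal sketch of a hand-written scanner that tracks both. The function name `wrap_last_statement` is hypothetical and not part of SCC, and the sketch deliberately ignores comments, raw string literals, digit separators, brace-initialized expressions such as print_rvalue_t{1}, and control-flow statements without braces, so it is not a full fix.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SCC): wrap the last unterminated
// statement of `src` in PRINT( ... ); and leave everything else alone.
std::string wrap_last_statement(const std::string& src) {
    std::size_t cut = 0;  // index just past the last top-level ';' or '}'
    int depth = 0;        // nesting depth of '(' and '['
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Skip a string or character literal, honoring backslash escapes.
            for (++i; i < src.size() && src[i] != c; ++i)
                if (src[i] == '\\') ++i;
        } else if (c == '(' || c == '[') {
            ++depth;
        } else if (c == ')' || c == ']') {
            if (depth > 0) --depth;
        } else if ((c == ';' || c == '}') && depth == 0) {
            cut = i + 1;  // statement boundary outside any parentheses
        }
    }
    // Trim whitespace around the candidate last statement.
    const std::size_t first = src.find_first_not_of(" \t\n", cut);
    if (first == std::string::npos) return src;  // nothing left to print
    const std::size_t last = src.find_last_not_of(" \t\n");
    return src.substr(0, cut) + (cut ? " PRINT(" : "PRINT(") +
           src.substr(first, last - first + 1) + ");";
}
```

Under these assumptions, `OK; f(";")` becomes `OK; PRINT(f(";"));` and `OK; no-print;` is returned unchanged. Because semicolons inside parentheses are ignored, `for(a;b;c) no-print` is no longer split in the middle of the loop header, but it is still wrapped as a whole, which is one of the cases that needs keyword awareness or a real parser.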
So I am looking for something better. I will accept an answer even if it is not 100% correct at parsing. Even a better sed script would be good enough for me. The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor.
I've tried to write a parser in boost/xpressive - http://github.com/lvv/scc/blob/master/sccpp.h . Of course, it's not a real C++ parser. It's just a quick hack made for one thing: parsing out the last statement. It passes all of the above unit tests. Unfortunately, for longer snippets it was intolerably slow.
The question is: how do I make a better parser?
The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor.
The complexity is not too high. The simple fact is that C++ is like HTML: you need a real library to parse it. Unless you want to spend years developing your own, pretty much the only way to go is to use an existing C++ parser, and Clang is the only option in this regard. So however complex you find it, you have no other choice.