"Simple" parser for C++
I have a project (SCC) which is a kind of REPL for C++. At the bash prompt I can do:
scc '2+2'
Or something a little more complex:
scc 'double x = 0.5; sin(x)'
which is equivalent to:
scc 'double x = 0.5; cout << sin(x) << endl;'
If the last (and possibly only) statement-expression is not terminated by a semicolon, it is sent to std::cout.
My question is about parsing out the last statement from a C++ code snippet. I am well aware of how difficult C++ parsing is. Parsing out the last statement with a simple sed script that just looks for the last ';' was initially good enough for me. But the project has now outgrown its small personal origins, and I need a better parser.
Below is a mini unit test for my current sed parser. It shows the sed regex I use to do the parsing:
cat <<EOF | sed 's/$//;s/[ \t]*$//;s/\(.*[;}]\)*\([^;}]\+$\)/\0 ==>> \1 PRINT(\2);/'
print
no-print;
OK; print
OK; no-print;
OK; no-print; print
FAIL; while(a){b;} no-print
FAIL; while(a) no-print
OK; for(a;b;c) {no-print}
FAIL; for(a;b;c) no-print
OK; {}
OK; {no-print-code-block;}
FAIL; print_rvalue_t{1}
FAIL; f(int{1})
FAIL; f(";")
FAIL; f(';')
FAIL; f("}")
EOF
The first line after the cat line is an empty line. The second line contains a single space. The third is a statement not terminated with ';', so it should be printed. The fourth is a two-statement snippet, and so on. A line that begins with FAIL is one the parser gets wrong. The output looks like this:
print ==>> PRINT(print);
no-print;
OK; print ==>> OK; PRINT( print);
OK; no-print;
OK; no-print; print ==>> OK; no-print; PRINT( print);
FAIL; while(a){b;} print ==>> OK; while(a){b;} PRINT( no-print);
FAIL; while(a) no-print ==>> FAIL; PRINT( while(a) no-print);
OK; for(a;b;c) {no-print}
FAIL; for(a;b;c) no-print ==>> FAIL; for(a;b; PRINT(c) no-print);
OK; {}
OK; {no-print-code-block;}
FAIL; print_rvalue_t{1}
FAIL; f(int{1}) ==>> FAIL; f(int{1} PRINT());
FAIL; f(";") ==>> FAIL; f("; PRINT("));
FAIL; f(';') ==>> FAIL; f('; PRINT('));
FAIL; f("}") ==>> FAIL; f("} PRINT("));
Lines without the ==>> marker are lines that pass through the parser unmodified. After the marker comes the transformed snippet, in which the last statement is wrapped in PRINT( ). As you can see, the current sed parser is not very good.
So I am looking for something better. I will accept an answer even if it is not 100% correct at parsing. Even a better sed script would be good enough for me. The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor.
I've tried to write a parser in boost/xpressive: http://github.com/lvv/scc/blob/master/sccpp.h . Of course it's not a real C++ parser. It's just a quick hack made for one thing: parsing out the last statement. It passes all of the unit tests above. Unfortunately, for longer snippets it was intolerably slow.
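To make the goal concrete, here is a minimal sketch of a single-pass scanner for the same job. It is not the code in sccpp.h, and the names wrap_last_statement and starts_with_word are hypothetical. It assumes that splitting only at a top-level ';' (outside quotes and brackets) is enough, and that a tail starting with '{' or with a control keyword should not be printed. Comments, raw string literals, digit separators and preprocessor lines are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper (not part of SCC): true if `s` starts with the whole word `kw`.
static bool starts_with_word(const std::string& s, const std::string& kw) {
    if (s.compare(0, kw.size(), kw) != 0) return false;
    if (s.size() == kw.size()) return true;
    unsigned char next = static_cast<unsigned char>(s[kw.size()]);
    return !(std::isalnum(next) || next == '_');
}

// Hypothetical helper: wrap the last top-level expression statement in PRINT(...).
// Returns the (right-trimmed) input unchanged when there is nothing to print.
std::string wrap_last_statement(std::string src) {
    // Drop trailing whitespace.
    while (!src.empty() && std::isspace(static_cast<unsigned char>(src.back())))
        src.pop_back();

    std::size_t split = 0;  // index just past the last top-level ';'
    int depth = 0;          // nesting level of (), [] and {}
    char quote = 0;         // active literal delimiter, or 0 outside a literal

    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (quote) {
            if (c == '\\') ++i;              // skip the escaped character
            else if (c == quote) quote = 0;  // literal ends
        } else if (c == '"' || c == '\'') {
            quote = c;                       // literal starts
        } else if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth > 0) --depth;
        } else if (c == ';' && depth == 0) {
            split = i + 1;
        }
    }

    std::string head = src.substr(0, split);
    std::string tail = src.substr(split);
    std::size_t first = tail.find_first_not_of(" \t\n");
    if (first == std::string::npos) return src;  // ends with ';' or is empty
    head += tail.substr(0, first);               // keep the original spacing
    tail.erase(0, first);

    // Assumption: blocks and control statements are never printed.
    if (tail[0] == '{') return src;
    for (const char* kw : {"for", "while", "if", "do", "switch", "return"})
        if (starts_with_word(tail, kw)) return src;

    return head + "PRINT(" + tail + ");";
}
```

With this sketch, wrap_last_statement("OK; print") gives "OK; PRINT(print);", and f(";") is wrapped whole because the semicolon sits inside a string literal. The keyword list is only a heuristic: a declaration without a trailing semicolon, for example, would still be wrapped incorrectly.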
The question is: how do I make a better parser?
"The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor."
Not too high. The simple fact is that C++ is like HTML: you need a real library to parse it, so unless you want to spend years developing your own, pretty much the only way to go is to use an existing C++ parser. Clang is the only option in this regard. So however complex you find it, you have no other choice.