"Simple" parser for C++

Question

I have a project (SCC) which is kind of like a REPL for C++. At the bash prompt I can run:

    scc '2+2'

Or something a little more complex:

    scc 'double x = 0.5;  sin(x)'

which is equivalent to:

    scc 'double x = 0.5;  cout << sin(x) << endl;'

If the last (and possibly only) statement-expression is not terminated by a semicolon, it is sent to std::cout. My question is about parsing out the last statement from a C++ code snippet. I am well aware of how difficult C++ parsing is. Parsing out the last statement with a simple sed script that just looks for the last ';' was initially good enough for me. But the project has now outgrown being a small personal project, and I need a better parser.

Below is a mini unit test for my current sed parser. You can see the sed regex that I use to do the parsing:

    cat  <<EOF  | sed    's/$//;s/[ \t]*$//;s/\(.*[;}]\)*\([^;}]\+$\)/\0    ==>>  \1   PRINT(\2);/'


    print
    no-print;
    OK;  print
    OK;  no-print;
    OK;  no-print;  print
    FAIL;   while(a){b;}  no-print
    FAIL;   while(a)  no-print
    OK;     for(a;b;c) {no-print}
    FAIL;   for(a;b;c) no-print
    OK;     {}
    OK;     {no-print-code-block;}
    FAIL;  print_rvalue_t{1}
    FAIL;   f(int{1})
    FAIL;   f(";")
    FAIL;   f(';')
    FAIL;   f("}")
    EOF

The first line after the cat line is an empty line. The second line contains a single space. The 3rd is a statement not terminated with ';' and should be printed. The 4th is a 2-statement snippet. And so on. A line marked FAIL is one that the parser gets wrong. The output looks like this:

    print   ==>>     PRINT(print);
    no-print;
    OK;  print      ==>>  OK;   PRINT(  print);
    OK;  no-print;
    OK;  no-print;  print   ==>>  OK;  no-print;   PRINT(  print);
    FAIL;     while(a){b;}  print     ==>>  OK;       while(a){b;}   PRINT(  no-print);
    FAIL;   while(a)  no-print      ==>>  FAIL;   PRINT(    while(a)  no-print);
    OK;     for(a;b;c) {no-print}
    FAIL;   for(a;b;c) no-print     ==>>  FAIL;     for(a;b;   PRINT(c) no-print);
    OK;     {}
    OK;     {no-print-code-block;}
    FAIL;  print_rvalue_t{1}
    FAIL;   f(int{1})       ==>>  FAIL;     f(int{1}   PRINT());
    FAIL;   f(";")  ==>>  FAIL;     f(";   PRINT("));
    FAIL;   f(';')  ==>>  FAIL;     f(';   PRINT('));
    FAIL;   f("}")  ==>>  FAIL;     f("}   PRINT("));

Lines without the ==>> marker pass through the parser unmodified. After the marker comes the transformed snippet, in which the last statement is wrapped in PRINT( ). As you can see, the current sed parser is not very good.

So I am looking for something better. I will accept an answer even if it is not 100% correct at parsing. Even a better sed script would be good enough for me. The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor.

I've tried to write a parser in boost/xpressive (http://github.com/lvv/scc/blob/master/sccpp.h). Of course, it's not a real C++ parser. It's just a quick hack made for one thing: parsing out the last statement. It passes all of the unit tests above. Unfortunately, for longer snippets it was intolerably slow.
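For illustration only, here is a minimal sketch of a hand-written, single-pass scanner that runs in linear time over the snippet. It is not the code from sccpp.h, and the names `wrap_last_expression` and `starts_with_keyword` are hypothetical. It assumes that a top-level `{` opens a statement block only when it starts a statement or follows `)`, and treats every other `{` as a braced initializer. It makes no attempt to handle comments, raw string literals, digit separators, lambdas, or unterminated declarations, so it is a heuristic, not a parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if `s` has a control-flow keyword (as a whole word) at position `pos`.
static bool starts_with_keyword(const std::string& s, std::size_t pos) {
    static const char* const kw[] = {"if", "else", "for", "while", "do", "switch", "try"};
    for (const char* k : kw) {
        const std::string word(k);
        if (s.compare(pos, word.size(), word) != 0) continue;
        const std::size_t next = pos + word.size();
        if (next >= s.size()) return true;
        const unsigned char c = static_cast<unsigned char>(s[next]);
        if (!std::isalnum(c) && c != '_') return true;  // "do" matches, "double" does not
    }
    return false;
}

// Wraps the last unterminated top-level expression of `src` in PRINT( );
// returns `src` unchanged when there is nothing to print.
std::string wrap_last_expression(const std::string& src) {
    const std::size_t n = src.size();
    std::size_t start = 0;      // where the current top-level statement begins
    int depth = 0;              // nesting depth of (), [] and {}
    bool top_is_block = false;  // heuristic: does the open top-level '{' start a block?

    for (std::size_t i = 0; i < n; ++i) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Skip a string or character literal, honoring backslash escapes.
            for (++i; i < n && src[i] != c; ++i)
                if (src[i] == '\\') ++i;
        } else if (c == '(' || c == '[' || c == '{') {
            if (c == '{' && depth == 0) {
                // Look at the last non-space character of the current statement.
                std::size_t j = i;
                while (j > start && std::isspace(static_cast<unsigned char>(src[j - 1]))) --j;
                top_is_block = (j == start) || src[j - 1] == ')';
            }
            ++depth;
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth > 0) --depth;
            if (c == '}' && depth == 0 && top_is_block) start = i + 1;  // block ends a statement
        } else if (c == ';' && depth == 0) {
            start = i + 1;  // top-level semicolon ends a statement
        }
    }

    // Trim whitespace around the trailing fragment.
    std::size_t begin = start, end = n;
    while (begin < end && std::isspace(static_cast<unsigned char>(src[begin]))) ++begin;
    while (end > begin && std::isspace(static_cast<unsigned char>(src[end - 1]))) --end;
    assert(begin <= end);

    if (begin == end || starts_with_keyword(src, begin)) return src;
    return src.substr(0, begin) + "PRINT(" + src.substr(begin, end - begin) + ");";
}
```

Under these assumptions, `wrap_last_expression("OK;  no-print;  print")` yields `OK;  no-print;  PRINT(print);`, while `while(a)  no-print` and `for(a;b;c) no-print` are returned unchanged because the trailing fragment starts with a control-flow keyword. Unlike the sed output above, leading whitespace is kept outside PRINT( ) rather than inside it.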

The question is: how do I make a better parser?

Answer 1

> The right way to do it would probably be to use a real parser (from something like Clang), but I am a little apprehensive about the complexity of this endeavor

Not too high. The simple fact is that C++ is like HTML: you need a real library to do it, so unless you want to spend years developing your own, pretty much the only way to go is to use an existing C++ parser. Clang is the only option in this regard. So however complex you find it, you have no other choice.

Answer 1: 1 accepted, 2012-09-13 10:41:41
