
What are the disadvantages of the Spirit parser-generator framework from boost.org?

Original post 2009-01-11 02:01:58 9 5 c++/ parsing/ boost/ boost-spirit/ parser-generator

In several questions I've seen recommendations for the Spirit parser-generator framework from boost.org, but then in the comments there is grumbling from people who use Spirit and are not happy with it. Will those people please step forward and explain to the rest of us what the drawbacks of using Spirit are?

5 answers

It is quite a cool idea, and I liked it; it was especially useful for really learning how to use C++ templates.

But their documentation recommends using Spirit for small to medium-sized parsers. A parser for a full language would take ages to compile. I will list three reasons.

Scannerless parsing. While it is considerably simpler, it may slow the parser down when backtracking is required. It's optional, though: a lexer can be integrated; see the C preprocessor built with Spirit. A grammar of ~300 lines (including both .h and .cpp files) compiles (unoptimized) to a 6 MB file with GCC. Inlining and maximum optimization get that down to ~1.7 MB.
Slow parsing. There is no static checking of the grammar, neither to hint at excessive lookahead nor to catch basic errors such as left recursion (which leads to infinite recursion in recursive-descent parsers for LL grammars). Left recursion is not a really hard bug to track down, but excessive lookahead can cause exponential parsing times.
Heavy template usage. While this has certain advantages, it hurts compilation times and code size. Additionally, the grammar definition must normally be visible to all of its users, which increases compilation times even more. I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy.
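
To make the left-recursion point concrete, here is a minimal hand-written recursive-descent sketch in plain C++. It does not use Spirit, and `parse_number` and `parse_expr` are hypothetical names chosen for illustration. A rule such as `expr := expr '-' number | number`, coded naively, would call itself before consuming any input and never return; the usual workaround is to rewrite it as `expr := number ('-' number)*`, which is what the loop below does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Left-recursive grammar (a naive recursive-descent version never terminates):
//   expr := expr '-' number | number
// Equivalent grammar without left recursion, implemented below:
//   expr := number ('-' number)*

// Parses one or more decimal digits starting at pos; advances pos on success.
std::optional<long> parse_number(const std::string& s, std::size_t& pos) {
    std::size_t start = pos;
    long value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) return std::nullopt;  // no digit consumed
    return value;
}

// Parses the rewritten rule; the loop keeps '-' left-associative.
std::optional<long> parse_expr(const std::string& s, std::size_t& pos) {
    std::optional<long> left = parse_number(s, pos);
    if (!left) return std::nullopt;
    while (pos < s.size() && s[pos] == '-') {
        std::size_t save = pos;  // remember where the '-' was
        ++pos;
        std::optional<long> right = parse_number(s, pos);
        if (!right) {            // dangling '-': back up and stop
            pos = save;
            break;
        }
        *left -= *right;
    }
    return left;
}
```

Every call to `parse_number` either consumes at least one character or fails, so the loop always makes progress. That guarantee is exactly what the left-recursive form lacks.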

UPDATE: my response is limited to my experience with Spirit Classic, not Spirit V2. I would still expect Spirit to be heavily template-based, but now I'm just guessing.

In Boost 1.41 a new version of Spirit is being released, and it beats the pants off spirit::classic:

After a long time in beta (more than 2 years with Spirit 2.0), Spirit 2.1 will finally be released with the upcoming Boost 1.41 release. The code is very stable now and is ready for production code. We are working hard on finishing the documentation in time for Boost 1.41. You can peek at the current state of the documentation here. Currently, you can find the code and documentation in the Boost SVN trunk. If you have a new project involving Spirit, we highly recommend starting with Spirit 2.1 now.

Allow me to quote OvermindDL's post from the Spirit mailing list:

I may start to sound like a bot with how often I say this, but Spirit.Classic is ancient, you should switch to Spirit2.1, it can do everything you did above a GREAT deal easier, a lot less code, and it executes faster. For example, Spirit2.1 can build your entire AST inline, no weird overriding, no need to build things up afterwards, etc., all as one nice and fast step. You really need to update. See the other posts from the past day for links to docs and such for Spirit2.1. Spirit2.1 is currently in Boost Trunk, but will be formally released with Boost 1.41, but is otherwise complete.

For me, the biggest problem is that expressions in Spirit, as seen by the compiler or debugger, are rather long (below I have copied part of one expression in Spirit Classic). These expressions scare me. When I work on a program that uses Spirit, I'm afraid to use valgrind or to print a backtrace in gdb.

boost::spirit::classic::parser_result<boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<optional_suffix_parser<char const*>, boost::spirit::classic::ref_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::ref_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > 
>, boost::spirit::classic::kleene_star<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > > > > > >, void (*)(char const*, char const*)>, boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::type
boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<
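
Names like the one above come from the expression-template technique: each operator in a grammar expression returns a new type that wraps the types of its operands, so the type of a rule mirrors the whole expression. The following is a deliberately tiny sketch of that idea in plain C++, not Spirit's actual implementation; `lit`, `seq`, `alt`, `is_parser`, `node_count` and `make_rule` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Leaf parser: stands for "match a single character".
struct lit { char c; };

// Composite parsers: the operand types become part of the result type.
template <class L, class R> struct seq { L left; R right; };
template <class L, class R> struct alt { L left; R right; };

// Only types marked as parsers take part in the operators below.
template <class T> struct is_parser : std::false_type {};
template <> struct is_parser<lit> : std::true_type {};
template <class L, class R> struct is_parser<seq<L, R>> : std::true_type {};
template <class L, class R> struct is_parser<alt<L, R>> : std::true_type {};

// a >> b builds seq<A, B>; a | b builds alt<A, B>.
template <class L, class R,
          class = std::enable_if_t<is_parser<L>::value && is_parser<R>::value>>
seq<L, R> operator>>(L l, R r) { return {l, r}; }

template <class L, class R,
          class = std::enable_if_t<is_parser<L>::value && is_parser<R>::value>>
alt<L, R> operator|(L l, R r) { return {l, r}; }

// Counts the nodes in a parser type, i.e. how many template names the
// compiler has to spell out when it prints that type.
template <class T>
struct node_count : std::integral_constant<std::size_t, 1> {};
template <class L, class R>
struct node_count<seq<L, R>>
    : std::integral_constant<std::size_t,
                             1 + node_count<L>::value + node_count<R>::value> {};
template <class L, class R>
struct node_count<alt<L, R>>
    : std::integral_constant<std::size_t,
                             1 + node_count<L>::value + node_count<R>::value> {};

// Three operators and four literals already produce a seven-node type:
// seq<alt<lit, lit>, alt<lit, lit>>.
inline auto make_rule() {
    return (lit{'a'} | lit{'b'}) >> (lit{'c'} | lit{'d'});
}
using rule_type = decltype(make_rule());
```

The printed name grows with every operator in the expression. In a real library each node also carries namespaces and policy parameters, which is how a modest grammar ends up with a name several screens long in a backtrace.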

Here is what I don't like about it:

The documentation is limited. There is one big web page where "everything" is explained, but the current explanations lack detail.
Poor AST generation. ASTs are poorly explained and, even after banging your head against the wall to understand how the AST modifiers work, it's difficult to obtain an AST that is easy to manipulate (i.e. one that maps well to the problem domain).
It increases compilation times enormously, even for "medium"-sized grammars.
The syntax is too heavyweight. It is a fact of life that in C/C++ you must duplicate code (i.e. between declaration and definition). However, it seems that in boost::spirit, when you declare a grammar<>, you must repeat some things 3 times :D (when you want ASTs, which is what I want :D).

Other than this, I think they did a pretty good job with the parser, given the limitations of C++. But I think they should improve it more. The history page mentions that there was a "dynamic" Spirit before the current "static" Spirit; I'm wondering how much faster it was and how much better its syntax was.

I would say the biggest problem is the lack of any diagnostics or other help for grammar problems. If your grammar is ambiguous, the parser might not parse what you expect it to, and there's no good way of noticing that.
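
One way such surprises can arise: in recursive-descent combinators, an alternative is typically tried left to right and the first branch that matches wins, so a branch that is a prefix of a later one silently shadows it. The sketch below illustrates that behaviour in plain C++ with hypothetical helpers (`match_literal`, `first_match`). It is not Spirit code, and it assumes non-empty literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the number of characters matched if `input` contains `word`
// at position `pos`, or 0 if it does not. Assumes pos <= input.size()
// and that `word` is non-empty.
std::size_t match_literal(const std::string& input, std::size_t pos,
                          const std::string& word) {
    return input.compare(pos, word.size(), word) == 0 ? word.size() : 0;
}

// Ordered choice: tries the alternatives left to right and returns the
// length of the first one that matches (0 if none does). Once one
// succeeds, the remaining alternatives are never consulted, and nothing
// reports that a longer match was available.
std::size_t first_match(const std::string& input, std::size_t pos,
                        const std::vector<std::string>& alternatives) {
    for (const std::string& word : alternatives) {
        std::size_t n = match_literal(input, pos, word);
        if (n != 0) return n;
    }
    return 0;
}
```

With the alternatives ordered as `{"int", "integer"}`, the input `integer` matches only its first three characters and the rest is left for whatever rule comes next. Listing the longer literal first avoids that, but nothing in the sketch (or, per the answer above, in the framework) warns about the first ordering.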