How to parse a grammar into a `std::set` using `boost::spirit`?

Question

TL;DR

How to parse the result of a boost::spirit grammar into an std::set?

Full problem statement

As an exercise to learn how to use boost::spirit, I am designing a parser for X.500/LDAP Distinguished Names. The grammar can be found in BNF format in RFC 1779.

I "unrolled" it and translated it into boost::spirit rules. That's the first step. Basically, a DN is a set of RDNs (Relative Distinguished Names), which themselves are tuples of (Key, Value) pairs.

I am thinking of using

typedef std::unordered_map<std::string, std::string> rdn_type;

to represent each RDN. The RDNs are then gathered into a std::set<rdn_type>.

My issue is that, going through the (good) documentation of boost::spirit, I couldn't find out how to populate the set.

My current code can be found on GitHub and I'm trying to refine it whenever I can.

Starting a satanic dance to summon SO's most popular polar bear :p

Current code

To keep the question all in one place, I add a copy of the code here. It's a bit long, so I put it at the end :)

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;

typedef std::unordered_map<std::string, std::string> dn_key_value_map;

template <typename Iterator>
struct dn_grammar_common : public qi::grammar<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> {
  struct dn_reserved_chars_ : public qi::symbols<char, char> {
    dn_reserved_chars_() {
      add
        ("\\", "\\")
        ("=" , "=")
        ("+" , "+")
        ("," , ",")
        (";" , ";")
        ("#" , "#")
        ("<" , "<")
        (">" , ">")
        ("\"", "\"")
        ("%" , "%");
    }
  } dn_reserved_chars;
  dn_grammar_common() : dn_grammar_common::base_type(dn) {
    // Useful using directives
    using namespace qi::labels;

    // Low level rules
    // Key can only contain alphanumerical characters and dashes
    key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]];
    escaped_hex_char = qi::lexeme[(&qi::char_("\\")) >> qi::repeat(2)[qi::char_("0-9a-fA-F")]];
    escaped_sequence = escaped_hex_char |
                      qi::lexeme[(&qi::char_("\\")) >> dn_reserved_chars];
    // Rule for a fully escaped string (used as Attribute Value) => "..."
    quote_string = qi::lexeme[qi::lit('"') >>
      *(escaped_sequence | (qi::char_ - qi::char_("\\\""))) >>
      qi::lit('"')
    ];
    // Rule for an hexa string (used as Attribute Value) => #23AD5D...
    hex_string = (&qi::char_("#")) >> *qi::lexeme[(qi::repeat(2)[qi::char_("0-9a-fA-F")])];

    // Value is either:
    // - A regular string (that can contain escaped sequences)
    // - A fully escaped string (that can also contain escaped sequences)
    // - An hexadecimal string
    value = (qi::lexeme[*((qi::char_ - dn_reserved_chars) | escaped_sequence)]) |
            quote_string |
            hex_string;

    // Higher level rules
    rdn_pair = key >> '=' >> value;
    // A relative distinguished name consists of a sequence of pairs (Attribute = AttributeValue)
    // Separated with a +
    rdn = rdn_pair % qi::char_("+");
    // The DN is a set of RDNs separated by either a "," or a ";".
    // The two separators can coexist in a given DN, though it is not
    // recommended practice.
    dn = rdn % (qi::char_(",;"));
  }
  qi::rule<Iterator, std::set<dn_key_value_map>(), ascii::space_type> dn;
  qi::rule<Iterator, dn_key_value_map(), ascii::space_type> rdn;
  qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;
  qi::rule<Iterator, std::string(), ascii::space_type> key, value, hex_string, quote_string;
  qi::rule<Iterator, std::string(), ascii::space_type> escaped_hex_char, escaped_sequence;
};

Answer 1

I suspect you just need fusion/adapted/std_pair.hpp

Let me try to make it compile

OK

Your start rule was incompatible:

  qi::rule<Iterator, std::multiset<dn_key_value_map>(), ascii::space_type> dn;

The symbol table should map to string, not char:
```
 struct dn_reserved_chars_ : public qi::symbols<char, std::string> { 
```
or you should change the mapped values to char literals.

Why do you use this, instead of char_("\\=+,;#<>\"%")?

Update

I have completed my review of the grammar (purely from the implementation point of view, so I haven't actually read the RFC to check the assumptions).

I created a pull request here: https://github.com/Rerito/pkistore/pull/1

General Notes
- unordered maps aren't sortable, so I used map<string,string>
- the outer set is technically not a set (?) in the RFC, so I made it a vector (this also makes the output order of the relative distinguished names correspond more closely to the input order)
- removed superstitious includes (Fusion set/map are completely unrelated to std::set/map; you just need std_pair.hpp for maps to work)
Grammar rules:
- symbols<char,char> requires char values (not "." but '.')
- Many simplifications
  - removed &char_(...) instances (they don't consume any input, it's just an assertion)
  - removed impotent no_case[]
  - removed unnecessary lexeme[] directives; most became redundant once the skipper was removed from the rule declarations
  - removed some rule declarations altogether (the rule definitions aren't complex enough to warrant the overhead incurred), e.g. hex_string
  - made key require at least one character (not checked against the specs). Note how
```
 key = ascii::no_case[qi::lexeme[(*qi::alnum) >> (*(qi::char_('-') >> qi::alnum))]]; 
```
    became
```
 key = raw[ alnum >> *(alnum | '-') ]; 
```
    raw means that the input sequence will be reflected verbatim (instead of building a copy character by character)
  - reordered branches on value (not checked, but I wager unquoted strings would basically eat everything else)
  - made hexchar expose the actual data using qi::int_parser<char, 16, 2, 2>
Tests
Added a test program test.cpp, based on the Examples section in the RFC (3.).
Added some more complicated examples of my own devising.
Loose Ends
To do: review the specs for actual rules and requirements on
- escaping special characters
- inclusion of whitespace (incl. newline characters) inside the various string flavours:
  - hex #xxxx strings might allow for newlines (makes sense to me)
  - unquoted strings might not (idem)
Also enabled optional BOOST_SPIRIT_DEBUG
Also made the skipper internal to the grammar (security!)
Also made a convenience free function that makes the parser usable without leaking implementation details (Qi)

Live Demo

Live On Coliru

//#include "dn_parser.hpp"
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

namespace pkistore {
    namespace parsing {

    namespace qi      = boost::spirit::qi;
    namespace ascii   = boost::spirit::ascii;

    namespace ast {
        typedef std::map<std::string, std::string> rdn;
        typedef std::vector<rdn> dn;
    }

    template <typename Iterator>
    struct dn_grammar_common : public qi::grammar<Iterator, ast::dn()> {
        dn_grammar_common() : dn_grammar_common::base_type(start) {
            using namespace qi;

            // syntax as defined in rfc1779
            key          = raw[ alnum >> *(alnum | '-') ];

            char_escape  = '\\' >> (hexchar | dn_reserved_chars);
            quote_string = '"' >> *(char_escape | (char_ - dn_reserved_chars)) >> '"' ;

            value        =  quote_string 
                         | '#' >> *hexchar
                         | *(char_escape | (char_ - dn_reserved_chars))
                         ;

            rdn_pair     = key >> '=' >> value;

            rdn          = rdn_pair % qi::char_("+");
            dn           = rdn % qi::char_(",;");

            start        = skip(qi::ascii::space) [ dn ];

            BOOST_SPIRIT_DEBUG_NODES((start)(dn)(rdn)(rdn_pair)(key)(value)(quote_string)(char_escape))
        }

    private:
        qi::int_parser<char, 16, 2, 2> hexchar;

        qi::rule<Iterator, ast::dn()> start;

        qi::rule<Iterator, ast::dn(), ascii::space_type> dn;
        qi::rule<Iterator, ast::rdn(), ascii::space_type> rdn;
        qi::rule<Iterator, std::pair<std::string, std::string>(), ascii::space_type> rdn_pair;

        qi::rule<Iterator, std::string()> key, value, quote_string;
        qi::rule<Iterator, char()>        char_escape;

        struct dn_reserved_chars_ : public qi::symbols<char, char> {
            dn_reserved_chars_() {
                add ("\\", '\\') ("\"", '"')
                    ("=" , '=')  ("+" , '+')
                    ("," , ',')  (";" , ';')
                    ("#" , '#')  ("%" , '%')
                    ("<" , '<')  (">" , '>')
                    ;
            }
        } dn_reserved_chars;
    };

    } // namespace parsing

    static parsing::ast::dn parse(std::string const& input) {
        using It = std::string::const_iterator;

        pkistore::parsing::dn_grammar_common<It> const g;

        It f = input.begin(), l = input.end();
        pkistore::parsing::ast::dn parsed;

        bool ok = boost::spirit::qi::parse(f, l, g, parsed);

        if (!ok || (f!=l))
            throw std::runtime_error("dn_parse failure");

        return parsed;
    }
} // namespace pkistore

int main() {
    for (std::string const input : {
            "OU=Sales + CN=J. Smith, O=Widget Inc., C=US",
            "OU=#53616c6573",
            "OU=Sa\\+les + CN=J. Smi\\%th, O=Wid\\,\\;get In\\3bc., C=US",
            //"CN=Marshall T. Rose, O=Dover Beach Consulting, L=Santa Clara,\nST=California, C=US",
            //"CN=FTAM Service, CN=Bells, OU=Computer Science,\nO=University College London, C=GB",
            //"CN=Markus Kuhn, O=University of Erlangen, C=DE",
            //"CN=Steve Kille,\nO=ISODE Consortium,\nC=GB",
            //"CN=Steve Kille ,\n\nO =   ISODE Consortium,\nC=GB",
            //"CN=Steve Kille, O=ISODE Consortium, C=GB\n",
        })
    {
        auto parsed = pkistore::parse(input);

        std::cout << "===========\n" << input << "\n";
        for(auto const& dn : parsed) {
            std::cout << "-----------\n";
            for (auto const& kv : dn) {
                std::cout << "\t" << kv.first << "\t->\t" << kv.second << "\n";
            }
        }
    }
}

Prints:

===========
OU=Sales + CN=J. Smith, O=Widget Inc., C=US
-----------
    CN  ->  J. Smith
    OU  ->  Sales 
-----------
    O   ->  Widget Inc.
-----------
    C   ->  US
===========
OU=#53616c6573
-----------
    OU  ->  Sales
===========
OU=Sa\+les + CN=J. Smi\%th, O=Wid\,\;get In\3bc., C=US
-----------
    CN  ->  J. Smi%th
    OU  ->  Sa+les 
-----------
    O   ->  Wid,;get In;c.
-----------
    C   ->  US

Answer 1: score 2, accepted, 2015-10-28 15:33:53