Tokenizing a "braced initializer list"-style string in C++ (with Boost?)

Question

I have a string (possibly containing nested strings) that is formatted like a C++ braced initializer list. I want to tokenize it one level at a time into a vector of strings.

So when I pass "{one, two, three}" to the function, it should output a three-element vector:

"one",

"two",

"three"

To complicate this, it needs to support quoted tokens and preserve nested lists:

Input string: "{one, {2, \"three four\"}, \"five, six\", {\"seven, eight\"}}"

The output is a four-element vector:

"one",

"{2, \"three four\"}",

"five, six",

"{\"seven, eight\"}"

I've looked at a few other SO posts:

Using Boost Tokenizer escaped_list_separator with different parameters

Boost split not traversing inside of parenthesis or braces

I used those to start a solution, but this seems slightly too complicated for the tokenizer (because of the braces):

#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>

std::vector<std::string> TokenizeBracedList(const std::string& x)
{
  std::vector<std::string> tokens;

  std::string separator1("");
  std::string separator2(",\n\t\r");
  std::string separator3("\"\'");

  boost::escaped_list_separator<char> elements(separator1, separator2, separator3);
  boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(x, elements);

  for(auto i = std::begin(tokenizer); i != std::end(tokenizer); ++i)
  {
    auto token = *i;
    boost::algorithm::trim(token);
    tokens.push_back(token);
  }

  return tokens;
}

With this, even in the trivial case, it doesn't strip the opening and closing braces.

Boost and C++17 are fair game for a solution.

Answer 1

Simple (Flat) Take

Defining a flat data structure like:

using token  = std::string;
using tokens = std::vector<token>;

We can define an X3 parser like:

namespace Parser {
    using namespace boost::spirit::x3;

    rule<struct list_, token> item;

    auto quoted   = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare     = lexeme [ +(graph-','-'}') ];

    auto list     = '{' >> (item % ',') >> '}';
    auto sublist  = raw [ list ];

    auto item_def = sublist | quoted | bare;

    BOOST_SPIRIT_DEFINE(item)
}

Live On Wandbox

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>

using token  = std::string;
using tokens = std::vector<token>;

namespace x3 = boost::spirit::x3;

namespace Parser {
    using namespace boost::spirit::x3;

    rule<struct list_, token> item;

    auto quoted   = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare     = lexeme [ +(graph-','-'}') ];

    auto list     = '{' >> (item % ',') >> '}';
    auto sublist  = raw [ list ];

    auto item_def = sublist | quoted | bare;

    BOOST_SPIRIT_DEFINE(item)
}

int main() {
    for (std::string const input : {
            R"({one, "five, six"})",
            R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
        })
    {
        auto f = input.begin(), l = input.end();

        std::vector<std::string> parsed;
        bool ok = phrase_parse(f, l, Parser::list, x3::space, parsed);

        if (ok) {
            std::cout << "Parsed: " << parsed.size() << " elements\n";
            for (auto& el : parsed) {
                std::cout << " - " << std::quoted(el, '\'') << "\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }

        if (f != l)
            std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
    }
}

Prints

Parsed: 2 elements
 - 'one'
 - 'five, six'
Parsed: 4 elements
 - 'one'
 - '{2, "three four"}'
 - 'five, six'
 - '{"seven, eight"}'
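
For comparison, the same one-level split can be sketched with the C++17 standard library alone, by tracking brace depth and quote state by hand. This is not the Spirit X3 approach shown above, and the names `trim` and `split_braced_list` are hypothetical helpers invented for this illustration. It also simplifies one thing: backslash escapes inside a quoted item are kept verbatim rather than decoded.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: strips leading and trailing whitespace from a view.
static std::string_view trim(std::string_view s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front()))) s.remove_prefix(1);
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))  s.remove_suffix(1);
    return s;
}

// Hypothetical helper: splits {a, {b, c}, "d, e"} into its top-level items.
// Nested lists are kept verbatim; a top-level quoted item loses its quotes.
std::vector<std::string> split_braced_list(std::string_view input) {
    input = trim(input);
    if (input.size() < 2 || input.front() != '{' || input.back() != '}')
        throw std::invalid_argument("expected a braced list");
    input = trim(input.substr(1, input.size() - 2)); // drop the outer braces

    std::vector<std::string> items;
    if (input.empty()) return items;                 // "{}" has no items

    std::string current;
    int depth = 0;          // nesting level of inner braces
    bool in_quotes = false; // true while inside a "..." token

    auto flush = [&] {
        std::string_view item = trim(current);
        if (item.size() >= 2 && item.front() == '"' && item.back() == '"')
            item = item.substr(1, item.size() - 2);  // unquote top-level strings
        items.emplace_back(item);
        current.clear();
    };

    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        if (in_quotes) {
            current += c;
            if (c == '\\' && i + 1 < input.size()) current += input[++i]; // keep escaped char
            else if (c == '"') in_quotes = false;
        } else if (c == '"') {
            in_quotes = true;
            current += c;
        } else if (c == '{') {
            ++depth;
            current += c;
        } else if (c == '}') {
            if (depth == 0) throw std::invalid_argument("unbalanced '}'");
            --depth;
            current += c;
        } else if (c == ',' && depth == 0) {
            flush();                                 // separator at the top level only
        } else {
            current += c;
        }
    }
    if (in_quotes || depth != 0)
        throw std::invalid_argument("unterminated quote or brace");
    flush();
    return items;
}
```

Calling `split_braced_list` on the second test input above should give the same four items as the X3 version. Unlike the grammar, this sketch does not validate the contents of nested lists beyond checking that braces and quotes balance.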

Nested Data

Changing the data structure to be a bit more specific/realistic:

namespace ast {
    using value = boost::make_recursive_variant<
            double,
            std::string,
            std::vector<boost::recursive_variant_>
        >::type;
    using list = std::vector<value>;
}

Now we can change the grammar, as we no longer need to treat sublist as if it were a string:

namespace Parser {
    using namespace boost::spirit::x3;

    rule<struct item_, ast::value> item;

    auto quoted   = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare     = lexeme [ +(graph-','-'}') ];

    auto list     = x3::rule<struct list_, ast::list> {"list" }
                  = '{' >> (item % ',') >> '}';

    auto item_def = list | double_ | quoted | bare;

    BOOST_SPIRIT_DEFINE(item)
}

Everything "still works": Live On Wandbox

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>

namespace ast {
    using value = boost::make_recursive_variant<
            double,
            std::string,
            std::vector<boost::recursive_variant_>
        >::type;
    using list = std::vector<value>;
}

namespace x3 = boost::spirit::x3;

namespace Parser {
    using namespace boost::spirit::x3;

    rule<struct item_, ast::value> item;

    auto quoted   = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare     = lexeme [ +(graph-','-'}') ];

    auto list     = x3::rule<struct list_, ast::list> {"list" }
                  = '{' >> (item % ',') >> '}';

    auto item_def = list | double_ | quoted | bare;

    BOOST_SPIRIT_DEFINE(item)
}

struct pretty_printer {
    using result_type = void;
    std::ostream& _os;
    int _indent;

    pretty_printer(std::ostream& os, int indent = 0) : _os(os), _indent(indent) {}

    void operator()(ast::value const& v) { boost::apply_visitor(*this, v); }

    void operator()(double v)            { _os << v; }
    void operator()(std::string s)       { _os << std::quoted(s); }
    void operator()(ast::list const& l)  {
        _os << "{\n";
        _indent += 2;
        for (auto& item : l) {
            _os << std::setw(_indent) << "";
            operator()(item);
            _os << ",\n";
        }
        _indent -= 2;
        _os << std::setw(_indent) << "" << "}";
    }
};

int main() {
    pretty_printer print{std::cout};

    for (std::string const input : {
            R"({one, "five, six"})",
            R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
        })
    {
        auto f = input.begin(), l = input.end();

        ast::value parsed;
        bool ok = phrase_parse(f, l, Parser::item, x3::space, parsed);

        if (ok) {
            std::cout << "Parsed: ";
            print(parsed);
            std::cout << "\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f != l)
            std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
    }
}

Prints:

Parsed: {
  "one",
  "five, six",
}
Parsed: {
  "one",
  {
    2,
    "three four",
  },
  "five, six",
  {
    "seven, eight",
  },
}
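
If Boost.Variant is not wanted, a similar recursive tree can be modeled with C++17's `std::variant`. The following is only a sketch under that assumption, not part of the X3 listing above: the namespace `ast17` and the free function `print` are names made up for this illustration, and the tree is built by hand here rather than produced by a parser.

```cpp
#include <iomanip>
#include <ostream>
#include <string>
#include <variant>
#include <vector>

namespace ast17 {
    struct value;                    // forward declaration so a list can hold values
    using list = std::vector<value>;

    // Wrapping the variant in a struct is what lets the type refer to itself.
    struct value {
        std::variant<double, std::string, list> data;
    };
}

// Prints a value in the same layout as pretty_printer above:
// one item per line, two spaces of indentation per nesting level.
void print(std::ostream& os, ast17::value const& v, int indent = 0) {
    struct visitor {
        std::ostream& out;
        int level;

        void operator()(double d) const             { out << d; }
        void operator()(std::string const& s) const { out << std::quoted(s); }
        void operator()(ast17::list const& l) const {
            out << "{\n";
            for (auto const& item : l) {
                out << std::string(level + 2, ' ');
                print(out, item, level + 2);         // recurse into nested values
                out << ",\n";
            }
            out << std::string(level, ' ') << "}";
        }
    };
    std::visit(visitor{os, indent}, v.data);
}
```

A tree can then be assembled with nested aggregate initialization, for example `ast17::value{ast17::list{ast17::value{2.0}, ast17::value{std::string("three four")}}}`, and passed to `print` together with `std::cout`.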
