Tokenize a “Braced Initializer List”-Style String in C++ (With Boost?)
I have strings (possibly nested) that are formatted like a C++ braced initializer list. I want to tokenize them one level at a time into a vector of strings.
So when I pass "{one, two, three}" to the function, it should output a three-element vector:
"one", "two", "three"
To complicate this, it needs to support quoted tokens and preserve nested lists:
Input string: "{one, {2, \"three four\"}, \"five, six\", {\"seven, eight\"}}"
Output is a four-element vector:
"one"
"{2, \"three four\"}"
"five, six"
"{\"seven, eight\"}"
I've looked at a few other SO posts:
Using Boost Tokenizer escaped_list_separator with different parameters
Boost split not traversing inside of parenthesis or braces
I used those to start a solution, but this seems slightly too complicated for the tokenizer (because of the braces):
#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>
std::vector<std::string> TokenizeBracedList(const std::string& x)
{
    std::vector<std::string> tokens;
    std::string separator1("");          // escape characters: none
    std::string separator2(",\n\t\r");   // separator characters
    std::string separator3("\"\'");      // quote characters
    boost::escaped_list_separator<char> elements(separator1, separator2, separator3);
    boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(x, elements);
    for (auto i = std::begin(tokenizer); i != std::end(tokenizer); ++i)
    {
        auto token = *i;
        boost::algorithm::trim(token);   // drop whitespace around each token
        tokens.push_back(token);
    }
    return tokens;
}
With this, even in the trivial case, it doesn't strip the opening and closing braces.
Boost and C++17 are fair game for a solution.
Defining a flat data structure like:
using token = std::string;
using tokens = std::vector<token>;
We can define an X3 parser like:
namespace Parser {
    using namespace boost::spirit::x3;
    rule<struct list_, token> item;
    auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare = lexeme [ +(graph-','-'}') ];
    auto list = '{' >> (item % ',') >> '}';
    auto sublist = raw [ list ];
    auto item_def = sublist | quoted | bare;
    BOOST_SPIRIT_DEFINE(item)
}
The complete program:
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
using token = std::string;
using tokens = std::vector<token>;
namespace x3 = boost::spirit::x3;
namespace Parser {
    using namespace boost::spirit::x3;
    rule<struct list_, token> item;
    // A quoted token; a backslash escapes the character that follows it.
    auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    // A bare token: printable characters other than ',' and '}'.
    auto bare = lexeme [ +(graph-','-'}') ];
    auto list = '{' >> (item % ',') >> '}';
    // raw[] exposes the matched source text, so a nested list stays a string.
    auto sublist = raw [ list ];
    auto item_def = sublist | quoted | bare;
    BOOST_SPIRIT_DEFINE(item)
}
int main() {
    for (std::string const input : {
            R"({one, "five, six"})",
            R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
        })
    {
        auto f = input.begin(), l = input.end();
        std::vector<std::string> parsed;
        bool ok = x3::phrase_parse(f, l, Parser::list, x3::space, parsed);
        if (ok) {
            std::cout << "Parsed: " << parsed.size() << " elements\n";
            for (auto& el : parsed) {
                std::cout << " - " << std::quoted(el, '\'') << "\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }
        // Report any input the parser did not consume.
        if (f != l)
            std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
    }
}
Prints:
Parsed: 2 elements
- 'one'
- 'five, six'
Parsed: 4 elements
- 'one'
- '{2, "three four"}'
- 'five, six'
- '{"seven, eight"}'
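For comparison, the one-level split that the grammar above performs can also be sketched by hand with only the standard library. This is not part of the original answer: the helper names trim, unquote, and tokenize_braced_list are hypothetical, and the sketch assumes double quotes only, backslash escapes inside quotes, and an exception on malformed input instead of a partial parse.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: strips leading and trailing ASCII whitespace.
std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical helper: removes surrounding double quotes and resolves
// backslash escapes; anything that is not quoted is returned unchanged.
std::string unquote(const std::string& s) {
    if (s.size() < 2 || s.front() != '"' || s.back() != '"') return s;
    std::string out;
    for (std::size_t i = 1; i + 1 < s.size(); ++i) {
        if (s[i] == '\\' && i + 2 < s.size()) ++i;  // keep the escaped character
        out += s[i];
    }
    return out;
}

// Splits "{a, {b, c}, "d, e"}" into its top-level elements.
// Nested lists are kept verbatim; top-level quoted tokens are unquoted.
std::vector<std::string> tokenize_braced_list(const std::string& input) {
    const std::string body = trim(input);
    if (body.size() < 2 || body.front() != '{' || body.back() != '}')
        throw std::invalid_argument("expected a braced list");

    std::vector<std::string> tokens;
    std::string current;
    int depth = 0;           // brace nesting inside the outer pair
    bool in_quotes = false;

    // Walk the characters strictly between the outer braces.
    for (std::size_t i = 1; i + 1 < body.size(); ++i) {
        const char c = body[i];
        if (in_quotes) {
            current += c;
            if (c == '\\' && i + 2 < body.size()) current += body[++i];  // copy escaped char verbatim
            else if (c == '"') in_quotes = false;
        } else if (c == '"') {
            in_quotes = true;
            current += c;
        } else if (c == ',' && depth == 0) {
            tokens.push_back(unquote(trim(current)));  // a top-level comma ends a token
            current.clear();
        } else {
            if (c == '{') ++depth;
            if (c == '}' && --depth < 0) throw std::invalid_argument("unbalanced braces");
            current += c;
        }
    }
    if (in_quotes || depth != 0)
        throw std::invalid_argument("unterminated quote or brace");
    assert(depth == 0);  // invariant after the check above

    const std::string last = trim(current);
    if (!last.empty() || !tokens.empty()) tokens.push_back(unquote(last));  // "{}" yields no tokens
    return tokens;
}
```

Calling tokenize_braced_list again on a returned element that starts with '{' descends one more level, which matches the "one level at a time" requirement. The Spirit grammar remains the more robust choice once the input format grows beyond this.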
Changing the data structure to be a bit more specific/realistic:
namespace ast {
    using value = boost::make_recursive_variant<
            double,
            std::string,
            std::vector<boost::recursive_variant_>
        >::type;
    using list = std::vector<value>;
}
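For readers without Boost, roughly the same recursive shape can be sketched with C++17's std::variant. This is an illustration and not the answer's code: the namespace sketch and the names printer, print, and as_list are hypothetical, and a wrapper struct stands in for boost::recursive_variant_ because std::variant has no built-in recursion placeholder.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

namespace sketch {  // hypothetical namespace, not from the original answer
    struct value;                      // forward declaration enables the recursion
    using list = std::vector<value>;   // std::vector accepts an incomplete element type since C++17

    // A value is a number, a string, or a list of further values.
    struct value {
        std::variant<double, std::string, list> data;
    };

    // Accessor for the list alternative; asserts that the value holds a list.
    const list& as_list(const value& v) {
        assert(std::holds_alternative<list>(v.data));
        return std::get<list>(v.data);
    }

    void print(std::ostream& os, const value& v);

    // Visitor with one overload per alternative; prints on a single line.
    struct printer {
        std::ostream& os;
        void operator()(double d) const { os << d; }
        void operator()(const std::string& s) const { os << std::quoted(s); }
        void operator()(const list& l) const {
            os << "{";
            for (std::size_t i = 0; i < l.size(); ++i) {
                if (i != 0) os << ", ";
                print(os, l[i]);       // recurse into nested values
            }
            os << "}";
        }
    };

    void print(std::ostream& os, const value& v) {
        std::visit(printer{os}, v.data);
    }
}
```

The Boost.Variant form used in the answer has the practical advantage that Spirit X3 can propagate parser attributes into it directly, so the sketch above is only meant to show the data layout.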
Now we can change the grammar, as we no longer need to treat sublist as if it is a string:
namespace Parser {
    using namespace boost::spirit::x3;
    rule<struct item_, ast::value> item;
    auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare = lexeme [ +(graph-','-'}') ];
    auto list = x3::rule<struct list_, ast::list> {"list" }
        = '{' >> (item % ',') >> '}';
    auto item_def = list | double_ | quoted | bare;
    BOOST_SPIRIT_DEFINE(item)
}
Everything "still works":
#include <boost/spirit/home/x3.hpp>
#include <boost/variant.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
namespace ast {
    using value = boost::make_recursive_variant<
            double,
            std::string,
            std::vector<boost::recursive_variant_>
        >::type;
    using list = std::vector<value>;
}
namespace x3 = boost::spirit::x3;
namespace Parser {
    using namespace boost::spirit::x3;
    rule<struct item_, ast::value> item;
    auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
    auto bare = lexeme [ +(graph-','-'}') ];
    auto list = x3::rule<struct list_, ast::list> {"list" }
        = '{' >> (item % ',') >> '}';
    // Try a nested list first, then a number, then a quoted or bare string.
    auto item_def = list | double_ | quoted | bare;
    BOOST_SPIRIT_DEFINE(item)
}
// Visitor that prints a value, indenting nested lists by two spaces per level.
struct pretty_printer {
    using result_type = void;
    std::ostream& _os;
    int _indent;
    pretty_printer(std::ostream& os, int indent = 0) : _os(os), _indent(indent) {}
    void operator()(ast::value const& v) { boost::apply_visitor(*this, v); }
    void operator()(double v) { _os << v; }
    void operator()(std::string const& s) { _os << std::quoted(s); }
    void operator()(ast::list const& l) {
        _os << "{\n";
        _indent += 2;
        for (auto& item : l) {
            _os << std::setw(_indent) << "";
            operator()(item);
            _os << ",\n";
        }
        _indent -= 2;
        _os << std::setw(_indent) << "" << "}";
    }
};
int main() {
    pretty_printer print{std::cout};
    for (std::string const input : {
            R"({one, "five, six"})",
            R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
        })
    {
        auto f = input.begin(), l = input.end();
        ast::value parsed;
        bool ok = x3::phrase_parse(f, l, Parser::item, x3::space, parsed);
        if (ok) {
            std::cout << "Parsed: ";
            print(parsed);
            std::cout << "\n";
        } else {
            std::cout << "Parse failed\n";
        }
        // Report any input the parser did not consume.
        if (f != l)
            std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
    }
}
Prints:
Parsed: {
  "one",
  "five, six",
}
Parsed: {
  "one",
  {
    2,
    "three four",
  },
  "five, six",
  {
    "seven, eight",
  },
}