Tokenizing a string, accepting everything between a given set of characters in C++
I have the following code:
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main()
{
    string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
    regex re("(\'[!-~]+\')");
    sregex_token_iterator i(s.begin(), s.end(), re, 1);
    sregex_token_iterator j;
    unsigned count = 0;
    while(i != j)
    {
        cout << "the token is " << *i << endl;
        ++i; // advance the iterator, otherwise the loop never terminates
        count++;
    }
    cout << "There were " << count << " tokens found." << endl;
    return 0;
}
Using the above regex, I want to extract the strings between the parentheses, single quotes included. The output should look like this:
the token is 'm1.labs.teradata.com'
the token is 'use\')r_*5'
the token is 'u" er 5'
the token is 'default'
There were 4 tokens found.
Basically, the regex is supposed to extract everything between (' and '). The content can be anything: a space, a special character, a quote, or a closing parenthesis. I had earlier used the following regex:
boost::regex re_arg_values("(\'[!-~]+\')");
But it did not accept spaces. Can someone please help me out with this? Thanks in advance.
Here's a sample of using Spirit X3 to create a grammar that actually parses this. I'd like to parse into a map of (key->value) pairs, which makes a lot more sense than just blindly assuming the names are always the same:
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
Now we set up some grammar rules using X3:
namespace parser {
using namespace boost::spirit::x3;
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
The helpers as<> and quoted are simple lambdas:
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };
Now we can parse the string into a map directly:
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
And the demo program:
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Prints:
Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
Full listing:
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
namespace parser {
using namespace boost::spirit::x3;
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
If you want to learn how to extract the raw input, just try:
auto source = skip(space) [ *raw [ pair ] ];
as in this:
using RawSettings = std::vector<std::string>;
RawSettings parse_raw_config(std::string const& cfg) {
RawSettings parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::source, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
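int main() {
    // Same input as in the demo program above
    std::string const text = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
    for (auto& setting : parse_raw_config(text))
        std::cout << "Raw: " << setting << "\n";
}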
Which prints:
Raw: server ('m1.labs.teradata.com')
Raw: username ('use\')r_*5')
Raw: password('u" er 5')
Raw: dbname ('default')
Fixing a few syntax and style issues: you need to escape \ as \\ in C strings, and you had an unescaped " in s, which made a syntax error.
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
int main() {
std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";
boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");
size_t count = 0;
for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
std::cout << "Token " << ++count << " is " << tok << "\n";
}
}
Prints:
Token 1 is 'm1.labs.teradata.com'
Token 2 is 'use'
Token 3 is ') password('
Token 4 is ' er 5'
Token 5 is 'default'