Extracting tokens from string in C++
Edit: I'm looking for a solution that doesn't use regex, since it seems buggy and untrustworthy.
I had the following function, which extracts tokens from a string whenever one of the following symbols is found: +,-,^,*,!
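#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

bool extract_tokens(std::string expression, std::vector<std::string> &tokens) {
    // Each match of the pattern (an operator, a parenthesis, or a run of word/space characters) becomes a token.
    static const std::regex reg(R"(\+|\^|-|\*|!|\(|\)|([\w|\s]+))");
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return true;
}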
I thought it worked perfectly until today, when I found an edge case. The following input: !aaa + ! a is supposed to return
,,aaa,+,!, a
But it returns ,,aaa,+,"",!, a
Notice the extra empty string between + and !.
How may I prevent this behaviour? I think this can be done with the regex expression.
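For anyone reproducing this, here is a minimal sketch that collects the matches of the pattern from the question. The helper name tokens_with_original_regex is hypothetical and only used for illustration. Running it on the input above suggests that the extra token is a single space rather than a truly empty string: the last alternative, [\w|\s]+, also matches runs consisting only of whitespace, so the space between + and ! becomes a token of its own.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: returns every match of the question's pattern.
std::vector<std::string> tokens_with_original_regex(const std::string& expression) {
    static const std::regex reg(R"(\+|\^|-|\*|!|\(|\)|([\w|\s]+))");
    std::vector<std::string> tokens;
    // Submatch index 0 selects the whole match; text between matches is skipped.
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return tokens;
}
```

Any fix therefore has to either stop whitespace-only runs from matching or filter them out afterwards.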
Inspired by https://stackoverflow.com/a/9436872/4645334 you could solve the problem with:
#include <algorithm>
#include <string>
#include <vector>

bool extract_tokens(std::string expression, std::vector<std::string> &tokens) {
    std::string token;
    // True when the pending token contains at least one non-space character.
    const auto has_content = [&token] {
        return !std::all_of(token.cbegin(), token.cend(), [](char c) { return c == ' '; });
    };
    for (const auto& c : expression) {
        // Delimiters listed in the question: + - ^ * !
        if (c == '^' || c == '-' || c == '*' || c == '+' || c == '!') {
            // Flush the text collected so far, then emit the delimiter as its own token.
            if (has_content()) tokens.push_back(token);
            token.clear();
            tokens.emplace_back(1, c);
        } else {
            token += c;
        }
    }
    if (has_content()) tokens.push_back(token);
    return true;
}
Input:
"!aaa + ! a"
Output:
"!","aaa ","+","!"," a"
In an attempt to salvage the regular expression-based solution, I came up with this:
[-+^*!()]|\s*[^-+^*!()\s][^-+^*!()]*
This reports delimiters, and anything between delimiters including leading and trailing whitespace, but drops tokens consisting of whitespace alone.
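A minimal sketch of how the expression above might be dropped into the function from the question. The wrapper name extract_tokens_regex and the choice to return the vector by value are assumptions made for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper; only the pattern comes from the answer.
std::vector<std::string> extract_tokens_regex(const std::string& expression) {
    // Either a single delimiter, or a run that contains at least one
    // character that is neither a delimiter nor whitespace.
    static const std::regex reg(R"([-+^*!()]|\s*[^-+^*!()\s][^-+^*!()]*)");
    std::vector<std::string> tokens;
    // Submatch index 0 selects whole matches; unmatched whitespace is skipped.
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return tokens;
}
```

The whitespace-only run between + and ! cannot satisfy the second alternative, because it requires one non-delimiter, non-space character, so it is never reported as a token.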
A similar expression that also strips leading and trailing whitespace:
[-+^*!()]|[^-+^*!()\s]+(\s+[^-+^*!()\s]+)*
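A sketch of the whitespace-stripping pattern in use, written with balanced parentheses and iterated with std::sregex_iterator. The function name extract_tokens_stripped is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper around the whitespace-stripping pattern.
std::vector<std::string> extract_tokens_stripped(const std::string& expression) {
    // A delimiter, or non-space words joined by inner whitespace; leading and
    // trailing whitespace is never part of a match.
    static const std::regex reg(R"([-+^*!()]|[^-+^*!()\s]+(\s+[^-+^*!()\s]+)*)");
    std::vector<std::string> tokens;
    for (auto it = std::sregex_iterator(expression.begin(), expression.end(), reg);
         it != std::sregex_iterator(); ++it) {
        tokens.push_back(it->str());  // str() with no argument is the whole match
    }
    return tokens;
}
```

Whitespace inside a token is preserved, so an input such as "foo bar + baz" should yield "foo bar", "+" and "baz".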