Extracting tokens from a string in C++

Question

Edit: I'm looking for a solution that doesn't use regex, since it seems buggy and untrustworthy.

I had the following function, which extracts tokens from a string whenever one of the following symbols is found: +,-,^,*,!

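#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

bool extract_tokens(std::string expression, std::vector<std::string> &tokens) {
    // Match a single operator or parenthesis, or a run of word, '|' and whitespace characters.
    static const std::regex reg(R"(\+|\^|-|\*|!|\(|\)|([\w|\s]+))");
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return true;
}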

I thought it worked perfectly until today, when I found an edge case. The following input: !aaa + ! a is supposed to return !,aaa,+,!, a but it returns !,aaa,+,"",!, a. Notice the extra empty string between + and !.

How can I prevent this behaviour? I think this can be done with the regular expression.

Answer 1

Inspired by https://stackoverflow.com/a/9436872/4645334, you could solve the problem with:

#include <algorithm>
#include <string>
#include <vector>

bool extract_tokens(std::string expression, std::vector<std::string> &tokens) {
  std::string token;

  // True if the pending token contains at least one non-space character.
  auto has_content = [&token] {
    return !std::all_of(token.cbegin(), token.cend(), [](char ch) { return ch == ' '; });
  };

  for (const auto& c: expression) {
    // Delimiters listed in the question: + - ^ * !
    if (c == '^' || c == '-' || c == '*' || c == '+' || c == '!') {
      // Flush the text collected so far, then emit the delimiter as its own token.
      if (has_content()) tokens.push_back(token);
      token.clear();
      tokens.emplace_back(1, c);
    } else {
      token += c;
    }
  }

  // Flush whatever follows the last delimiter.
  if (has_content()) tokens.push_back(token);

  return true;
}

Input:

"!aaa + ! a"

Output:

"!","aaa ","+","!"," a"

Answer 2

In an attempt to salvage the regular-expression-based solution, I came up with this:

[-+^*!()]|\s*[^-+^*!()\s][^-+^*!()]*

This reports delimiters and anything between delimiters, including leading and trailing whitespace, but drops tokens consisting of whitespace alone.
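
A minimal sketch of how this pattern could be applied with std::sregex_iterator, assuming the default ECMAScript grammar of std::regex. The function name tokenize_keep_whitespace is hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every match of the pattern, in order.
std::vector<std::string> tokenize_keep_whitespace(const std::string& expression) {
    // Either a single delimiter, or a run of non-delimiter characters that
    // contains at least one non-whitespace character.
    static const std::regex pattern(R"([-+^*!()]|\s*[^-+^*!()\s][^-+^*!()]*)");

    std::vector<std::string> tokens;
    for (std::sregex_iterator it(expression.begin(), expression.end(), pattern), last;
         it != last; ++it) {
        tokens.push_back(it->str());  // text of the whole match
    }
    return tokens;
}
```

For the input "!aaa + ! a" this yields "!", "aaa ", "+", "!", " a", the same tokens as the output shown in Answer 1. The lone space between + and ! matches neither alternative, so the search skips it.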

A similar expression that also strips leading and trailing whitespace:

[-+^*!()]|[^-+^*!()\s]+(\s+[^-+^*!()\s]+)*
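
As a sketch, the whitespace-stripping pattern could be dropped into the extract_tokens function from the question, leaving the sregex_token_iterator code otherwise unchanged. This assumes the default ECMAScript grammar; the name extract_trimmed_tokens is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant of the question's extract_tokens with only the pattern swapped.
bool extract_trimmed_tokens(const std::string& expression, std::vector<std::string>& tokens) {
    // Either a single delimiter, or one or more whitespace-separated words
    // with no leading or trailing whitespace.
    static const std::regex reg(R"([-+^*!()]|[^-+^*!()\s]+(\s+[^-+^*!()\s]+)*)");
    // Submatch index 0 selects the whole match rather than the capture group.
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return true;
}
```

With the input "!aaa + ! a", tokens would receive "!", "aaa", "+", "!", "a". Whitespace inside an operand, as in "foo bar", is kept.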


Answer 1: 0, 2020-08-08 20:53:48
Answer 2: 0, accepted, 2020-08-08 21:16:38
