C++: split a string by delimiters and keep the delimiters in the result

Question

I'm looking for a way to split a string on multiple delimiters using a regex in C++ without losing the delimiters in the output. The delimiters should be kept, in order, alongside the split parts. For example:

Input

aaa,bbb.ccc,ddd-eee;

Output

aaa , bbb . ccc , ddd - eee ;

I've found some solutions for this, but all of them are in C# or Java. I'm looking for a C++ solution, preferably without using Boost.

Answer 1

You could build your solution on top of the reference example for std::regex_iterator. If, for example, you know your delimiters are comma, period, semicolon, and hyphen, you could use a regex that matches either a single delimiter or a run of non-delimiters:

([.,;-]|[^.,;-]+)

Drop that into the sample code and you end up with something like this:

#include <iostream>
#include <string>
#include <regex>

int main ()
{
  // the following two lines are edited; the remainder are directly from the reference.
  std::string s ("aaa,bbb.ccc,ddd-eee;");
  std::regex e ("([.,;-]|[^.,;-]+)");   // matches delimiters or consecutive non-delimiters

  std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
  std::regex_iterator<std::string::iterator> rend;

  while (rit!=rend) {
    std::cout << rit->str() << std::endl;
    ++rit;
  }

  return 0;
}

Try substituting any other regular expression you like.
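If you would rather collect the pieces than print them, the same loop can be wrapped in a small function. This is only a sketch: split_keep_delims is a hypothetical helper name, not something from the reference, and the delimiter set is hard-coded to the four characters assumed above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the reference example): splits `input` into
// delimiter tokens and non-delimiter tokens, preserving their original order.
std::vector<std::string> split_keep_delims(const std::string& input)
{
    // Same pattern as above: one delimiter, or a run of non-delimiters.
    static const std::regex token_re("([.,;-]|[^.,;-]+)");

    std::vector<std::string> tokens;
    // A default-constructed sregex_iterator is the end-of-sequence iterator.
    for (std::sregex_iterator it(input.begin(), input.end(), token_re), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling split_keep_delims("aaa,bbb.ccc,ddd-eee;") should give the ten tokens aaa , bbb . ccc , ddd - eee ; in that order. Because each match is either exactly one delimiter or a maximal run of other characters, consecutive delimiters come back as separate tokens.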

Answer 2

For your case, splitting the input string at every word boundary \b, except the one at the start of the string, gives the desired output.

(?!^)\b


OR

(?<=\W)(?!$)|(?!^)(?=\W)


(?<=\W)(?!$) matches the position immediately after a non-word character, except the one at the end of the string.
| OR
(?!^)(?=\W) matches the position immediately before a non-word character, except the one at the start of the string.

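If the pattern goes in an ordinary C++ string literal, escape each backslash once more (for example, "(?!^)\\b"), or use a raw string literal. Also note that the default ECMAScript grammar of std::regex does not support lookbehind ((?<=...)), so the second pattern requires a regex engine that does.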

Answer 1: score 10, accepted, 2014-12-30 14:09:56

Answer 2: score 3, 2014-12-30 13:34:43