
C++ splitting string by delimiters and keeping the delimiters in result

I'm looking for a way to split a string by multiple delimiters using regex in C++ without losing the delimiters in the output. The delimiters should be kept alongside the split parts, in their original order. For example:

Input

aaa,bbb.ccc,ddd-eee;

Output

aaa , bbb . ccc , ddd - eee ;

I've found some solutions for this, but all in C# or Java. I'm looking for a C++ solution, preferably without using Boost.

You could build your solution on top of the example for regex_iterator. If, for example, you know your delimiters are comma, period, semicolon, and hyphen, you could use a regex that matches either a single delimiter or a run of non-delimiters:

([.,;-]|[^.,;-]+)

Drop that into the sample code and you end up with something like this:

#include <iostream>
#include <string>
#include <regex>

int main ()
{
  // the following two lines are edited; the remainder are directly from the reference.
  std::string s ("aaa,bbb.ccc,ddd-eee;");
  std::regex e ("([.,;-]|[^.,;-]+)");   // matches delimiters or consecutive non-delimiters

  std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
  std::regex_iterator<std::string::iterator> rend;

  while (rit!=rend) {
    std::cout << rit->str() << std::endl;
    ++rit;
  }

  return 0;
}

Try substituting in any other regular expressions you like.
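If you need the pieces in a container rather than printed, the same loop can be wrapped in a function. The following is only a minimal sketch: the name split_keep_delims is hypothetical (it is not part of the answer or of the standard library), and it assumes the same four delimiters as above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): returns the pieces of
// `input` in order, with every delimiter kept as its own element.
std::vector<std::string> split_keep_delims(const std::string& input)
{
  // Same pattern as above: one delimiter, or a run of non-delimiters.
  static const std::regex token("([.,;-]|[^.,;-]+)");

  std::vector<std::string> parts;
  // std::sregex_iterator is regex_iterator<std::string::const_iterator>;
  // a default-constructed iterator marks the end of the sequence.
  for (std::sregex_iterator it(input.begin(), input.end(), token), end;
       it != end; ++it) {
    parts.push_back(it->str());
  }
  return parts;
}
```

Calling split_keep_delims("aaa,bbb.ccc,ddd-eee;") should give the ten elements from the question: aaa , bbb . ccc , ddd - eee ;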

In your case, splitting the input string at every word boundary (\b) except the one at the start will give you the desired output.

(?!^)\b

DEMO

OR

(?<=\W)(?!$)|(?!^)(?=\W)

DEMO

  • (?<=\W)(?!$) matches each boundary that comes right after a non-word character, except the one at the end of the string.

  • | OR

  • (?!^)(?=\W) matches each boundary that is followed by a non-word character, except the one at the start of the string.

Escape the backslash one more time if necessary, for example inside a C++ string literal, where \b is written as "\\b".
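One caveat when carrying these patterns over to C++: the default ECMAScript grammar of std::regex has no lookbehind, so (?<=\W) is rejected there. A rough equivalent that stays within std::regex is to match the tokens instead of the split points. The sketch below is an assumption-laden illustration, not the answer's own code: split_on_word_boundaries is a hypothetical name, and the pattern \w+|\W is a substitute that yields each run of word characters and each single non-word character, mirroring the second pattern above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes on word/non-word boundaries without
// lookbehind, which std::regex's ECMAScript grammar does not provide.
std::vector<std::string> split_on_word_boundaries(const std::string& input)
{
  // R"(...)" is a raw string literal, so \w and \W need no extra escaping;
  // in an ordinary literal the same pattern is written "\\w+|\\W".
  static const std::regex token(R"(\w+|\W)");

  std::vector<std::string> parts;
  for (std::sregex_iterator it(input.begin(), input.end(), token), end;
       it != end; ++it) {
    parts.push_back(it->str());  // a word run, or one non-word character
  }
  return parts;
}
```

Unlike the character-class version, this treats every non-word character (including spaces) as a delimiter, so it fits best when the delimiter set is not known in advance.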

