
Splitting a string into multiple strings with multiple delimiters without removing them?

I use the Boost libraries, so a Boost-based solution would be welcome, but I haven't found a suitable function.

For ordinary fast splitting I can use:

string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("mM"));

but it removes the m and M characters.

I also can't simply use a regexp, because it searches the string for the longest value that matches a defined pattern.

PS: There are a lot of similar questions, but they only describe implementations in other programming languages.

Untested, but rather than using vector<string>, you could try a vector<boost::iterator_range<std::string::iterator>>, so that you get a pair of iterators into the main string for each token. Then, for each token, iterate from one position before the start of the range (as long as the start of the range is not begin() of the main string) to the end of the range.

EDIT: Here is an example:

#include <iostream>
#include <iterator>  // std::prev
#include <string>
#include <vector>

#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/range/iterator_range.hpp>

int main(void)
{
  std::string str = "FooMBarMSFM";

  std::vector<boost::iterator_range<std::string::iterator>> tokens;

  boost::split(tokens, str, boost::is_any_of("mM"));

  for(auto r : tokens)
  {
    std::string b(r.begin(), r.end());
    std::cout << b << std::endl;
    if (r.begin() != str.begin())
    {
      std::string bm(std::prev(r.begin()), r.end());
      std::cout << "With token: [" << bm << "]" << std::endl;
    }
  }
}

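What you need goes beyond what split is designed to do. If you want to keep the m or M, you could write a specialized split using strstr, strchr, strtok, or a find function, adapting existing code into a more flexible split function. Here is a strtok-based starting point (note that strtok itself overwrites each separator with '\0', so it would still need adapting to keep them):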

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

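/* Splits src in place on any character in separator.
   dest receives pointers into src; *num receives the token count. */
void split(char *src, const char *separator, char **dest, int *num)
{
    char *pNext;
    int count = 0;

    *num = 0;  /* defined result even on early return */
    if (src == NULL || strlen(src) == 0) return;
    if (separator == NULL || strlen(separator) == 0) return;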

    pNext = strtok(src,separator);

    while(pNext != NULL)
    {
        *dest++ = pNext;
        ++count;
        pNext = strtok(NULL,separator);
    }

    *num = count;
}
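
One way to keep the separators is to search with a find function and cut the string just before each match. The following is only a minimal sketch of that idea, assuming std::string::find_first_of; the name split_keep is hypothetical and not part of any library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` before every character found in `delims`,
// keeping that character as the first character of the token it starts.
std::vector<std::string> split_keep(const std::string& s,
                                    const std::string& delims)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (start < s.size()) {
        // Search from start + 1 so that a delimiter at `start` stays
        // attached to the token it begins.
        std::string::size_type next = s.find_first_of(delims, start + 1);
        if (next == std::string::npos) next = s.size();
        out.push_back(s.substr(start, next - start));
        start = next;
    }
    return out;
}
```

Calling split_keep("FooMBarMSFM", "mM") yields the tokens Foo, MBar, MSF and M. An empty input produces an empty vector.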

Besides, you could try boost::regex .
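
As a rough sketch of the regex route: instead of matching the separators, match the tokens themselves, so that each match starts at an m or M and stops before the next one. std::regex is used here rather than boost::regex so that the example needs only the standard library; that substitution, the pattern, and the name split_keep_regex are assumptions for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every token of the form
//   "m or M followed by non-m/M characters"  or
//   "a leading run of non-m/M characters".
// Neither alternative can match the empty string, so no empty tokens appear.
std::vector<std::string> split_keep_regex(const std::string& s)
{
    static const std::regex token("[mM][^mM]*|[^mM]+");
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), token), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For "FooMBarMSFM" this gives Foo, MBar, MSF and M. Because [^mM]* stops at the next m or M, a greedy match cannot swallow more than one token.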

My current solution is the following (but it is not universal and seems too complex).

I choose one character that cannot appear in the string. In my case it is '|'.

string str = ...;
vector<string> strs;
boost::split(strs, str, boost::is_any_of("m"));
str = boost::join(strs, "|m");
boost::split(strs, str, boost::is_any_of("M"));
str = boost::join(strs, "|M");
if (boost::iequals(str.substr(0, 1), "|")) {

    str = str.substr(1);
}
boost::split(strs, str, boost::is_any_of("|"));

I insert "|" before each m/M character, except at the very first position in the string. Then I split the string on this extra character, which removes it again.

