
Split text using strtok without deleting the delim c++

How can I split a text into tokens using strtok without deleting the delimiter? I just want to split the text at the delimiters and leave them in place.

You can't. strtok does the splitting by replacing the delimiter with a '\0'. Without doing that, no splitting would happen.

You could, however, create a function that did splitting kind of like strtok does, but by finding where the string should be split and (for example) allocating storage and copying the characters up to the delimiter into that storage. strcspn or strpbrk would probably be a useful start at this.
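As a rough illustration of that idea, here is a minimal sketch built on strcspn. The function name split_keep_delims is made up for this example, and the choice to leave each delimiter attached to the end of its token is an assumption; the source string is never modified.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at any character in `delims` without
// modifying `text`. Each delimiter stays attached to the end of its token.
std::vector<std::string> split_keep_delims(const char* text, const char* delims)
{
    std::vector<std::string> tokens;
    const char* p = text;
    while (*p != '\0')
    {
        // Number of leading characters of p that are not delimiters.
        std::size_t len = std::strcspn(p, delims);
        // Include the delimiter itself unless we stopped at the terminator.
        if (p[len] != '\0')
            ++len;
        // Copy the characters into freshly allocated storage.
        tokens.push_back(std::string(p, len));
        p += len;
    }
    return tokens;
}
```

Calling split_keep_delims("a,b;c", ",;") should give the three tokens "a,", "b;" and "c". Unlike strtok, consecutive delimiters each produce their own token here.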

Can you use boost? boost::algorithm::split does exactly what you want.

You can, of course, write one yourself; it's not like split is complicated: (Note: I have not actually tested this)

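// Needs <algorithm>, <string> and <vector>.
std::wstring source(L"Test\nString");
std::vector<std::wstring> result;
std::wstring::iterator start, end;
start = source.begin();
end = std::find(source.begin(), source.end(), L'\n');
// Search from end + 1: searching from end would find the same delimiter again and never advance.
// Each delimiter is kept at the start of the token that follows it.
for(; end != source.end(); start = end, end = std::find(end + 1, source.end(), L'\n'))
    result.push_back(std::wstring(start, end));
// The last piece runs from the final delimiter (or the beginning) to the end of the string.
result.push_back(std::wstring(start, end));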

Simply don't use strtok.

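Use the C++ stream operations instead.
The std::getline() function takes an optional third parameter that defines the delimiter at which each read stops. Note that getline() extracts the delimiter from the stream but does not store it in the resulting string.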

#include <string>
#include <sstream>
#include <vector>

int main()
{
    std::string         text("This is text; split by; the semicolon; that we will split into bits.");
    std::stringstream   textstr(text);

    std::string               line;
    std::vector<std::string>  data;
    while(std::getline(textstr,line,';'))
    {
        data.push_back(line);
    }
}
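Since std::getline() discards the delimiter, the original question needs one more step. The following is only a sketch: the name split_keep_delim is hypothetical, and it assumes the delimiter should stay at the end of each token. It relies on the fact that the stream's eof flag is set only when getline() runs out of input before finding a delimiter.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at `delim` with std::getline, then puts
// the delimiter back on the end of every token that was followed by one.
std::vector<std::string> split_keep_delim(const std::string& text, char delim)
{
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(stream, token, delim))
    {
        // If getline stopped before reaching end of input, it consumed a delimiter.
        if (!stream.eof())
            token += delim;
        tokens.push_back(token);
    }
    return tokens;
}
```

With the input "a;b;c" and ';' as the delimiter this should produce "a;", "b;" and "c".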

With a tiny bit more work we can even get the STL algorithms to play their part; we just need to define how a token is streamed. To do this, define a token class (or struct), then define an operator>> that reads up to the token separator.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>

struct Token
{
    std::string data;
    operator std::string() const { return data;}
};
std::istream& operator>>(std::istream& stream,Token& data)
{
    return std::getline(stream,data.data,';');
}

int main()
{
    std::string         text("This is text; split by; the semicolon; that we will split into bits.");
    std::stringstream   textstr(text);

    std::vector<std::string>  data;

    // This statement does the work of the loop from the last example.
    std::copy(std::istream_iterator<Token>(textstr),
              std::istream_iterator<Token>(),
              std::back_inserter(data)
             );

    // This just prints out the vector to the std::cout just to illustrate it worked.
    std::copy(data.begin(),data.end(),std::ostream_iterator<std::string>(std::cout,"\n"));
}

You can't. The behavior of strtok is that it replaces the delimiter with a NUL character. This behavior is not configurable. To return each substring, including the delimiter, you will have to find a function other than strtok, or else combine strtok with some of your own processing.
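One possible form of "your own processing" is to let strtok work on a scratch copy and read the tokens back from the untouched original. This is a sketch under assumptions, not an established idiom: strtok_keep_delims is a hypothetical name, and the input is assumed to contain no embedded NUL characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a scratch copy with strtok and reads each
// token, plus the delimiter strtok overwrote, back from the untouched original.
std::vector<std::string> strtok_keep_delims(const std::string& text, const char* delims)
{
    std::vector<std::string> tokens;
    // strtok needs a writable, NUL-terminated buffer it is allowed to modify.
    std::vector<char> scratch(text.begin(), text.end());
    scratch.push_back('\0');

    for (char* tok = std::strtok(scratch.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        // Position and length of this token inside the scratch buffer.
        std::size_t offset = static_cast<std::size_t>(tok - scratch.data());
        std::size_t len = std::strlen(tok);
        // One extra character picks up the delimiter, if the token had one.
        if (offset + len < text.size())
            ++len;
        tokens.push_back(text.substr(offset, len));
    }
    return tokens;
}
```

Because strtok skips over runs of delimiters, only the first delimiter after each token is recovered this way: "a,,b" split on "," yields "a," and "b".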

If your libc implementation has it, take a look at strsep(3).
