
C++ split string with space and punctuation chars

I want to split a string in C++ that contains spaces and punctuation.

eg str = "This is a dog; A very good one."

I wanna get "This" "is" "a" "dog" "A" "very" "good" "one" 1 by 1.

It's quite easy with a single delimiter using getline, but I don't know all the delimiters in advance. They can be any punctuation characters.

Note: I don't wanna use Boost!

So, starting at the first position, you find the first valid token. You can use

index = str.find_first_not_of (yourDelimiters);

Then you have to find the first delimiter after this, so you can do

delimIndex = str.substr (index).find_first_of (yourDelimiters);

your first word will then be

// since delimIndex will essentially be the length of the word
word = str.substr (index, delimIndex);

Then you truncate your string and repeat. You have to, of course, handle all of the cases where find_first_not_of and find_first_of return npos, which means that character was/was not found, but I think that's enough to get started.
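
The steps above can be put together roughly as follows. This is only a sketch, not the answerer's exact code: split_on_any is a hypothetical helper name, and instead of truncating the string it passes a start position to find_first_not_of and find_first_of, which avoids the temporary copies made by substr. Both npos cases are handled explicitly.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character contained in `delims`.
std::vector<std::string> split_on_any(const std::string& text,
                                      const std::string& delims) {
    std::vector<std::string> words;

    // Position of the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);

    while (start != std::string::npos) {
        // Position of the first delimiter at or after `start`.
        std::string::size_type end = text.find_first_of(delims, start);

        if (end == std::string::npos) {
            // No delimiter left: the rest of the string is the last word.
            words.push_back(text.substr(start));
            break;
        }

        words.push_back(text.substr(start, end - start));

        // Skip the run of delimiters to reach the next word, if any.
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

Called as split_on_any("This is a dog; A very good one.", " .,;"), this should return the eight words from the question. The delimiter set still has to be listed by hand, so it only helps when the punctuation characters are known.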

Btw, I'm not claiming that this is the best method, but it works...

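Use std::find_if() (from <algorithm>) with a lambda to find the delimiter. std::isspace and std::ispunct come from <cctype>.

auto it = std::find_if(str.begin(), str.end(), [] (const char element) -> bool {
                       // cast first: passing a negative char to these functions is undefined behaviour
                       const auto c = static_cast<unsigned char>(element);
                       return std::isspace(c) || std::ispunct(c);});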
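
The call above only finds the first delimiter. One way to turn it into a full split is to alternate std::find_if_not (skip delimiters) and std::find_if (find the end of the word). This is a sketch under that assumption: is_delim and split_words are hypothetical names, not part of the answer.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical predicate: true for whitespace or punctuation.
// The cast avoids undefined behaviour when char is signed and negative.
bool is_delim(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) || std::ispunct(uc);
}

// Hypothetical helper: collects every run of non-delimiter characters.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    auto it = text.begin();

    while (it != text.end()) {
        // Start of the next word: first character that is not a delimiter.
        auto first = std::find_if_not(it, text.end(), is_delim);
        if (first == text.end())
            break;  // only delimiters were left

        // End of the word: next delimiter, or the end of the string.
        auto last = std::find_if(first, text.end(), is_delim);
        words.emplace_back(first, last);
        it = last;
    }
    return words;
}
```

No delimiter list is needed here, which matches the "any punctuation" requirement. The trade-off is that std::ispunct also counts apostrophes and hyphens, so "don't" comes back as "don" and "t".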

C++, unlike Java, doesn't provide an elegant built-in way to split a string by a delimiter. You can use the Boost library for that, but if you want to avoid it, a manual loop is enough.

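#include <string>
#include <vector>

// Splits s into words, treating the listed characters as delimiters.
std::vector<std::string> split(const std::string& s) {
    std::vector<std::string> words;
    std::string word;

    for (char x : s) {
        if (x == ' ' or x == ',' or x == '?' or x == ';' or x == '!'
            or x == '.') {
            // A delimiter ends the current word, if there is one.
            if (!word.empty()) {
                words.push_back(word);
                word.clear();
            }
        }
        else
            word.push_back(x);
    }
    // Flush the last word when the string does not end with a delimiter.
    if (!word.empty()) {
        words.push_back(word);
    }
    return words;
}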

vmpstr's solution works, but could be a bit tedious. Some months ago, I wrote a C library that does what you want. http://wiki.gosub100.com/doku.php?id=librerias:c:cadenas

The documentation is written in Spanish (sorry).

It has no external dependencies. Try the splitWithChar() function.

Example of use:

#include <stdio.h>
#include "string_functions.h"
int main(void){

    char yourString[]= "This is a dog; A very good one.";
    char* elementsArray[8];
    int nElements;
    int i;

    /*------------------------------------------------------------*/
    printf("Character split test:\n");
    printf("Base String: %s\n",yourString);

    nElements = splitWithChar(yourString, ' ', elementsArray);

    printf("Found %d elements.\n", nElements);

    for (i=0;i<nElements;i++){
        printf ("Element %d: %s\n", i, elementsArray[i]);
    }

    return 0;
}

Note that splitWithChar() modifies the original string "yourString", so be careful.

Good luck :)


 