
Why a C++ string tokenizer isn't working

I tried to write a simple std::string tokenizer in C++, and I can't get it to work quite right. I found one online that does work, and I understand why it works, but I still can't figure out why my original one isn't working. I'm assuming it's some stupid little thing that I'm missing. I'd appreciate a pointer in the right direction; thanks!

input (random characters and symbols with "\n" and "\t"):

"This is a test string;23248h> w chars, aNn, 8132; ai3v2< 8&G,\nnewline7iuf32\t2f,f3rgb, 43q\nefhfh\nu2hef, wew; wg"

tokenizer:

size_t loc, prevLoc = 0;
while( (int)(loc = theStr.find_first_of("\n", prevLoc) ) > 0) {
    string subStr = theStr.substr(prevLoc, loc-1);        // -1 to skip the \n
    cout << "SUBSTR: '" << subStr << "'" << endl << endl;
    tokenizedStr->push_back( subStr );
    prevLoc = loc+1;
} // while

output:

SUBSTR: 'This is a test string;23248h> w chars, aNn, 8132; ai3v2< 8&G'

SUBSTR: 'newline7iuf32  2f,f3rgb, 43q
efhfh
u2hef, wew; wg'

SUBSTR: 'efhfh
u2hef, wew; wg'

Notice that the second "SUBSTR" (apparently) still has the newline characters ("\n") in it.

Compilable code:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main(int argc, char *argv[]) {

    string testStr = "This is a test string;23248h> w chars, aNn, 8132; ai3v2< 8&G,\nnewline7iuf32\t2f,f3rgb, 43q\nefhfh\nu2hef, wew; wg";
    vector<string> tokenizedStr;

    size_t loc, prevLoc = 0;
    while( (int)(loc = testStr.find_first_of("\n", prevLoc) ) > 0) {
        string subStr = testStr.substr(prevLoc, loc-1);        // -1 to skip the \n
        cout << "SUBSTR: '" << subStr << "'" << endl << endl;
        tokenizedStr.push_back( subStr );
        prevLoc = loc+1;
    } // while

    return 0;
}

The second argument of substr is a size, not a location. Instead of calling it like this:

testStr.substr(prevLoc, loc-1);

Try this:

testStr.substr(prevLoc, loc-prevLoc);
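
To make the position-versus-count distinction concrete, here is a minimal sketch. The helper name `slice` and the sample string in the note below are illustrative assumptions, not part of the question's code.

```cpp
#include <cstddef>
#include <string>

// Illustrative helper (hypothetical name): return the characters of s in the
// half-open index range [begin, end). Assumes begin <= end <= s.size().
std::string slice(const std::string& s, std::size_t begin, std::size_t end) {
    // std::string::substr takes (position, count), so the count has to be
    // the distance between the two indices, not the end index itself.
    return s.substr(begin, end - begin);
}
```

For example, with the string "ab\ncd\nef", the second line starts at index 3 and its newline sits at index 5. `slice(s, 3, 5)` yields "cd", whereas `s.substr(3, 5 - 1)` asks for four characters and yields "cd\ne", which is the same kind of overrun seen in the second SUBSTR of the question's output.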

Once you fix that, the next problem you will run into is that you are not printing the last substring, because you stop as soon as you don't find a newline. So the text from the last newline to the end of the string doesn't get stored.
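
Putting both fixes together, the loop could look roughly like the sketch below. The function name `split_lines` is a hypothetical choice, and the sketch also swaps the `(int)(...) > 0` test for a comparison against `std::string::npos`, which is an extra change beyond the two points above: with the cast, a newline at index 0 would end the loop immediately.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a newline tokenizer (split_lines is a hypothetical name).
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t prevLoc = 0;
    std::size_t loc;
    // find returns npos when no further newline exists; comparing against
    // npos also keeps the loop running when a newline is found at index 0.
    while ((loc = text.find('\n', prevLoc)) != std::string::npos) {
        // Second argument is a length: the distance from prevLoc to the newline.
        tokens.push_back(text.substr(prevLoc, loc - prevLoc));
        prevLoc = loc + 1;  // skip past the newline
    }
    // Store the tail after the last newline (the whole string if there was none).
    tokens.push_back(text.substr(prevLoc));
    return tokens;
}
```

Called on the test string from the question, this should produce four tokens, one per line. Note that a string ending in "\n" gets an empty final token under this design; drop the last `push_back` when `prevLoc == text.size()` if that is not wanted.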
