
C - Nested loop using strtok

I am trying to use strtok to split up a text file into strings that I can pass to a spell check function. The text file includes characters such as '\n', ' ?!,.' etc. I need to print any words that fail the spell check and the line number that they are on. Keeping track of the line is what I'm struggling with. I have tried this so far, but it only returns results for the first line of the text file:

char str[409377];
fread(str, noOfChars, 1, file);
fclose(file);

int lines=1;
char *token;
char *line;
char splitLine[] = "\n";
char delimiters[] = " ,.?!(){}*&^%$£_-+=";
line = strtok(str, splitLine);
while(line!=NULL){
    token = strtok(line, delimiters);
    while(token != NULL){
        //print is just to test if I can loop through all the words
        printf("%s", token);
        //spellCheck function & logic here
        token = strtok(NULL, delimiters);
    }
    line = strtok(NULL, splitLine);
    lines++;
}

Is using the nested while loop and strtok possible? Is there a better way to keep track of the line number?

The strtok function is not reentrant! It cannot be used to tokenize multiple strings simultaneously, because it keeps internal state about the string currently being tokenized.

If you have a modern compiler and standard library, you could use strtok_s instead (or strtok_r on POSIX systems), which takes the saved position as an extra argument. Otherwise you have to come up with another solution.
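As a rough sketch of the reentrant approach, here is the nested loop written with POSIX strtok_r, which fills the same role as strtok_s but is not the same function (the strtok_s signature differs between C11 Annex K and MSVC). The names check_text, spell_ok and bad_lines are made up for illustration, and spell_ok is only a toy stand-in for the real spell checker.

```c
#define _POSIX_C_SOURCE 200809L /* makes <string.h> declare strtok_r */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real spell checker: a five-word dictionary. */
static int spell_ok(const char *word)
{
    static const char *dict[] = { "the", "cat", "sat", "on", "mat" };
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(word, dict[i]) == 0)
            return 1;
    return 0;
}

/* Splits text in place into lines, then each line into words. Prints every
   word that fails spell_ok with its line number and stores up to max of
   those line numbers in bad_lines. Returns the number of failing words. */
size_t check_text(char *text, int *bad_lines, size_t max)
{
    const char *delimiters = " ,.?!(){}*&^%$_-+=";
    char *line_state;   /* saved position of the outer (line) scan */
    char *word_state;   /* saved position of the inner (word) scan */
    size_t bad = 0;
    int lineno = 1;

    assert(text != NULL && bad_lines != NULL);
    for (char *line = strtok_r(text, "\n", &line_state); line != NULL;
         line = strtok_r(NULL, "\n", &line_state), lineno++) {
        for (char *word = strtok_r(line, delimiters, &word_state);
             word != NULL;
             word = strtok_r(NULL, delimiters, &word_state)) {
            if (!spell_ok(word)) {
                printf("line %d: %s\n", lineno, word);
                if (bad < max)
                    bad_lines[bad] = lineno;
                bad++;
            }
        }
    }
    return bad;
}
```

One caveat: like strtok, strtok_r treats a run of delimiters as one, so blank lines are skipped and the line count drifts if the file contains empty lines.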

You can use strtok, but it's not very easy to use. It's a stupid function: all it really does is replace delimiters with NUL characters and return a pointer to the start of the sequence it has delimited. So it's destructive. It also can't handle special cases such as English words being allowed one apostrophe ("we're" is a word, "we'r'e" is not), and you have to make sure you list all the delimiters explicitly.

It's probably best to write mystrtok yourself, so you understand how it works. Then use that as the basis for your own word extractor.
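A minimal sketch of what such a function might look like, assuming you want it to keep its position in a caller-supplied pointer rather than in hidden static state. The name mystrtok and its three-argument signature are illustrative only, not a standard API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant tokenizer. Pass the string on the first call and
   NULL afterwards; *state remembers where the previous call stopped.
   Like strtok, it overwrites the delimiter after each token with '\0'. */
char *mystrtok(char *str, const char *delims, char **state)
{
    assert(delims != NULL && state != NULL);

    if (str == NULL)
        str = *state;               /* continue where the last call stopped */
    if (str == NULL)
        return NULL;

    str += strspn(str, delims);     /* skip leading delimiters */
    if (*str == '\0') {             /* nothing left but delimiters */
        *state = str;
        return NULL;
    }

    char *end = str + strcspn(str, delims);  /* first delimiter after token */
    if (*end != '\0')
        *end++ = '\0';              /* terminate token, step past delimiter */
    *state = end;
    return str;
}
```

Because each scan has its own state pointer, an outer scan over lines and an inner scan over words can run at the same time. A word extractor built on this could go further, for example by treating a single apostrophe inside a word as part of the word.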

The reason for your bug is that the inner strtok(line, delimiters) call replaces the saved state of the outer scan: you chop off the first line, and that first line is then all that strtok sees on the subsequent calls, so strtok(NULL, splitLine) returns NULL.
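One way around this, sketched under the assumption that the whole file is already in a NUL-terminated buffer, is to find the line boundaries with strchr and keep strtok for the words only, so that just one strtok scan is ever active. The names check_lines, spell_ok and bad_lines are hypothetical, and spell_ok is a toy stand-in for the real checker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real spell checker: a five-word dictionary. */
static int spell_ok(const char *word)
{
    static const char *dict[] = { "the", "cat", "sat", "on", "mat" };
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(word, dict[i]) == 0)
            return 1;
    return 0;
}

/* Walks text line by line (modifying it in place), prints each word that
   fails spell_ok with its line number, and stores up to max of those line
   numbers in bad_lines. Returns the number of failing words. */
size_t check_lines(char *text, int *bad_lines, size_t max)
{
    const char *delimiters = " ,.?!(){}*&^%$_-+=";
    size_t bad = 0;
    int lineno = 1;
    char *line = text;

    assert(text != NULL && bad_lines != NULL);
    while (line != NULL) {
        char *next = strchr(line, '\n');  /* find the end of this line */
        if (next != NULL)
            *next++ = '\0';               /* terminate it, step past '\n' */

        /* Only one strtok scan is active at a time, so plain strtok works. */
        for (char *word = strtok(line, delimiters); word != NULL;
             word = strtok(NULL, delimiters)) {
            if (!spell_ok(word)) {
                printf("line %d: %s\n", lineno, word);
                if (bad < max)
                    bad_lines[bad] = lineno;
                bad++;
            }
        }
        line = next;                      /* NULL after the last line */
        lineno++;
    }
    return bad;
}
```

Since every '\n' advances the counter, blank lines are counted too, which strtok with "\n" as the delimiter would skip. Note that fread does not add a terminating '\0', so the buffer has to be terminated by hand before it is passed to any of the string functions.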
