
SOLVED: What am I doing wrong that strtok does right in splitting a string?

The previous title of this question was: what am I doing wrong that strtok does right in splitting a string? Also, why does moving the strtok code into a separate function suddenly stop producing the correct result?

This is the first time I have asked a question on Stack Overflow, so forgive me if this is wordy and incoherent. The last part of the question is elaborated at the bottom of the question body.

So, I was doing a course assessment assigned by my college, and one of the questions is:

Remove duplicate words and print only unique words

Input: A single sentence in which each word is separated by a space

Output: Unique words separated by a space [Order of words should be the same as in the input]

Example:

Input: adc aftrr adc art

Output: adc aftrr art

Now, I have the solution, which is to split the string on whitespace and add each word to an array (used as a set) if it is not already there, but it is the implementation part that makes me pull my hair out.

#include <stdio.h>
#include <string.h>

#define MAX 20

int exists(char words[][MAX], int n, char *word){ // The existence check function
    for(int i=0;i < n;i++)
        if(strcmp(words[i],word) == 0) 
            return 1;
    return 0;
}

void removeDuplicateOld(char*);
void removeDuplicateNew(char*);

int main(){
    char sentence[MAX*50] = {0}; // arbitrary length
    fgets(sentence,MAX*50,stdin);
    sentence[strcspn(sentence,"\n")]=0;
    
    printf("The Old One : \n");
    removeDuplicateOld(sentence);
    printf("\nThe New One : \n");
    removeDuplicateNew(sentence);
}

The function that uses strtok to split the string:


void removeDuplicateNew(char *sentence){
    char words[10][MAX] = {0};
    int wi=0;
    char *token = strtok(sentence," ");
    
    while(token != NULL){
        if(exists(words,wi,token)==0) {
            strcpy(words[wi++],token);
        }
        token = strtok(NULL," ");
    }
    for(int i=0;i<wi;i++) printf("%s ",words[i]);
}

The old function that uses my old method (which builds up a word until it hits whitespace):


void removeDuplicateOld(char *sentence){
    char objects[10][MAX] = {0}; //10 words with MAX letters
    char tword[MAX];
    int oi=0, si=0, ti=0;
    
    while(sentence[si]!='\0'){
        if(sentence[si] != ' ' && sentence[si+1] != '\0')
            tword[ti++] = sentence[si];
        else{
            if(sentence[si+1] == '\0')
                tword[ti++]=sentence[si];
                
            tword[ti]='\0';
            
            if(exists(objects,oi,tword) == 0){
                strcpy(objects[oi++],tword);
            }
            ti=0; // The buffer[tword] is made to be overwritten

        }
        si++;
    }
    for(int i=0;i<oi;i++)
        printf("%s ",objects[i]);
}

Solved: changed if(sentence[si+1] == '\0') to if(sentence[si+1] == '\0' && sentence[si]!=' '), so that a trailing space is no longer appended to the last word.
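The same idea can also be written without special-casing the last character. The following is only a sketch, not the code from the question: the names unique_words_scan, seen, MAX_WORDS and MAX_LEN are hypothetical, and the limits simply mirror the 10 and MAX used above. It treats both a space and the end of the string as a word boundary, never copies the boundary character into the word, and skips the empty words that repeated or trailing spaces would otherwise produce.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 10   /* assumed limit, mirrors the 10 in the question */
#define MAX_LEN   20   /* assumed limit, mirrors MAX in the question */

/* Returns 1 if word is already among the first n stored words. */
static int seen(char words[][MAX_LEN], int n, const char *word) {
    for (int i = 0; i < n; i++)
        if (strcmp(words[i], word) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: stores the unique words of sentence in words[],
   in input order, and returns how many were stored. */
int unique_words_scan(const char *sentence, char words[][MAX_LEN]) {
    char tword[MAX_LEN];
    int count = 0, ti = 0;

    assert(sentence != NULL);
    for (size_t si = 0; ; si++) {
        char c = sentence[si];
        if (c != ' ' && c != '\0') {
            if (ti < MAX_LEN - 1)        /* truncate over-long words */
                tword[ti++] = c;
        } else {
            /* Boundary: a space or the end of the string. The boundary
               character itself is never copied into the word. */
            if (ti > 0) {                /* ignore empty words */
                tword[ti] = '\0';
                if (count < MAX_WORDS && !seen(words, count, tword))
                    strcpy(words[count++], tword);
                ti = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}
```

Called with a caller-provided char words[MAX_WORDS][MAX_LEN] array, it returns the number of unique words, which can then be printed in a loop as in the functions above.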


Here is the output:

Input: abc def ghi abc jkl ghi

The Old One:

abc def ghi jkl

The New One:

abc def ghi jkl

Note that trailing whitespace in the input and output is not checked, since their own driver code doesn't handle it properly, whereas the strtok method does and passes all tests.


Now both methods seem to produce the same results, but according to the test cases they do produce different outputs. On top of that, moving the strtok method into a separate function [removeDuplicateNew] fails one test case, while writing it directly in main passes all tests. See these results:

Old Method Test Case Results

Strtok Method as Separate Function Test Case Results

The following was moved to a separate question thread:

When coded in the main method itself:

int main(){
    char sentence[MAX*50] = {0}; // arbitrary length
    fgets(sentence,MAX*50,stdin);
    sentence[strcspn(sentence,"\n")] = 0;
    
    char words[10][MAX] = {0};
    int wi=0;
    char *token = strtok(sentence," ");
    
    while(token != NULL){
        if(exists(words,wi,token)==0) {
            strcpy(words[wi++],token);
        }
        token = strtok(NULL," ");
    }
    for(int i=0;i<wi;i++) printf("%s ",words[i]);
}

Strtok Method as Inline Code Test Case Results

For the record, it is the same code just placed in main, so what the heck happens here that, when I separate it into a function and pass the string as an argument, it suddenly stops working properly?

Also, any advice on how I built and worded my question is appreciated.

Your code...

void removeDuplicateOld(char *sentence){
    char objects[10][MAX] = {0}; //10 words with MAX letters
    char tword[MAX];
    int oi=0, si=0, ti=0;
    
    while(sentence[si]!='\0'){
        if(sentence[si] != ' ' && sentence[si+1] != '\0')
            tword[ti++] = sentence[si];
        else{
            // We get here on a space, or on the last character of the sentence.
            // If that last character is itself a space (trailing space),
            // the next line appends the space to the word... wrong! <=====
            if(sentence[si+1] == '\0')
                tword[ti++]=sentence[si];
                
            tword[ti]='\0';

This is why the library function strtok() works better than hand-rolled code.
It has been tested and proven to work as documented.
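To make the difference concrete, here is a small sketch of how strtok() treats leading, repeated, and trailing delimiters: it skips them, so no empty token and no token containing a space is ever returned. The helper name count_tokens is hypothetical and exists only for illustration. Note that strtok() writes '\0' into the buffer it is given, so it must be passed a modifiable array, never a string literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s.
   Like strtok itself, it modifies s in place. */
int count_tokens(char *s, const char *delims) {
    int n = 0;

    assert(s != NULL && delims != NULL);
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

For a buffer holding "abc def " (with a trailing space) this counts two tokens, whereas the hand-rolled loop above ends up with a word that still carries the space, which strcmp() does not consider equal to "abc".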


There's a better way to use strtok():

for( char *p = sentence; (p = strtok( p, " \n") ) != NULL; p = NULL )
    if( exists( words, wi, p ) == 0 )
        strcpy( words[wi++], p );

That's all you need. strtok() will even trim the LF off the buffer for you, no extra charge.
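Wrapped in a function, the loop might look like the sketch below. The names unique_words_strtok, seen, MAX_WORDS and MAX_LEN are assumptions made for this example, and the two bounds checks are an addition that the code in the question does not have. Because "\n" is part of the delimiter set, the line can be passed straight from fgets() without stripping the newline first.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 10   /* assumed limit, mirrors the 10 in the question */
#define MAX_LEN   20   /* assumed limit, mirrors MAX in the question */

/* Returns 1 if word is already among the first n stored words. */
static int seen(char words[][MAX_LEN], int n, const char *word) {
    for (int i = 0; i < n; i++)
        if (strcmp(words[i], word) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: tokenizes sentence in place and stores its unique
   words in words[], in input order. Returns the number of words stored. */
int unique_words_strtok(char *sentence, char words[][MAX_LEN]) {
    int wi = 0;

    assert(sentence != NULL);
    /* First call passes the buffer, later calls pass NULL (p = NULL). */
    for (char *p = sentence; (p = strtok(p, " \n")) != NULL; p = NULL) {
        if (wi == MAX_WORDS)
            break;                      /* table is full */
        if (strlen(p) >= MAX_LEN)
            continue;                   /* word would not fit, skip it */
        if (!seen(words, wi, p))
            strcpy(words[wi++], p);
    }
    return wi;
}
```

Skipping over-long words is just one possible policy; truncating them or reporting an error would be equally reasonable, depending on what the assignment expects.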


Final suggestion: Instead of a fixed-size array of words, you might consider a linked list (LL) that can easily grow. The function that appends a new word to the end of the list can quietly eat the word if it turns out to be a duplicate found while traversing to the end of the LL.
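One way such a list could look is sketched here. Nothing in it comes from the question: struct node, append_unique and free_list are hypothetical names, and the return-value convention is an assumption chosen for the example. The append function walks the list once; if it meets the word on the way it returns without adding anything, otherwise it links a new node at the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    struct node *next;
    char *word;          /* heap-allocated copy of the word */
};

/* Appends a copy of word to the end of the list unless it is already
   present. Returns 1 if appended, 0 if it was a duplicate, and -1 if
   memory allocation failed. */
int append_unique(struct node **head, const char *word) {
    assert(head != NULL && word != NULL);

    struct node **link = head;          /* points at the link to fill in */
    while (*link != NULL) {
        if (strcmp((*link)->word, word) == 0)
            return 0;                   /* duplicate: quietly eat it */
        link = &(*link)->next;
    }

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t len = strlen(word) + 1;
    n->word = malloc(len);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->word, word, len);
    n->next = NULL;
    *link = n;                          /* attach at the end */
    return 1;
}

/* Releases every node and its word. */
void free_list(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Each token returned by strtok() can be passed to append_unique() directly, since the function stores its own copy of the word. The list keeps input order and has no fixed word count or word length, at the cost of a linear scan per insertion.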

