
Different string lengths when using strtok

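#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

void redact_words(const char *text_filename, const char *redact_words_filename){
    FILE *fp = fopen(text_filename,"r");
    FILE *f2p = fopen(redact_words_filename,"r");
    FILE *f3p = fopen("result.txt", "w");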
    char buffer1[1000];
    char buffer2[1000];
    char *word;
    char *redact;
    

    
    char **the_words;
    
    //if ((fgets(buffer1, 1000 ,fp) == NULL) || (fgets(buffer2,1000 ,f2p) == NULL))
    
    fgets(buffer1,1000,fp);
    fgets(buffer2,1000,f2p);
    
    rewind(fp);
    rewind(f2p);
    
    int word_count = 0; 
    int c;  /* int, not char, so EOF can be told apart from a real character */
    while ((c = fgetc(f2p)) != EOF){
        if (c == ' '){
            word_count += 1;
        }
    }
    word_count += 1;
    
    the_words = malloc(3 * sizeof(char*));
    redact = strtok(buffer2, ", ");
    
    for (int i = 0; i < word_count; i++){
        the_words[i] = malloc(100);
        the_words[i] = redact;
        redact = strtok(NULL, ", ");
    }
    
    char result[256] = "";
    word = strtok(buffer1, " ");
    while (word != NULL){
        for (int i = 0; i < word_count; i++){
            if (strcasecmp(the_words[i],word) == 0){
                for (int i = 0; i < strlen(word); i++){
                    strcat(result,"*");
                    
                }
                strcat(result, " ");
                break; 
            }
            else{
                if (i==(word_count-1)){
                    strcat(result, word);
                    strcat(result, " ");
                }  
            } 
        }
        word = strtok(NULL," "); 
    }
    
    fputs(result, f3p);
    
    fclose(fp);
    fclose(f2p);
    fclose(f3p);
    free(the_words);
}

So this is my C code to replace words in the file text_filename with asterisks if the word also appears in the file redact_words_filename. However, I noticed during the comparison of the two strings

if (strcasecmp(the_words[i],word) == 0){
    for (int i = 0; i < strlen(word); i++){
        strcat(result,"*");
    }

that when the word "quick", for example, is in both text files, the_words[i] holds a string of length 6 while word holds a string of length 5, even though both contain quick, so strcasecmp does not treat them as the same string. Why is one of the strings longer than the other?
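Separately from the length question, the masking step in the snippet above appends one `*` at a time with strcat into the fixed 256-byte result buffer of the question's code, with no bounds check. A bounded variant might look like the sketch below; append_mask is a hypothetical helper name, and returning -1 when the text does not fit is an assumed convention, not something from the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append one '*' per character of `word`, then a
 * space, to `out` (a NUL-terminated buffer of total size `out_size`).
 * Returns 0 on success, or -1 (leaving `out` unchanged) if it won't fit. */
int append_mask(char *out, size_t out_size, const char *word)
{
    size_t used = strlen(out);
    size_t len = strlen(word);

    /* Need room for len asterisks, one space and the terminating NUL. */
    if (used + len + 2 > out_size)
        return -1;

    memset(out + used, '*', len);
    out[used + len] = ' ';
    out[used + len + 1] = '\0';
    return 0;
}
```

In the question's loop this would replace the inner for loop and the following strcat(result, " "), e.g. append_mask(result, sizeof result, word).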

(P.S. I apologise for the poor code quality.)

Edit 1: OK, so I found out it has to do with the \n at the end of every line, which fgets keeps in the buffer. I'm trying to find a way to solve this.
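To spell out the cause: when the whole line fits in the buffer, fgets stores the terminating newline too, and because the strtok delimiter set here is ", ", that newline stays attached to the last token, so "quick\n" (length 6) is compared with "quick" (length 5). A common fix is to cut the line ending off right after reading. The sketch below uses strcspn for that; trim_newline is a hypothetical helper name, and also stripping '\r' assumes the word file might have Windows-style line endings.

```c
#include <string.h>

/* Hypothetical helper: truncate `line` at its first '\r' or '\n'.
 * For a single line read by fgets, that is the trailing line ending. */
void trim_newline(char *line)
{
    /* strcspn returns the index of the first '\r' or '\n', or
     * strlen(line) if there is none, so the write is always in bounds. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

Calling trim_newline(buffer2) (and trim_newline(buffer1)) right after the corresponding fgets call would make every token compare at its real length.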

Edit 2: I managed to get rid of the \n with a simple for loop:

for (int i = 0; i < word_count; i++){
    the_words[i] = malloc(100);
    the_words[i] = redact;
    for (int j = 0; j < strlen(redact); j++){
        if (redact[j] == '\n'){
            redact[j] = '\0';
        }
    }
    redact = strtok(NULL, ", ");
}
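An alternative to scanning each token for '\n' is to put the line-ending characters in the strtok delimiter set, so they never become part of a token, and to stop when strtok returns NULL rather than trusting a separately computed word count (the loop above calls strlen(redact) on whatever strtok returned, which is undefined if that is NULL). A sketch, where split_words and its max_words cap are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on commas, spaces, tabs and
 * line endings, storing at most `max_words` token pointers in `words`.
 * Returns the number of tokens stored. The pointers point into `line`,
 * so they stay valid only while `line` is not overwritten. */
size_t split_words(char *line, char **words, size_t max_words)
{
    const char *delims = ", \t\r\n";
    size_t count = 0;

    for (char *tok = strtok(line, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        words[count++] = tok;
    }
    return count;
}
```

Because the returned pointers still point into the input buffer, they need to be copied before that buffer is reused, which is the second point raised in the answer.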



Answer:

    the_words = malloc(3 * sizeof(char*));
    redact = strtok(buffer2, ", ");
    
    for (int i = 0; i < word_count; i++){
        the_words[i] = malloc(100);
        the_words[i] = redact;
        redact = strtok(NULL, ", ");
    }

There are two obvious problems right here:

  • You allocate space for 3 pointers in the_words but then store word_count words in it. If word_count > 3, you write past the end of the allocation and get undefined behavior.
  • For each word, you allocate 100 bytes and then immediately throw that allocation away (leaking it), storing a pointer into buffer2 instead. The buffer currently contains the word, but that will change the next time you read into it. Just use the_words[i] = strdup(redact); instead, which both allocates the right amount of memory and copies the string into it (each copy then needs its own free).
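Putting both points together: allocate one pointer per word, copy each token with strdup, and free every copy before freeing the array itself. This is a sketch assuming a POSIX system (strdup is POSIX and only became standard C in C23); the helper names copy_words and free_words are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup on POSIX systems */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap array of `n` independent copies of
 * the strings in `tokens`, or NULL if an allocation fails. */
char **copy_words(char *const *tokens, size_t n)
{
    /* One pointer per word, not a fixed 3. */
    char **the_words = malloc(n * sizeof *the_words);
    if (the_words == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        /* strdup allocates strlen + 1 bytes and copies the string, so the
         * copy survives later reads into the original buffer. */
        the_words[i] = strdup(tokens[i]);
        if (the_words[i] == NULL) {
            while (i > 0)
                free(the_words[--i]);  /* undo the copies made so far */
            free(the_words);
            return NULL;
        }
    }
    return the_words;
}

/* Free each copy first, then the array of pointers. */
void free_words(char **the_words, size_t n)
{
    if (the_words == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(the_words[i]);
    free(the_words);
}
```

In the question's function, free_words(the_words, word_count) would take the place of the single free(the_words), which releases only the pointer array.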
