Encode-decode find a glitch pls

Question

I have an encoder-decoder program. It takes a dump of whitespace-separated words ending in a "\n" as input, replaces the five most frequent words in the text with short codes, and can reverse the process. It seems to work like a charm, but my supervisor program still reports 6 failures out of 10 runs. When I encode the input file and then decode the encoded file, the result is perfect, so I still don't understand what is wrong with it. Please, guys, I need your eyes!

Simple encoder

Find the 5 most common words of at least 3 characters in a file, and replace them with short codes. The codes are 2 characters long and look like this: !1 !2 ... !5. !1 replaces the word with the most occurrences in the text, and !5 replaces the one with the fewest occurrences among the five. If two words occur equally often, the one that appears first in the text comes first in the code list ("the sooner the better"). The rest of the words must be left untouched. The code list must be written at the beginning of the encoded file.

The program must be able to decode as well. If the input starts with a "!", the program has to read the code list and decode the whole file, recovering the original state.

Input: Two kinds of input file can exist. The original input contains at most 2000 words separated by whitespace, each at most 22 characters long. After the last word there is no whitespace, only a newline ("\n"). Words are made of lowercase letters of the English alphabet. None of the words contains a "!" sign. There are always at least 5 distinct words of at least 3 characters. Warning! If the input format equals the output format, decoding is needed, of course! When decoding, the limits of 2000 words and 22 characters still apply, in addition to the code list at the beginning of the file.

Output: The first 5 lines contain the code list. Each line holds the code, then a whitespace, then the word it replaces, followed by a newline. The encoded text, which is what needs decoding later, starts on the 6th line, with whitespace between the words and a single newline at the end. Warning! The output file format can equal the input file format! In that case encoding is needed, of course.
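The counting step implied by these rules can be sketched as follows. This is only an illustration under stated assumptions, not the program from the question: `Entry`, `count_word`, `MAX_WORDS` and `MAX_LEN` are hypothetical names, and a fixed-size table is assumed because the task caps the input at 2000 words of at most 22 characters.

```c
#include <string.h>

#define MAX_WORDS 2000   /* word limit taken from the task statement */
#define MAX_LEN   22     /* longest word allowed by the task statement */

/* Hypothetical helper type: one distinct word and how often it was seen. */
typedef struct {
    char word[MAX_LEN + 1];
    int  count;
} Entry;

/* Adds one word to the table (or bumps its count) and returns the new
 * number of distinct entries. Words shorter than 3 characters are never
 * candidates for a code, so they are ignored. New words are appended,
 * which keeps the table in first-appearance order. */
int count_word(Entry *table, int n, const char *word)
{
    size_t len = strlen(word);
    if (len < 3 || len > MAX_LEN)
        return n;                      /* not a candidate, or out of spec */
    for (int i = 0; i < n; ++i) {
        if (strcmp(table[i].word, word) == 0) {
            ++table[i].count;
            return n;
        }
    }
    if (n == MAX_WORDS)
        return n;                      /* table full: drop instead of overflowing */
    strcpy(table[n].word, word);
    table[n].count = 1;
    return n + 1;
}
```

Because the table stays in first-appearance order, a later ranking pass that only replaces the current best on a strictly larger count gets the tie rule for free.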

Requirements: Use "input.txt" for reading (read-only!) and "output.txt" for writing (write-only!). For a successful run, return 0; at the end of main() is necessary to avoid a fault code. Possible fault codes: memory or time limit exceeded; floating-point failure, e.g. division by zero; memory access failure, e.g. array over-indexing or use of a null pointer.
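Since a null pointer counts as a fault, one defensive option is to check both fopen() calls before touching the streams. A minimal sketch, assuming a hypothetical helper `open_files` that is not part of the original program:

```c
#include <stdio.h>

/* Hypothetical helper: opens the input for reading and the output for
 * writing. Returns 0 on success and -1 if either file cannot be opened,
 * in which case both pointers are left NULL and nothing stays open. */
int open_files(const char *in_path, const char *out_path,
               FILE **in, FILE **out)
{
    *in = NULL;
    *out = NULL;

    FILE *fin = fopen(in_path, "r");
    if (fin == NULL)
        return -1;

    FILE *fout = fopen(out_path, "w");
    if (fout == NULL) {
        fclose(fin);
        return -1;
    }

    *in = fin;
    *out = fout;
    return 0;
}
```

In main() this would be called as open_files("input.txt", "output.txt", &data, &outfile); what to do on failure is up to the grader's rules, which the task text does not spell out.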

/* Input.txt

   o xxa o xxb xxb o xxc o xxd xxb xxe xxe

   Output.txt 

   !1 xxb
   !2 xxe
   !3 xxa
   !4 xxc
   !5 xxd
   o !3 o !1 !1 o !4 o !5 !1 !2 !2 */


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct words {
    char *kod;
    int occurrence;
} TABLE;

int main() {
    FILE *data;
    char wordmax[23];
    TABLE *table = NULL;
    int number = 0, i, m, n;
    char c;
    data = fopen("be.txt", "r" );
    c = fgetc(data);
    FILE *outfile;
    outfile = fopen("ki.txt", "w");
    int first = 1;
    char codeReadIn[10][23];

    if ( c == '!' ) {
        data = fopen("be.txt", "r" );
        for ( i = 0; i<10; i++){
            fscanf(data, "%s", codeReadIn[i]);
        }

        while (fscanf(data, "%s", wordmax) != EOF ) {
            if (first == 0){
                fprintf(outfile, " ");
            }
            if (first == 1){
                first = 0;
            }
            for (m=0; m<10; m = m + 2) {
                if (strcmp(wordmax, codeReadIn[m]) == 0) {
                    fprintf(outfile, "%s", codeReadIn[m+1]);
                    break;
                }
            }
            if (m==10) {
                fprintf(outfile, "%s", wordmax);
            }
        }
        fprintf(outfile, "\n");
    } else {
        data = fopen("be.txt", "r" );
        while (fscanf(data, "%s", wordmax) != EOF ) {
            for (i = 0; i < number; ++i){
                if (strcmp(table[i].kod, wordmax) == 0){
                    break;
                }
            }
            if (strlen(wordmax) <= 2){  // 2 char skip
                continue;
            }
            if (i == number) {
                ++number;
                table = (TABLE *)realloc(table, number * sizeof(TABLE));
                table[i].kod = (char *)malloc((strlen(wordmax) + 1) * sizeof(char));
                strcpy(table[i].kod, wordmax);
                table[i].occurrence = 1;
            } else {
                ++table[i].occurrence;
            }
        }
        int maxOccurrences[5];
        char* maxCodes[5];
        int j, k;
        for (j = 0; j < 5; j++){ // search for the top 5 among occurrences
            maxOccurrences[j] = -1;
            for (i = 0; i < number; ++i){ // going through occurrences
                // once a word is in the top 5, don't put it in again
                int foundone = 0;
                for (k = 0; k < j; k++){
                    if ( strcmp(maxCodes[k], table[i].kod) == 0){
                        foundone = 1;
                    }
                }
                if (foundone == 1){
                    continue;
                } // search for max
                if ( table[i].occurrence > maxOccurrences[j] ) { // strictly bigger wins, so ties keep the earlier word
                    maxOccurrences[j] = table[i].occurrence;
                    maxCodes[j] = table[i].kod;
                }
            }
        }

        char* kod[5];
        kod[0] = "!1";
        kod[1] = "!2";
        kod[2] = "!3";
        kod[3] = "!4";
        kod[4] = "!5";
        for (i = 0; i < 5; i++) {
            fprintf(outfile, "%s %s\n", kod[i], maxCodes[i]);
        }
        int m;
        data = fopen("be.txt", "r" );
        first = 1;
        while (fscanf(data, "%s", wordmax) != EOF ) {
            if (first == 0){
                fprintf(outfile, " ");
            }
            if (first == 1){
                first = 0;
            }
            for (m = 0; m < j; m++) {
                if (strcmp(wordmax, maxCodes[m]) == 0) {
                    fprintf(outfile, "%s", kod[m]);
                    break;
                }
            }
            if (m == j) {
                fprintf(outfile, "%s", wordmax);
            }
        }
        fprintf(outfile, "\n");
        for (i = 0; i < number; ++i){
            free(table[i].kod);
        }
        free(table);
    }
    fclose(data);
    fclose(outfile);
    return 0;
}

Answer 1

Why are you ignoring words of one or two letters?

if (strlen(wordmax) <= 2){  // 2 char skip
    continue;
}

And wouldn't it be better to use strcasecmp() instead of strcmp(), or are you actually required to treat upper- and lower-case words separately?

EDIT: here are some test cases:

Input: one one two two three three four four five five

Output:

!1 two
!2 three
!3 four
!4 five
!5 one
!5 !5 !1 !1 !2 !2 !3 !3 !4 !4

Why is "one" last in the list? Is this a problem?
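One possible explanation, offered as a guess rather than a diagnosis: this output is what you would get if the first character read by fgetc() were never returned to the stream. The first "one" would then be read as "ne", skipped as a 2-letter word, and "one" would be counted only once. The code as posted reopens the file before counting, so this would only apply to a variant that keeps reading from the stream used for the '!' check. A sketch of peeking without consuming, where `starts_with_bang` is a hypothetical helper name:

```c
#include <stdio.h>

/* Returns 1 if the stream starts with '!', 0 otherwise (including an
 * empty file). The character is pushed back with ungetc() so later
 * reads still see the whole first word. */
int starts_with_bang(FILE *fp)
{
    int c = fgetc(fp);      /* int, not char, so EOF stays distinguishable */
    if (c == EOF)
        return 0;
    ungetc(c, fp);          /* one character of pushback is guaranteed */
    return c == '!';
}
```

With a check like this there is no need to call fopen() a second time on the same file, which also avoids leaving the first FILE handle unclosed.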

Input: aaa bbb ccc ddd eee

Output:

!1 bbb
!2 ccc
!3 ddd
!4 eee
!5 ^A
aaa !1 !2 !3 !4

Something strange is going on there.

Input: xxx yyy zzz

Output: Bus error
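This input has fewer than five distinct words, which the task statement rules out, but it still exposes a weak spot. In the posted selection loop, maxCodes[j] is never assigned once every table entry has already been picked, so a later strcmp() or fprintf() reads an uninitialized pointer. That is consistent with the crash here and with the stray "^A" in the previous test. A sketch of a selection that reports how many entries it actually found, using a hypothetical helper `select_top` that works on indices instead of string pointers:

```c
/* Hypothetical helper: `counts` holds per-word occurrence counts in
 * first-appearance order. Writes the indices of up to `want` most
 * frequent words into `top` and returns how many were found. */
int select_top(const int *counts, int n, int *top, int want)
{
    int found = 0;
    while (found < want) {
        int best = -1;
        for (int i = 0; i < n; ++i) {
            int taken = 0;
            for (int k = 0; k < found; ++k) {
                if (top[k] == i)
                    taken = 1;         /* already ranked earlier */
            }
            if (taken)
                continue;
            /* strict '>' keeps the earliest word when counts are equal */
            if (best < 0 || counts[i] > counts[best])
                best = i;
        }
        if (best < 0)
            break;                     /* fewer than `want` candidates exist */
        top[found++] = best;
    }
    return found;
}
```

The caller can then loop up to the returned count instead of a hard-coded 5, so nothing unset is ever printed or compared.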

1 answer

solution1
0 ACCEPTED 2013-11-01 20:37:33
