
Encode-decode find a glitch pls

I have an encoder/decoder program. It takes a dump of whitespace-separated words ending with a "\n" as input, replaces the 5 most frequent words in the text with short codes, and can reverse the process. It seems to work like a charm, but my supervisor program still reports 6 failures out of 10 runs. When I encode the input file and then decode the encoded file, the round trip works perfectly. I still don't understand what's wrong with it. Please, guys, I need your eyes!

Simple encoder: Find the 5 most common words in a file that are at least 3 characters long, and replace them with short codes. The codes are 2 characters long and look like this: !1 !2 ... !5. !1 replaces the word with the most occurrences in the text, and !5 replaces the one with the fewest occurrences among the five. If two words occur equally often, the one that comes first in the text goes first in the codelist ("sooner is better"). The rest of the words have to stay untouched. The codelist has to be written at the beginning of the encoded file. The program must also have a decoding function: if the input starts with a "!", it has to read the codelist and decode the whole file, recovering the original state.

Input: Two kinds of input file can exist. The original input contains at most 2000 words separated by whitespace, each word at most 22 characters long. After the last word there is no whitespace, only a newline ("\n"). Words consist of lowercase letters of the English alphabet. None of the words contains a "!" sign. There are always at least 5 distinct words of at least 3 characters. Warning! If the input format equals the output format, decoding is needed, of course! When decoding, the limits of 2000 words and 22 characters still apply, in addition to the codelist at the beginning of the file.

Output: The first 5 lines contain the codelist. On each line the first word is the code, and next to it is the word it replaces, with whitespace between them and a newline after the replaced word. From the 6th line on comes the encoded text, which needs decoding. Words are separated by whitespace, with only one newline at the end. Warning! The output file format can equal the input file format! In that case encoding is needed, of course.
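To illustrate the decoding rule described above, here is a minimal sketch of the per-word lookup. The helper name `decode_token` and the constant `CODE_COUNT` are hypothetical and not part of the program in the question; the sketch assumes the five code/word pairs have already been read from the codelist into two parallel arrays.

```c
#include <assert.h>
#include <string.h>

#define CODE_COUNT 5 /* the task uses exactly five codes, !1 .. !5 */

/* Hypothetical helper: returns the word that `token` stands for, or `token`
   itself when it does not match any code in the codelist.
   codes[k] and words[k] are the two columns of codelist line k. */
const char *decode_token(const char *token,
                         const char *const codes[CODE_COUNT],
                         const char *const words[CODE_COUNT])
{
    assert(token != NULL && codes != NULL && words != NULL);
    for (int k = 0; k < CODE_COUNT; ++k) {
        if (strcmp(token, codes[k]) == 0) {
            return words[k]; /* a code: substitute the original word */
        }
    }
    return token; /* an ordinary word: leave it untouched */
}
```

With the codelist from the example output in the question, `decode_token("!1", codes, words)` would give back "xxb", while a plain word such as "o" passes through unchanged.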

Requirements: use "input.txt" for reading (read-only!) and "output.txt" for writing (write-only!). For a successful run, return 0; at the end of main() is necessary to avoid a fault code. Possible fault codes: memory or time limit exceeded; floating-point failure, e.g. division by zero; memory access failure, such as indexing past the end of an array or using a null pointer.
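Since a null pointer counts as a fault, one way to decide between encoding and decoding is to check the result of fopen() and to peek at the first character without consuming it. The following is only a sketch under those assumptions; `starts_with_bang` and `detect_mode` are hypothetical names that do not appear in the program below.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports whether the stream starts with '!' without
   consuming anything, so later reads still see the whole file. */
int starts_with_bang(FILE *in)
{
    assert(in != NULL);
    int c = fgetc(in);   /* int, not char, so EOF stays distinguishable */
    if (c == EOF) {
        return 0;        /* empty file: treat it as plain input */
    }
    ungetc(c, in);       /* push the character back for the next read */
    return c == '!';
}

/* Hypothetical helper: returns 1 for encoded input, 0 for plain input,
   and -1 when the file cannot be opened (instead of using a null pointer). */
int detect_mode(const char *path)
{
    assert(path != NULL);
    FILE *in = fopen(path, "r");
    if (in == NULL) {
        return -1;
    }
    int encoded = starts_with_bang(in);
    fclose(in);
    return encoded;
}
```

A caller could branch on `detect_mode("input.txt")` and stop early on -1. Pushing the character back with ungetc() means the file does not have to be opened a second time just to re-read its first character.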

/* Input.txt

   o xxa o xxb xxb o xxc o xxd xxb xxe xxe

   Output.txt 

   !1 xxb
   !2 xxe
   !3 xxa
   !4 xxc
   !5 xxd
   o !3 o !1 !1 o !4 o !5 !1 !2 !2 */


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct words {
    char *kod;
    int occurrence;
} TABLE;

int main() {
    FILE *data;
    char wordmax[23];
    TABLE *table = NULL;
    int number = 0, i, m, n;
    char c;
    data = fopen("be.txt", "r" );
    c = fgetc(data);
    FILE *outfile;
    outfile = fopen("ki.txt", "w");
    int first = 1;
    char codeReadIn[10][23];

    if (c == '!') {
        data = fopen("be.txt", "r");
        for (i = 0; i < 10; i++) {
            fscanf(data, "%s", codeReadIn[i]);
        }

        while (fscanf(data, "%s", wordmax) != EOF) {
            if (first == 0) {
                fprintf(outfile, " ");
            }
            if (first == 1) {
                first = 0;
            }
            for (m = 0; m < 10; m = m + 2) {
                if (strcmp(wordmax, codeReadIn[m]) == 0) {
                    fprintf(outfile, "%s", codeReadIn[m + 1]);
                    break;
                }
            }
            if (m == 10) {
                fprintf(outfile, "%s", wordmax);
            }
        }
        fprintf(outfile, "\n");
    } else {
        data = fopen("be.txt", "r");
        while (fscanf(data, "%s", wordmax) != EOF) {
            for (i = 0; i < number; ++i) {
                if (strcmp(table[i].kod, wordmax) == 0) {
                    break;
                }
            }
            if (strlen(wordmax) <= 2){  // 2 char skip
                continue;
            }
            if (i == number) {
                ++number;
                table = (TABLE *)realloc(table, number * sizeof(TABLE));
                table[i].kod = (char *)malloc((strlen(wordmax) + 1) * sizeof(char));
                strcpy(table[i].kod, wordmax);
                table[i].occurrence = 1;
            } else {
                ++table[i].occurrence;
            }
        }
        int maxOccurrences[5];
        char *maxCodes[5];
        int j, k;
        for (j = 0; j < 5; j++) { // search for the top 5 among occurrences
            maxOccurrences[j] = -1;
            for (i = 0; i < number; ++i) { // going through occurrences
                // once put in the top 5, won't put it in again
                int foundone = 0;
                for (k = 0; k < j; k++) {
                    if (strcmp(maxCodes[k], table[i].kod) == 0) {
                        foundone = 1;
                    }
                }
                if (foundone == 1) {
                    continue;
                } // search for max
                if (table[i].occurrence > maxOccurrences[j]) { // if bigger then better
                    maxOccurrences[j] = table[i].occurrence;
                    maxCodes[j] = table[i].kod;
                }
            }
        }

        char *kod[5];
        kod[0] = "!1";
        kod[1] = "!2";
        kod[2] = "!3";
        kod[3] = "!4";
        kod[4] = "!5";
        for (i = 0; i < 5; i++) {
            fprintf(outfile, "%s %s\n", kod[i], maxCodes[i]);
        }
        int m;
        data = fopen("be.txt", "r");
        first = 1;
        while (fscanf(data, "%s", wordmax) != EOF) {
            if (first == 0) {
                fprintf(outfile, " ");
            }
            if (first == 1) {
                first = 0;
            }
            for (m = 0; m < j; m++) {
                if (strcmp(wordmax, maxCodes[m]) == 0) {
                    fprintf(outfile, "%s", kod[m]);
                    break;
                }
            }
            if (m == j) {
                fprintf(outfile, "%s", wordmax);
            }
        }
        fprintf(outfile, "\n");
        for (i = 0; i < number; ++i) {
            free(table[i].kod);
        }
        free(table);
    }
fclose(data);
fclose(outfile);
return 0;
}

Why are you ignoring words of one or two letters?

if (strlen(wordmax) <= 2){  // 2 char skip
    continue;
}

And wouldn't it be better to use strcasecmp() instead of strcmp(), or are you actually required to treat upper- and lower-case words separately?

EDIT: here are some test cases:

Input: one one two two three three four four five five

Output:

!1 two
!2 three
!3 four
!4 five
!5 one
!5 !5 !1 !1 !2 !2 !3 !3 !4 !4

Why is "one" last in the list? Is this a problem?

Input: aaa bbb ccc ddd eee

Output:

!1 bbb
!2 ccc
!3 ddd
!4 eee
!5 ^A
aaa !1 !2 !3 !4

Something strange is going on there.

Input: xxx yyy zzz

Output: Bus error
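The last input has fewer than five distinct words, which the task statement rules out for valid input. Still, a selection loop that stops when no candidate is left never touches an unset pointer. Below is a minimal sketch of such a selection, not the asker's code: `pick_top_words`, `TOP_N` and `MIN_LEN` are hypothetical names, and the function recounts words on the fly instead of keeping a table, which is slow but simple for at most 2000 words.

```c
#include <assert.h>
#include <string.h>

#define TOP_N 5   /* number of codes in the task */
#define MIN_LEN 3 /* shorter words are never encoded */

/* Hypothetical helper: fills top[] with up to TOP_N distinct words of length
   >= MIN_LEN, most frequent first, ties broken by first appearance.
   Returns how many slots were filled, which may be fewer than TOP_N. */
int pick_top_words(const char *const words[], int n, const char *top[TOP_N])
{
    assert(words != NULL && top != NULL && n >= 0);
    int filled = 0;
    while (filled < TOP_N) {
        const char *best = NULL;
        int best_count = 0;
        for (int i = 0; i < n; ++i) {
            if (strlen(words[i]) < MIN_LEN) {
                continue; /* too short to be a candidate */
            }
            int taken = 0;
            for (int k = 0; k < filled; ++k) {
                if (strcmp(top[k], words[i]) == 0) {
                    taken = 1; /* already selected in an earlier round */
                }
            }
            if (taken) {
                continue;
            }
            int count = 0;
            for (int j = 0; j < n; ++j) {
                if (strcmp(words[j], words[i]) == 0) {
                    ++count;
                }
            }
            if (count > best_count) { /* strict '>' keeps the earliest word on ties */
                best = words[i];
                best_count = count;
            }
        }
        if (best == NULL) {
            break; /* no candidates left: stop instead of reading garbage */
        }
        top[filled++] = best;
    }
    return filled;
}
```

Under the tie rule in the task statement, the first test input should rank "one" first and "five" last, and for "xxx yyy zzz" this sketch returns 3 rather than crashing. The caller would then print only as many codelist lines as were filled.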
