C code to count the number of characters, words and lines in a file

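I'm a beginner in C, so I wanted to look at a piece of code that counts the number of characters, words and lines in a given file. I found the code below, but I don't understand why we have to increment the word and line counts for the last word after the while loop: if (characters > 0)...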

#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE *file;
    char path[100];
    char ch;
    int characters, words, lines;

    /* Input path of files to merge to third file */
    printf("Enter source file path: ");
    scanf("%s", path);

    /* Open source files in 'r' mode */
    file = fopen(path, "r");

    /* Check if file opened successfully */
    if (file == NULL) {
        printf("\nUnable to open file.\n");
        printf("Please check if file exists and you have read privilege.\n");
        exit(EXIT_FAILURE);
    }

    /*
     * Logic to count characters, words and lines.
     */
    characters = words = lines = 0;
    while ((ch = fgetc(file)) != EOF) {
        characters++;

        /* Check new line */
        if (ch == '\n' || ch == '\0')
            lines++;

        /* Check words */
        if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0')
            words++;
    }

    /* Increment words and lines for last word */
    if (characters > 0) {
        words++;
        lines++;
    }

    /* Print file statistics */
    printf("\n");
    printf("Total characters = %d\n", characters);
    printf("Total words      = %d\n", words);
    printf("Total lines      = %d\n", lines);

    /* Close files to release resources */
    fclose(file);

    return 0;
}

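There are some problems with the program:

  • ch must be defined as int so that EOF is detected properly.

  • Overlong input to scanf("%s", path); will overflow path and cause undefined behavior. Also check the return value to detect invalid input or a premature end of file:

     if (scanf("%99s", path) != 1) return 1;
  • Testing ch == '\0' to count lines is debatable. The standard Unix wc utility does not treat null bytes as line separators.

  • if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\0') is not a standard way to detect word boundaries either. if (isspace(ch)) is more idiomatic.

  • The word count is incorrect: multiple spaces count as multiple words! You should instead detect boundaries, i.e. a space character followed by a non-space character.

  • The last test is a lame attempt at fixing the above issue, and it is insufficient. An extra test is indeed needed to count the last line of the stream if it does not end with a newline.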
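To illustrate the first point: fgetc returns each byte as an unsigned char value converted to int, and returns the negative value EOF only at end of input or on error. If the result is stored in a char, a byte such as 0xFF can compare equal to EOF on platforms where char is signed (ending the loop early), and EOF may never match where char is unsigned. A minimal sketch follows; the names count_bytes and demo_count_with_ff_byte are made up for this illustration.

```c
#include <stdio.h>

/* Count the bytes on a stream. ch is an int, so EOF stays
   distinct from every possible byte value, including 0xFF. */
long count_bytes(FILE *fp) {
    long n = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Hypothetical demo: write the bytes 'a', 0xFF, 'b' to a
   temporary file and count them. Returns -1 if no temporary
   file could be created. */
long demo_count_with_ff_byte(void) {
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    fputc('a', fp);
    fputc(0xFF, fp);
    fputc('b', fp);
    rewind(fp);
    n = count_bytes(fp);
    fclose(fp);
    return n;
}
```

With int ch, all three bytes are counted. Declaring ch as char in count_bytes would make the result depend on the platform.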

Here is a corrected version:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE *file;
    char path[1024];
    int ch, last;
    long long int characters, words, lines;

    /* Input path of the file to analyze */
    printf("Enter source file path: ");
    if (scanf("%1023s", path) != 1) {
        printf("Invalid input\n");
        return EXIT_FAILURE;
    }

    /* Open source file in 'r' mode */
    file = fopen(path, "r");

    /* Check if file opened successfully */
    if (file == NULL) {
        printf("Unable to open file %s\n", path);
        printf("Please check if file exists and you have read privilege.\n");
        return EXIT_FAILURE;
    }

    /*
     * Logic to count characters, words and lines.
     */
    characters = words = lines = 0;
    last = '\n';
    while ((ch = fgetc(file)) != EOF) {
        characters++;

        /* Check new line */
        if (ch == '\n')
            lines++;

        /* Check words */
        if (!isspace(ch) && isspace(last))
            words++;

        last = ch;
    }

    /* Count the last line if it does not end with a newline */
    if (last != '\n') {
        lines++;
    }

    /* Print file statistics */
    printf("\n");
    printf("Total characters = %lld\n", characters);
    printf("Total words      = %lld\n", words);
    printf("Total lines      = %lld\n", lines);

    /* Close file to release resources */
    fclose(file);

    return 0;
}
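To check the counting logic without typing a path each time, the loop can be moved into a function that takes an already opened stream. The following is only a sketch under that assumption: struct file_counts, count_stream and count_text are hypothetical names, and count_text uses the standard tmpfile() to feed a string through the same loop.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical container for the three totals. */
struct file_counts {
    long long characters;
    long long words;
    long long lines;
};

/* Count characters, words and lines on an already opened stream. */
struct file_counts count_stream(FILE *fp) {
    struct file_counts c = { 0, 0, 0 };
    int ch;
    int last = '\n';   /* behave as if the stream starts after a newline */

    while ((ch = fgetc(fp)) != EOF) {
        c.characters++;
        if (ch == '\n')
            c.lines++;
        /* a word starts where a non-space follows a space */
        if (!isspace(ch) && isspace(last))
            c.words++;
        last = ch;
    }
    /* count a final line that has no terminating newline */
    if (last != '\n')
        c.lines++;
    return c;
}

/* Hypothetical helper: run count_stream() over a string
   by writing it to a temporary file first. */
struct file_counts count_text(const char *text) {
    struct file_counts c = { 0, 0, 0 };
    FILE *fp = tmpfile();

    if (fp == NULL)
        return c;
    fputs(text, fp);
    rewind(fp);
    c = count_stream(fp);
    fclose(fp);
    return c;
}
```

For the input "hello   world\nfoo" (three spaces between the first two words), this gives 17 characters, 3 words and 2 lines: the run of spaces starts only one new word, and the unterminated last line is still counted.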

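Depending on whether your input file ends with a nice newline ('\n') or not, the output will need adjustment.

For normal, sane text files in which every line (including the last one) ends with '\n', delete those increments after the loop.

However, it seems some debugging of the program is needed for these special cases, and the right result also depends on your definition of a word and a line. I strongly recommend using the Linux/Unix command wc as the reference and tie-breaker.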
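The difference between the two conventions shows up only when the last line has no trailing newline: wc -l reports the number of newline characters, so it does not count such a line. A small sketch comparing both conventions; the function names count_newlines, count_lines_any and open_text are invented for this example.

```c
#include <stdio.h>

/* Count newline characters only, which is what wc -l reports. */
long long count_newlines(FILE *fp) {
    long long lines = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF)
        if (ch == '\n')
            lines++;
    return lines;
}

/* Count lines, including a final line without a trailing newline. */
long long count_lines_any(FILE *fp) {
    long long lines = 0;
    int ch;
    int last = '\n';

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            lines++;
        last = ch;
    }
    if (last != '\n')
        lines++;
    return lines;
}

/* Hypothetical helper: temporary file holding the given text,
   positioned at its start. Returns NULL on failure. */
FILE *open_text(const char *text) {
    FILE *fp = tmpfile();

    if (fp != NULL) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}
```

For "a\nb" the first function returns 1 and the second returns 2. For "a\nb\n" both return 2.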

