Number of characters in comments in a file (C programming)

Question

I can't seem to get it right. I've tried everything, but...

int commentChars() {
    char str[256], fileName[256];
    FILE *fp;
    int i;

    do {
        long commentCount = 0;
        fflush(stdin);
        printf("%s\nEnter the name of the file in %s/", p, dir);
        gets(fileName);

        if (!(fp = fopen(fileName, "r"))) {
            printf("Error! File not found, try again");
            return 0;
        }

        while (!feof(fp)) {
            fgets(str, sizeof str, fp);
            for (int i = 0; i <= sizeof str; i++) {
                if (str[i] == '/' && str[i+1] == '/') {
                    commentCount += (strlen(str) - 2);
                }
            }
        }

        fclose(fp);

        printf("All the chars, contained in a comment: %ld\n", commentCount);
        puts(p);
        printf("Do you want to search for another file?<Y/N>: ");
        i = checker();
    } while (i);
}

The result is "All the chars, contained in a comment: 0", even though I have comments. And my second question: similarly, how can I do the same for /* */ comments? It seems like an impossible job to me.

Answer 1

This essentially trivial modification of your code deals with several problems in it:

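You should not use feof() like that: while (!feof(file)) is always wrong, because feof() only reports end-of-file after a read has already failed.
You should not read data that is not part of the string just read: your loop runs i up to sizeof str, past the end of the string that fgets() returned and even past the end of the array.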
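
A minimal sketch of why the first point matters (the names count_lines_feof and count_lines_fgets are hypothetical, for illustration only): feof() only becomes true after a read has already failed, so a while (!feof(fp)) loop runs its body once more than there are lines. In the question's code that extra fgets() call fails and leaves str unchanged, so the last line would be processed twice.

```c
#include <stdio.h>

/* Anti-pattern from the question: feof() reports end-of-file only
   AFTER a read has failed, so the body runs one extra time. */
static int count_lines_feof(FILE *fp)
{
    char buf[256];
    int n = 0;

    while (!feof(fp)) {
        fgets(buf, sizeof buf, fp);   /* result ignored: the last call fails */
        n++;
    }
    return n;
}

/* Correct pattern: let the result of the read control the loop. */
static int count_lines_fgets(FILE *fp)
{
    char buf[256];
    int n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;
    return n;
}
```

On a stream containing "a\nb\n", the first function returns 3 and the second returns 2.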

I've also refactored your code so that the function takes a file name, opens the file, counts the comment characters, closes it, and reports what it found.

#include <stdio.h>
#include <string.h>

// Revised interface - process a given file name, reporting
static void commentChars(char const *file)
{
    char str[256];
    FILE *fp;
    long commentCount = 0;

    if (!(fp = fopen(file, "r")))
    {
        fprintf(stderr, "Error! File %s not found\n", file);
        return;
    }

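    while (fgets(str, sizeof(str), fp) != 0)
    {
        size_t len = strlen(str);
        /* Stop one short of len so str[i + 1] never reads past the string */
        for (size_t i = 0; i + 1 < len; i++)
        {
            if (str[i] == '/' && str[i + 1] == '/')
            {
                /* Count only what follows the introducer, newline included */
                commentCount += (long)(len - (i + 2));
                break;
            }
        }
    }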

    fclose(fp);

    printf("%s: Number of characters contained in comments: %ld\n", file, commentCount);
}

int main(int argc, char **argv)
{
    if (argc == 1)
        commentChars("/dev/stdin");
    else
    {
        for (int i = 1; i < argc; i++)
            commentChars(argv[i]);
    }
    return 0;
}

When run on its own source code (ccc.c), it yields:

ccc.c: Number of characters contained in comments: 58

The comment isn't really complete (oops), but it serves to show what goes on. The count includes the newline that fgets() keeps as part of the comment, but not the // introducer.
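
The counting rule just described (everything after the first //, newline included, introducer excluded) can be isolated in a small helper. This is a sketch; the name line_comment_chars is hypothetical. Like the answer's code, it knows nothing about string literals, so a "//" inside a string (in a URL, for example) would be counted as a comment.

```c
#include <string.h>

/* Hypothetical helper: number of comment characters on one line as
   returned by fgets(), i.e. everything after the first "//", including
   the trailing newline but excluding the two-character introducer.
   Assumption: a "//" inside a string literal is (wrongly) counted too. */
static size_t line_comment_chars(const char *line)
{
    const char *start = strstr(line, "//");

    if (start == NULL)
        return 0;
    return strlen(start + 2);
}
```

For the single comment line in ccc.c, "// Revised interface - process a given file name, reporting\n", it returns 58, in line with the output shown above.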

Dealing with /* comments is harder. You need to spot a slash followed by a star, and then read up to the next star-slash character pair. This is probably more easily done with character-by-character input than with line-by-line input; at the very least, you will need to be able to interleave character analysis with line input.
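
A minimal sketch of the character-by-character approach, assuming plain text and deliberately ignoring string literals, character constants, // comments and backslash-newline splicing (all of which a real tool has to handle). The function name block_comment_chars is hypothetical; it counts the characters between /* and */, delimiters excluded, remembering only the previous character.

```c
#include <stdio.h>

/* Hypothetical sketch: count characters inside block comments,
   delimiters excluded. Strings, character constants, // comments and
   backslash-newline are NOT handled. */
static long block_comment_chars(FILE *fp)
{
    long count = 0;
    int in_comment = 0;
    int prev = EOF;   /* previous character; EOF means "none" */
    int ch;

    while ((ch = getc(fp)) != EOF) {
        if (!in_comment) {
            if (prev == '/' && ch == '*') {
                in_comment = 1;
                prev = EOF;   /* the opening star must not help close it */
                continue;
            }
        } else {
            if (prev == '*' && ch == '/') {
                in_comment = 0;
                count--;      /* the star was counted but is a delimiter */
                prev = EOF;   /* the closing slash must not open a comment */
                continue;
            }
            count++;
        }
        prev = ch;
    }
    return count;
}
```

An unterminated comment at end of file is simply counted up to the end of input.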

When you're ready for it, you can try this torture test on your program. It's what I use to check my comment stripper, SCC. SCC doesn't handle trigraphs, by conscious decision; if the source contains trigraphs, I run it through a trigraph remover first.

/*
@(#)File:            $RCSfile: scc.test,v $
@(#)Version:         $Revision: 1.7 $
@(#)Last changed:    $Date: 2013/09/09 14:06:33 $
@(#)Purpose:         Test file for program SCC
@(#)Author:          J Leffler
*/

/*TABSTOP=4*/

// -- C++ comment

/*
Multiline C-style comment
#ifndef lint
static const char sccs[] = "@(#)$Id: scc.test,v 1.7 2013/09/09 14:06:33 jleffler Exp $";
#endif
*/

/*
Multi-line C-style comment
with embedded /* in line %C% which should generate a warning
if scc is run with the -w option
Two comment starts /* embedded /* in line %C% should generate one warning
*/

/* Comment */ Non-comment /* Comment Again */ Non-Comment Again /*
Comment again on the next line */

// A C++ comment with a C-style comment marker /* in the middle
This is plain text under C++ (C99) commenting - but comment body otherwise
// A C++ comment with a C-style comment end marker */ in the middle

The following C-style comment end marker should generate a warning
if scc is run with the -w option
*/
Two of these */ generate */ one warning

It is possible to have both warnings on a single line.
Eg:
*/ /* /* */ */

SCC has been trained to handle 'q' single quotes in most of
the aberrant forms that can be used.  '\0', '\\', '\'', '\\
n' (a valid variant on '\n'), because the backslash followed
by newline is elided by the token scanning code in CPP before
any other processing occurs.

This is a legitimate equivalent to '\n' too: '\
\n', again because the backslash/newline processing occurs early.

The non-portable 'ab', '/*', '*/', '//' forms are handled OK too.

The following quote should generate a warning from SCC; a
compiler would not accept it.  '
\n'

" */ /* SCC has been trained to know about strings /* */ */"!
"\"Double quotes embedded in strings, \\\" too\'!"
"And \
newlines in them"

"And escaped double quotes at the end of a string\""

aa '\\
n' OK
aa "\""
aa "\
\n"

This is followed by C++/C99 comment number 1.
// C++/C99 comment with \
continuation character \
on three source lines (this should not be seen with the -C flag)
The C++/C99 comment number 1 has finished.

This is followed by C++/C99 comment number 2.
/\
/\
C++/C99 comment (this should not be seen with the -C flag)
The C++/C99 comment number 2 has finished.

This is followed by regular C comment number 1.
/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++  comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */
The regular C comment number 3 has finished.

Note that \u1234 and \U0010FFF0 are legitimate Unicode characters
(officially universal character names) that could appear in an
id\u0065ntifier, a '\u0065' character constant, or in a "char\u0061cter\
 string".  Since these are mapped long after comments are eliminated,
they cannot affect the interpretation of /* comments */.  In particular,
none of \u0002A.  \U0000002A, \u002F and \U0000002F ever constitute part
of a comment delimiter ('*' or '/').

More double quoted string stuff:

    if (logtable_out)
    {
    sprintf(logtable_out,
        "insert into %s (bld_id, err_operation, err_expected, err_sql_stmt, err_sql_state)" 
        " values (\"%s\", \"%s\", \"%s\", \"", str_logtable, blade, operation, expected);
    /* watch out for embedded double quotes. */
    }


/* Non-terminated C-style comment at the end of the file
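
Several cases in the torture test depend on backslash-newline splicing, which happens before comments are recognized: a slash, a backslash-newline and another slash form a // comment. One way to cope is to splice lines before scanning. The sketch below assumes LF line endings (no CRLF, no trigraphs), and splice_lines is a hypothetical helper name. Note that splicing changes the character count, since the removed backslash-newline pairs are no longer counted.

```c
/* Hypothetical helper: removes every backslash-newline pair in place,
   as translation phase 2 does, so that slash, backslash-newline, slash
   becomes a comment introducer before comment scanning.
   Assumptions: LF line endings only; CRLF and trigraphs not handled. */
static void splice_lines(char *s)
{
    char *out = s;

    for (const char *in = s; *in != '\0'; in++) {
        if (in[0] == '\\' && in[1] == '\n') {
            in++;   /* now at the newline; the loop increment skips it */
            continue;
        }
        *out++ = *in;
    }
    *out = '\0';
}
```

Applied to the test file's "/\" newline "\/" case, the result is "/\/", which is correctly not a comment.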

Answer 2

I think you're best off using regular expressions. They seem scary, but they're really not that bad for things like this. You can always try playing some regex golf to practice ;-)

I'd approach it as follows:

Build a regular expression that captures comments
Scan your file for it
Count the characters in each match
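
The three steps can be sketched with the POSIX regcomp()/regexec() API as a function over an in-memory string. This is a condensed variant under stated assumptions: the function name regex_comment_bytes is hypothetical, and it uses the portable extended regular expression /\*([^*]|\*+[^*/])*\*+/ rather than a non-greedy *?, which POSIX extended regular expressions do not define. Counts include the delimiters, and a /* inside a string literal would (wrongly) be counted.

```c
#include <regex.h>

/* Hypothetical sketch: total bytes of block comments in text,
   delimiters included; returns -1 if the pattern fails to compile.
   REG_NEWLINE is not set, so [^*] also matches newlines and a match
   may span lines. */
static long long regex_comment_bytes(const char *text)
{
    regex_t re;
    regmatch_t m;
    long long total = 0;
    const char *p = text;

    /* Slash-star, then non-stars or runs of stars not followed by a
       slash, then one or more stars and the closing slash. */
    if (regcomp(&re, "/\\*([^*]|\\*+[^*/])*\\*+/", REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, p, 1, &m, 0) == 0) {
        total += (long long)(m.rm_eo - m.rm_so);
        p += m.rm_eo;   /* resume scanning after this match */
    }
    regfree(&re);
    return total;
}
```

Because the middle of the pattern can never consume a star followed by a slash, each match stops at the first closing delimiter even under POSIX leftmost-longest matching.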

Using some regex code and a bit about matching comments in C, I hacked this together; it should let you count all the bytes that are part of block-style /* */ comments, including the delimiters. I only tested it on OS X. I suppose you can handle the rest?

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ERROR_MSG 0x1000

int compile_regex(regex_t *r, const char *regex_text)
{
    /* No REG_NEWLINE: a block comment may span several lines.
       REG_ENHANCED (a macOS extension, absent from glibc) is not used. */
    int status = regcomp(r, regex_text, REG_EXTENDED);
    if (status != 0) {
        char error_message[MAX_ERROR_MSG];
        regerror(status, r, error_message, MAX_ERROR_MSG);
        printf("Regex error compiling '%s': %s\n",
            regex_text, error_message);
        return 1;
    }
    return 0;
}

int match_regex(regex_t *r, const char *to_match, long long *nbytes)
{
    /* Pointer to end of previous match */
    const char *p = to_match;
    /* Only the whole match (m[0]) is used; groups are ignored */
    regmatch_t m[1];

    while (1) {
        int nomatch = regexec(r, p, 1, m, 0);
        if (nomatch) {
            printf("No more matches.\n");
            return nomatch;
        }
        long long len = (long long)(m[0].rm_eo - m[0].rm_so);
        *nbytes += len;

        printf("match length(bytes) : %lld\n", len);
        printf("Match: %.*s\n\n", (int)len, p + m[0].rm_so);
        p += m[0].rm_eo;
    }
}

int main(int argc, char *argv[])
{
    regex_t r;
    /* Portable POSIX ERE for a block comment, delimiters included:
       slash-star, then any mix of non-star characters and runs of stars
       not followed by a slash, then one or more stars and a slash.
       POSIX ERE has no non-greedy *? operator, hence this form. */
    const char *regex_text = "/\\*([^*]|\\*+[^*/])*\\*+/";
    long long comment_bytes = 0;

    char *file_contents;
    long input_file_size;
    size_t nread;
    FILE *input_file;
    if (argc != 2) {
        printf("Usage : %s <filename>\n", argv[0]);
        return 1;
    }
    input_file = fopen(argv[1], "rb");
    if (input_file == NULL) {
        perror(argv[1]);
        return 1;
    }
    fseek(input_file, 0, SEEK_END);
    input_file_size = ftell(input_file);
    rewind(input_file);
    if (input_file_size < 0) {
        fclose(input_file);
        return 1;
    }
    /* One extra byte for the terminating '\0': regexec() needs a C string */
    file_contents = malloc((size_t)input_file_size + 1);
    if (file_contents == NULL) {
        fclose(input_file);
        return 1;
    }
    nread = fread(file_contents, 1, (size_t)input_file_size, input_file);
    file_contents[nread] = '\0';
    fclose(input_file);

    if (compile_regex(&r, regex_text) != 0) {
        free(file_contents);
        return 1;
    }
    match_regex(&r, file_contents, &comment_bytes);
    regfree(&r);
    free(file_contents);
    printf("Found %lld bytes in comments\n", comment_bytes);

    return 0;
}

Answer 3

#include <stdio.h>

size_t counter(FILE *fp){
    int ch, chn;
    size_t count = 0;
    enum { none, in_line_comment, in_range_comment, in_string, in_char_constant } status;
#if 0
    in_range_comment : /* this */
    in_line_comment  : //this
    in_string : "this"
    in_char_constant : ' '
#endif

    status = none;
    while(EOF!=(ch=fgetc(fp))){
        switch(status){
        case in_line_comment :
            if(ch == '\n'){
                status = none;
            }
            ++count;
            continue;
        case in_range_comment :
            if(ch == '*'){
                chn = fgetc(fp);
                if(chn == '/'){
                    status  = none;
                    continue;
                }
                ungetc(chn, fp);
            }
            ++count;
            continue;
        case in_string :
            if(ch == '\\'){
                /* skip the escaped character, whatever it is */
                fgetc(fp);
            } else if(ch == '"'){
                status = none;
            }
            continue;
        case in_char_constant :
            if(ch == '\\'){
                /* skip the escaped character, whatever it is */
                fgetc(fp);
            } else if(ch == '\''){
                status = none;
            }
            continue;
        case none :
            switch(ch){
            case '/':
                if('/' == (chn = fgetc(fp))){
                    status = in_line_comment;
                    continue;
                } else if('*' == chn){
                    status = in_range_comment;
                    continue;
                } else
                    ungetc(chn, fp);
                break;
            case '"':
                status = in_string;
                break;
            case '\'':
                status = in_char_constant;
                break;
            }
        }
    }
    return count;
}

int main(void){
    FILE *fp = stdin;
    size_t c = counter(fp);
    printf("%zu\n", c);

    return 0;
}

3 answers

Solution 1
Score 1, 2014-01-19 20:26:23

Solution 2
Score 1, accepted, 2014-01-19 21:13:26

Solution 3
Score 0, 2014-01-20 10:00:53
