Why can't the regular expressions "//" and "/*" match single-line comments and block comments?

I want to count the empty lines, single-line comments, and block comments in a C++ program.

I wrote the tool with flex, but it can't match C++ block comments.

1 flex code:

%{
    int block_flag = 0;
    int empty_num = 0;
    int single_line_num = 0;
    int block_line_num = 0;
    int line = 0;
%}

%%
^[\t ]*\n               {
    empty_num++;
    printf("empty line\n");
}
"//"    {
    single_line_num++;
    printf("single line comment\n");
}
"/*"  {
    block_flag = 1;
    block_line_num++;
    printf("block comment begin.block line:%d\n", block_line_num);
}

"*/"  {
    block_flag = 0;
    printf("block comment end.block line:%d\n", block_line_num);
}
^(.*)\n                 {
    if (block_flag)
        block_line_num++;
    else
        line++;
}

%%
int main(int argc , char *argv[])
{
    yyin = fopen(argv[1], "r");
    yylex();

    printf("lines :%d\n" ,line);
    fclose(yyin);

    return 0;
}

2 hello.c

bbg@ubuntu:~$ cat hello.c 
#include <stdlib.h>

//
//
/*
 */

/*   */

3 output

bbg@ubuntu:~$ ./a.out hello.c 
empty line
empty line
lines :6

Why can't "//" and "/*" match the single-line comment and the block comment?

Flex:

  1. doesn't search. It matches patterns sequentially, each match starting where the previous one ended.

  2. always picks the pattern with the longest match. (If two or more patterns match exactly the same number of characters, it picks the one that appears first in the file.)

So, you have

"//"   { /* Do something */ } 

and

^.*\n  { /* Do something else */ }

Suppose it has just matched the second one, so we're at the beginning of a line, and suppose the line starts with // . Now both of these patterns match, but the second one matches the whole line, whereas the first one only matches two characters. So the second one wins. That wasn't what you wanted.
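The competition between those two rules can be imitated in plain C. This is only a minimal sketch of the longest-match idea, not how flex is actually implemented: the helper names (match_slashes, match_whole_line, winning_rule) are hypothetical, and only the two patterns discussed here are hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length matched by the literal pattern "//" at the start of s, or 0. */
static size_t match_slashes(const char *s)
{
    return strncmp(s, "//", 2) == 0 ? 2 : 0;
}

/* Length matched by ^(.*)\n at the start of s, or 0 if no newline follows.
 * Assumes s already points at the beginning of a line. */
static size_t match_whole_line(const char *s)
{
    const char *nl = strchr(s, '\n');
    return nl ? (size_t)(nl - s) + 1 : 0;
}

/* Index of the rule that would win at s: 0 for "//", 1 for ^(.*)\n,
 * or -1 if neither matches (real flex would fall back to its default rule). */
int winning_rule(const char *s)
{
    assert(s != NULL);
    size_t lens[2] = { match_slashes(s), match_whole_line(s) };
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < 2; i++) {
        /* Strict '>' keeps the earlier rule when two lengths are equal. */
        if (lens[i] > best_len) {
            best = i;
            best_len = lens[i];
        }
    }
    return best;
}
```

For the input "//\n" the literal rule matches 2 characters and the whole-line rule matches 3, so winning_rule returns 1, which mirrors why the "//" action never runs in the posted scanner.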

Hint 1: You probably want // comments to match to the end of the line.

Hint 2: There is a regular expression which will match /* comments, although it's a bit tedious: "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/" Unfortunately, if you use that, it won't count line ends for you, but you should be able to adapt it to do what you want.
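One possible way to recover the line count is to scan the matched text for newlines inside the rule's action. The helper below is a sketch under that assumption; count_newlines is a hypothetical name, and in a flex action it would presumably be called with the scanner's yytext and yyleng as arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Count the '\n' characters in the first len bytes of text.
 * Hypothetical helper: intended to be fed the text of one matched comment. */
int count_newlines(const char *text, size_t len)
{
    assert(text != NULL);
    int n = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n')
            n++;
    }
    return n;
}
```

A block comment containing k newlines touches k + 1 source lines, so the action can add that amount to block_line_num instead of relying on the catch-all line rule.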

Hint 3: You might want to think about comments which start in the middle of a line, possibly after indentation. Your rule ^.*\n will swallow an entire line without even looking to see if there is a comment somewhere inside it.

Hint 4: String literals hide comments.
