
Regex comment matching code in Java not working properly

I have this code for identifying comments and printing them in Java:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\/\\*((.|\n)*)\\*\\/)|\\/\\/.*");
        String code = "";
        Scanner scan = new Scanner(System.in);
        // Read all of standard input into one string, keeping the line breaks
        while (scan.hasNext()) {
            code += (scan.nextLine() + "\n");
        }
        Matcher matcher = pattern.matcher(code);
        int nxtBrk = code.indexOf("\n");
        // Print each match, starting a new output line when a line break
        // lies between the previous match and this one
        while (matcher.find()) {
            int i = matcher.start(), j = matcher.end();
            if (nxtBrk < i) {
                System.out.print("\n");
            }
            System.out.print(code.substring(i, j));
            nxtBrk = code.indexOf("\n", j);
        }
        scan.close();
    }
}

When I run the code against this input:

/*This is a program to calculate area of a circle after getting the radius as input from the user*/
#include<stdio.h>
int main()
{ //something

it correctly outputs only the comments. But when I give it this input:

/*This is a program to calculate area of a circle after getting the radius as input from the user*/
#include<stdio.h>
int main()
{//ok
}
/*A test run for the program was carried out and following output was observed
If 50 is the radius of the circle whose area is to be calculated
The area of the circle is 7857.1429*/

the program outputs the whole code instead of just the comments. I don't understand why adding those last lines breaks it.

EDIT: A parser is not an option, because I am solving a programming challenge and have to solve it in code myself. Link: https://www.hackerrank.com/challenges/ide-identifying-comments

Parsing source code with regular expressions is very unreliable. I'd suggest you use a specialized parser. Creating one is pretty simple using ANTLR. And since you seem to be parsing C source files, you can use the C grammar.

Your pattern, stripped of its Java string quoting (and some unnecessary backslashes), is this:

(/\*((.|\n)*)\*/)|//.*

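That's fine, except that its quantifiers are all greedy, which means the block-comment alternative will match from the first /* in the input to the last */, swallowing all the code in between. Your first input contains only one /* ... */ comment, so the problem doesn't show up there; the second input has two, so the match runs from the start of the first one to the end of the second. You want a non-greedy (reluctant) quantifier instead, giving this pattern: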

(/\*((.|\n)*?)\*/)|//.*

Small change, big consequence: the block-comment alternative now stops at the first */ after each /*. Re-encoded as Java code:

Pattern pattern = Pattern.compile("(/\\*((.|\n)*?)\\*/)|//.*");
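For completeness, here is a sketch of how the question's program might look with that pattern dropped in. It is not the asker's exact code: the matching is pulled into a hypothetical `extractComments` helper so it can be tried on a plain string, input is read with `hasNextLine()` into a `StringBuilder`, and a newline is only emitted between comments (the original loop could also print one before the first comment).

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    // Reluctant *? makes each block comment stop at the first closing */.
    private static final Pattern COMMENT =
            Pattern.compile("(/\\*((.|\n)*?)\\*/)|//.*");

    // Hypothetical helper: returns all comments in code, putting a line break
    // between two comments whenever the source has a line break between them.
    static String extractComments(String code) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = COMMENT.matcher(code);
        int prevEnd = -1; // end of the previous comment, or -1 before the first
        while (matcher.find()) {
            if (prevEnd >= 0 && code.substring(prevEnd, matcher.start()).contains("\n")) {
                out.append('\n');
            }
            out.append(matcher.group());
            prevEnd = matcher.end();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Read all of standard input, keeping the line breaks.
        StringBuilder code = new StringBuilder();
        try (Scanner scan = new Scanner(System.in)) {
            while (scan.hasNextLine()) {
                code.append(scan.nextLine()).append('\n');
            }
        }
        System.out.println(extractComments(code.toString()));
    }
}
```

On the second input from the question this prints the first block comment, `//ok` and the second block comment, each starting on its own line. One further caution: on very long comments the `(.|\n)` alternation can make Java's regex engine throw a `StackOverflowError`, because repeated groups are matched recursively. `(?s:/\*.*?\*/)|//.*`, which turns on DOTALL for the block-comment part only, is a commonly used alternative that avoids the alternation.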

(Be aware that you are very close to the limit of what it is sensible to match with regular expressions. Indeed, the pattern is actually incorrect, since the code might contain string literals with /* or // in them, such as "http://". But you'll probably get away with it…)
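If string literals do turn out to matter, one way around this limit is to drop the regex and scan the text character by character, tracking whether you are inside a string or character literal. The sketch below is a swapped-in technique, not something from the answer above; the names `CommentScanner` and `comments` are hypothetical, and it assumes C-like rules where a backslash escapes the next character inside a literal. It is not a full C lexer (it ignores things like preprocessor line continuations).

```java
import java.util.ArrayList;
import java.util.List;

public class CommentScanner {
    // Hypothetical helper: returns every // and /* */ comment in C-like source,
    // ignoring comment markers that appear inside "..." or '...' literals.
    static List<String> comments(String src) {
        List<String> found = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a string or char literal, honouring backslash escapes.
                char quote = c;
                i++;
                while (i < n && src.charAt(i) != quote) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: runs up to, but not including, the next newline.
                int end = src.indexOf('\n', i);
                if (end == -1) {
                    end = n;
                }
                found.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Block comment: runs to the first */ after the opening /*.
                int close = src.indexOf("*/", i + 2);
                int end = (close == -1) ? n : close + 2;
                found.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "printf(\"/* not a comment */\"); // real comment\n/* block */\n";
        for (String comment : comments(src)) {
            System.out.println(comment);
        }
    }
}
```

Here the `/* not a comment */` inside the `printf` string is skipped, so only `// real comment` and `/* block */` are reported, which is exactly the case the regex gets wrong.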

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.