Java - Regular expression to find comments in code

Question

A little fun with Java this time. I want to write a program that reads code from standard input (line by line, for example), like:

// some comment
class Main {
    /* blah */
    // /* foo
    foo();
    // foo */
    foo2();
    /* // foo2 */
}

finds all comments in it and removes them. I'm trying to use regular expressions, and for now I've done something like this:

private static String ParseCode(String pCode)
{
    String MyCommentsRegex = "(?://.*)|(/\\*(?:.|[\\n\\r])*?\\*/)";
    return pCode.replaceAll(MyCommentsRegex, " ");
}

but it does not seem to work in all cases, e.g.:

System.out.print("We can use /* comments */ inside a string of course, but it shouldn't start a comment");

Any advice, or ideas other than regex? Thanks in advance.

Answer 1

You may have already given up on this by now, but I was intrigued by the problem.

I believe this is a partial solution...

Native regex:

//.*|("(?:\\[^"]|\\"|.)*?")|(?s)/\*.*?\*/

In Java:

String clean = original.replaceAll( "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/", "$1 " );

This appears to properly handle comments embedded in strings as well as properly escaped quotes inside strings. I threw a few things at it to check, but not exhaustively.

There is one compromise: every "..." string literal in the code ends up with a space after it, because the replacement is "$1 ". Keeping this simple while also solving that problem would be very difficult, given the need to cleanly handle:

int/* some comment */foo = 5;

A simple Matcher.find/appendReplacement loop could conditionally check for group(1) before replacing with a space, and would only be a handful of lines of code. Still simpler than a full-blown parser, maybe. (I could add the matcher loop too if anyone is interested.)
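The loop described above might look something like the following. This is only a sketch, not the answerer's own code: the class name CommentStripper and the method stripComments are made up for illustration, and the pattern is the one from this answer, so it inherits the same limits (for example, char literals are not handled).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentStripper {
    // Same pattern as in the answer: group 1 captures a string literal,
    // the other two alternatives match line comments and block comments.
    private static final Pattern TOKEN = Pattern.compile(
            "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/");

    // Hypothetical helper: copies string literals back unchanged and
    // replaces each comment with a single space.
    static String stripComments(String source) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = (m.group(1) != null)
                    ? m.group(1)   // string literal: keep as is, no extra space
                    : " ";         // comment: replace with one space
            // quoteReplacement stops '$' and '\' inside the literal from
            // being read as group references or escapes.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("int/* some comment */foo = 5;"));
        System.out.println(stripComments("print(\"/* not a comment */\"); // real comment"));
    }
}
```

Because string literals are written back verbatim, the trailing space that "$1 " adds after every string no longer appears, while int/* some comment */foo still comes out as two separate tokens.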

Answer 2

I think the last example is not a problem:

/* we comment out some code
System.out.print("We can use */ inside a string of course");
we end the comment */

... because the comment actually ends at "We can use */. This code does not compile.

But I have another problematic case:

int/*comment*/foo=3;

Your pattern will transform this into:

intfoo=3;

...which is invalid code. So it is better to replace your comments with " " instead of "".
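The difference between the two replacements can be checked with a small sketch. The class name ReplacementDemo and the helper strip are made up for illustration, and the pattern covers block comments only, ignoring string literals.

```java
public class ReplacementDemo {
    // Block-comment pattern only, for illustration; it does not protect string literals.
    static final String BLOCK_COMMENT = "(?s)/\\*.*?\\*/";

    // Hypothetical helper: replaces every block comment with the given text.
    static String strip(String code, String replacement) {
        return code.replaceAll(BLOCK_COMMENT, replacement);
    }

    public static void main(String[] args) {
        String code = "int/*comment*/foo=3;";
        System.out.println(strip(code, ""));   // prints: intfoo=3;
        System.out.println(strip(code, " "));  // prints: int foo=3;
    }
}
```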

Answer 3

I think a 100% correct solution using regular expressions is either inhuman or impossible (taking into account escapes, etc.).

I believe the best option would be to use ANTLR; I believe they even provide a Java grammar you can use.
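To give a feel for the state tracking that a lexer does, here is a hand-written scanner sketch. It is not ANTLR and not a complete Java lexer: it assumes no Unicode escapes and no text blocks, and the names CommentScanner and stripComments are invented for this example.

```java
public class CommentScanner {
    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    // Hypothetical helper: walks the source once, tracking whether we are in
    // code, a string/char literal, or a comment. Each comment becomes one space.
    static String stripComments(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = (i + 1 < src.length()) ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && (next == '/' || next == '*')) {
                        // Entering a comment: emit the separating space right away.
                        state = (next == '/') ? State.LINE_COMMENT : State.BLOCK_COMMENT;
                        out.append(' ');
                        i += 2;
                        continue;
                    }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c);
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && next != '\0') {
                        // Escaped character: copy it so \" or \' cannot end the literal.
                        out.append(next);
                        i += 2;
                        continue;
                    }
                    if ((state == State.STRING && c == '"')
                            || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') {
                        // The line terminator is not part of the comment; keep it.
                        state = State.CODE;
                        out.append(c);
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i += 2;
                        continue;
                    }
                    break;
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("int/*comment*/foo=3;"));
        System.out.println(stripComments("print(\"/* not a comment */\"); // real"));
    }
}
```

A generated lexer from a real Java grammar would cover the cases this sketch skips, which is the point of reaching for a parser tool in the first place.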

Answer 4

I ended up with this solution.

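import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsFun {
    static List<Match> commentMatches = new ArrayList<Match>();

    public static void main(String[] args) throws IOException {
        // Comments: "//" to end of line, or "/* ... */" possibly spanning lines (DOTALL).
        Pattern commentsPattern = Pattern.compile("(//.*?$)|(/\\*.*?\\*/)", Pattern.MULTILINE | Pattern.DOTALL);
        // String literals: an opening quote up to the first quote not preceded by a backslash.
        Pattern stringsPattern = Pattern.compile("(\".*?(?<!\\\\)\")");

        // The original called a getTextFromFile helper that was not shown; the file is read directly here instead.
        String text = new String(Files.readAllBytes(Paths.get("src/my/test/CommentsFun.java")), StandardCharsets.UTF_8);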

        Matcher commentsMatcher = commentsPattern.matcher(text);
        while (commentsMatcher.find()) {
            Match match = new Match();
            match.start = commentsMatcher.start();
            match.text = commentsMatcher.group();
            commentMatches.add(match);
        }

        // Comment candidates that start inside a string literal are not real comments.
        List<Match> commentsToRemove = new ArrayList<Match>();

        Matcher stringsMatcher = stringsPattern.matcher(text);
        while (stringsMatcher.find()) {
            for (Match comment : commentMatches) {
                if (comment.start > stringsMatcher.start() && comment.start < stringsMatcher.end())
                    commentsToRemove.add(comment);
            }
        }
        for (Match comment : commentsToRemove)
            commentMatches.remove(comment);

        // Note: String.replace substitutes every occurrence of the comment text,
        // not only the one found at comment.start.
        for (Match comment : commentMatches)
            text = text.replace(comment.text, " ");

        System.out.println(text);
    }

    //Single-line

    // "String? Nope"

    /*
    * "This  is not String either"
    */

    //Complex */
    ///*More complex*/

    /*Single line, but */

    String moreFun = " /* comment? doubt that */";

    String evenMoreFun = " // comment? doubt that ";

    static class Match {
        int start;
        String text;
    }
}

Answer 5

Another alternative is to use a library that supports AST parsing; for example, org.eclipse.jdt.core has all the APIs you need to do this and more. But then that's just one alternative :)

5 answers

Answer 1
25 2009-11-16 07:49:58

Answer 2
3 2009-11-01 12:56:51

Answer 3
3 2009-11-01 12:58:15

Answer 4
3 2015-04-16 15:20:41

Answer 5
0 2009-11-01 12:41:27
