
Remove comments from source code in Java

I want to remove all types of comments from a Java source code file. Example:

    String str1 = "SUM 10"      /*This is a Comments */ ;   
    String str2 = "SUM 10";     //This is a Comments"  
    String str3 = "http://google.com";   /*This is a Comments*/
    String str4 = "('file:///xghsghsh.html/')";  //Comments
    String str5 = "{\"temperature\": {\"type\"}}";  //comments

Expected output:

    String str1 = "SUM 10"; 
    String str2 = "SUM 10";  
    String str3 = "http://google.com";
    String str4 = "('file:///xghsghsh.html/')";
    String str5 = "{\"temperature\": {\"type\"}}";

I am using the regular expression below to achieve this:

    System.out.println(str1.replaceAll("[^:]//.*|/\\\\*((?!=*/)(?s:.))+\\\\*/", ""));

This gives me the wrong result for str4 and str5. Please help me resolve this issue.

Using Andreas's solution:

        final String regex = "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\\r\\n\"])*\")";
        final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
             + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
             + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
             + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
             + "    String str5 = \"{\"temperature\": {\"type\"}}";  //comments";
        final String subst = "$1";

        // The substituted value will be contained in the result variable
        final String result = string.replaceAll(regex,subst);

        System.out.println("Substitution result: " + result);

It works for everything except str5.

To make it work, you need to "skip" string literals. You can do that by matching string literals as well and capturing them so they can be retained.

The following regex will do that, using $1 as the substitution string:

//.*|/\*(?s:.*?)\*/|("(?:(?<!\\)(?:\\\\)*\\"|[^\r\n"])*")

See regex101 for a demo.

The Java code is then:

str1.replaceAll("//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")", "$1")

Explanation

//.*                      Match // and rest of line
|                        or
/\*(?s:.*?)\*/            Match /* and */, with any characters in-between, incl. linebreaks
|                        or
("                        Start capture group and match "
  (?:                      Start repeating group:
     (?<!\\)(?:\\\\)*\\"     Match escaped " optionally prefixed by escaped \'s
     |                      or
     [^\r\n"]                Match any character except " and linebreak
  )*                       End of repeating group
")                        Match terminating ", and end of capture group
$1                        Keep captured string literal
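
As a quick way to try the pattern explained above, here is a minimal sketch that applies it to the sample lines from the question. The class name `StripCommentsRegexDemo` and the helper `stripComments` are made up for this illustration; only the regex and the `$1` replacement come from the answer. It relies on Java substituting an empty string for `$1` when the capture group did not take part in the match, which is what makes the comments disappear.

```java
import java.util.regex.Pattern;

class StripCommentsRegexDemo {
    // Comments are matched and dropped; string literals are captured
    // in group 1 and written back unchanged through "$1".
    private static final Pattern COMMENT_OR_STRING = Pattern.compile(
            "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")");

    // Hypothetical helper name, not part of the original answer.
    static String stripComments(String source) {
        return COMMENT_OR_STRING.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The five sample lines from the question, joined into one source text.
        String source = String.join("\n",
                "String str1 = \"SUM 10\"      /*This is a Comments */ ;   ",
                "String str2 = \"SUM 10\";     //This is a Comments\"  ",
                "String str3 = \"http://google.com\";   /*This is a Comments*/",
                "String str4 = \"('file:///xghsghsh.html/')\";  //Comments",
                "String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments");
        System.out.println(stripComments(source));
    }
}
```

Whitespace that surrounded a removed comment is left in place, so the output may need trimming. Note that the escaped quotes in str5 have to be written as `\\\"` inside the Java test string, otherwise the input no longer contains the backslashes the regex is meant to handle.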

As others said, regex is not a good option here. You could use a simple DFA for this task.
Here's an example that finds the intervals of multi-line comments ( /* */ ).
You can handle single-line comments ( // -- \n ) the same way.

    String input = ...; //here's your input String

    //0 - source code
    //1 - possible comment start (seen '/')
    //2 - inside a multi-line comment (seen '*' after the '/')
    //3 - possible comment end (seen '*' inside the comment)
    //back to 0 - comment finished (seen '/' after that '*')
    byte state = 0; 
    int startPos = -1;
    int endPos = -1;
    for (int i = 0; i < input.length(); i++) {
        switch (state) {
        case 0:
            if (input.charAt(i) == '/') {
                state = 1;
                startPos = i;
            }
            break;
        case 1:
            if (input.charAt(i) == '*') {
                state = 2;
            }
            break;
        case 2:
            if (input.charAt(i) == '*') {
                state = 3;
            }
            break;
        case 3:
            if (input.charAt(i) == '/') {
                state = 0;
                endPos = i+1;

                //here you have the comment between startPos and endPos indices,
                //you can do whatever you want with it
            }

            break;
        default:
            break;
        }
    }

{...wishing I could comment...}

I recommend a two-pass process: one pass for comments that run to the end of the line (//) and another for those that do not (/* */).

I like Pavel's idea; however, I don't see how it checks that the star is the very next character after the slash, and vice versa when closing the comment.
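
One way the adjacency check could be added is sketched below. This is not Pavel's code: the class `CommentStripper`, the method `strip`, and the state names are invented for this illustration. A slash only starts a comment when the very next character is `*` or `/`, and a `*` inside a block comment only ends it when the very next character is `/`. The sketch also tracks string and char literals so that `//` inside `"http://google.com"` is left alone.

```java
class CommentStripper {
    private enum State {
        CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
        STRING, STRING_ESCAPE, CHAR, CHAR_ESCAPE
    }

    // Returns the source with // and /* */ comments removed.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH;       // hold the slash back until the next char is known
                    } else {
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                        out.append(c);
                    }
                    break;
                case SLASH:
                    if (c == '*') state = State.BLOCK_COMMENT;
                    else if (c == '/') state = State.LINE_COMMENT;
                    else {
                        out.append('/');           // it was an ordinary slash (e.g. division)
                        state = State.CODE;
                        i--;                       // re-read c as normal code
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') {  // keep the line break, drop the comment
                        out.append(c);
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*') state = State.BLOCK_STAR;
                    break;
                case BLOCK_STAR:
                    if (c == '/') state = State.CODE;                 // "*/" closes the comment
                    else if (c != '*') state = State.BLOCK_COMMENT;   // lone '*', still inside
                    break;
                case STRING:
                    out.append(c);
                    if (c == '\\') state = State.STRING_ESCAPE;
                    else if (c == '"') state = State.CODE;
                    break;
                case STRING_ESCAPE:
                    out.append(c);                 // escaped char, e.g. \" does not end the string
                    state = State.STRING;
                    break;
                case CHAR:
                    out.append(c);
                    if (c == '\\') state = State.CHAR_ESCAPE;
                    else if (c == '\'') state = State.CODE;
                    break;
                case CHAR_ESCAPE:
                    out.append(c);
                    state = State.CHAR;
                    break;
            }
        }
        if (state == State.SLASH) out.append('/'); // input ended on a lone slash
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("String str3 = \"http://google.com\";   /*This is a Comments*/"));
    }
}
```

This sketch removes a block comment without leaving a space behind, so two tokens separated only by a comment would be joined, and it does not know about text blocks (`"""`); both would need extra handling in real use.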

I like Andreas' idea; however, I wasn't able to get it to work on multi-line comments.
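
For what it's worth, the `(?s:.*?)` part of Andreas' pattern is written to cross line breaks, so a block comment spanning several lines should be removed as long as the whole source text is passed in as one string; applying the pattern line by line would not work. A small check under that assumption, with a made-up input and a made-up class name `MultiLineCommentCheck`:

```java
class MultiLineCommentCheck {
    // Same pattern as in Andreas' answer.
    static final String REGEX =
            "//.*|/\\*(?s:.*?)\\*/|(\"(?:(?<!\\\\)(?:\\\\\\\\)*\\\\\"|[^\r\n\"])*\")";

    static String strip(String source) {
        return source.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        // One string holding a comment that spans two lines.
        String source = "int a = 1; /* first line\n   second line */ int b = 2;";
        System.out.println(strip(source));
    }
}
```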

https://docs.oracle.com/javase/specs/jls/se12/html/jls-3.html#jls-CommentTail

Maybe it would be best to start with multiple simple expressions, step by step, such as:

(.*)(\s*\/\*.*|\s*\/\/.*)

to initially remove the inline comments.

Demo

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        // Group 1 is the text before the comment; group 2 is the comment itself
        final String regex = "(.*)(\\s*\\/\\*.*|\\s*\\/\\/.*)";
        final String string = "    String str1 = \"SUM 10\"      /*This is a Comments */ ;   \n"
             + "    String str2 = \"SUM 10\";     //This is a Comments\"  \n"
             + "    String str3 = \"http://google.com\";   /*This is a Comments*/\n"
             + "    String str4 = \"('file:///xghsghsh.html/')\";  //Comments\n"
             + "    String str5 = \"{\\\"temperature\\\": {\\\"type\\\"}}\";  //comments";
        // Java replacement strings reference groups as $1, not \1
        final String subst = "$1";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);

        System.out.println("Substitution result: " + result);
    }
}
