Java: Removing comments from string

I'd like to write a function that takes a string and, if it contains an inline comment, removes it. I know it sounds pretty simple, but I want to make sure I'm doing this right. For example:

private String filterString(String code) {
  // lets say code = "some code //comment inside"

  // return the string "some code" (without the comment)
}

I thought of two ways (feel free to advise otherwise):

  1. Iterating over the string, finding the double slash (//), and using the substring method.
  2. The regex way (I'm not so sure about it).

Can you tell me what the best way is and show me how it should be done? (Please don't suggest overly advanced solutions.)

Edited: can this be done somehow with a Scanner object? (I'm using this object anyway.)

If you want a more efficient regex that really matches all types of comments, use this one:

replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","");

Source: http://ostermiller.org/findcomment.html
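As a rough illustration of how that pattern might be applied, the expression can be compiled once and reused. The class name RegexCommentStripper and the sample input are made up for this sketch. Note that it still treats // or /* inside a string literal as a comment.

```java
import java.util.regex.Pattern;

class RegexCommentStripper {
    // Same pattern as above: a block comment, or "//" up to the end of the line.
    private static final Pattern COMMENTS = Pattern.compile(
            "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)");

    static String strip(String code) {
        return COMMENTS.matcher(code).replaceAll("");
    }

    public static void main(String[] args) {
        String code = "int a = 1; // first\n/* block\n comment */int b = 2;";
        System.out.println(strip(code));
    }
}
```

The negated class [^*] also matches newlines, which is why the block-comment half works across lines without any extra flag.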

EDIT:

Another solution, if you're not sure about using regex, is to design a small automaton like the following:

public static String removeComments(String code){
    final int outsideComment=0;
    final int insideLineComment=1;
    final int insideblockComment=2;
    final int insideblockComment_noNewLineYet=3; // we want to have at least one new line in the result if the block is not inline.
    
    int currentState=outsideComment;
    String endResult="";
    Scanner s= new Scanner(code);
    s.useDelimiter("");
    while(s.hasNext()){
        String c=s.next();
        switch(currentState){
            case outsideComment: 
                if(c.equals("/") && s.hasNext()){
                    String c2=s.next();
                    if(c2.equals("/"))
                        currentState=insideLineComment;
                    else if(c2.equals("*")){
                        currentState=insideblockComment_noNewLineYet;
                    }
                    else 
                        endResult+=c+c2;
                }
                else
                    endResult+=c;
                break;
            case insideLineComment:
                if(c.equals("\n")){
                    currentState=outsideComment;
                    endResult+="\n";
                }
            break;
            case insideblockComment_noNewLineYet:
                if(c.equals("\n")){
                    endResult+="\n";
                    currentState=insideblockComment;
                }
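            case insideblockComment:
                // Consume a run of '*'; the comment ends only when '/' directly follows one.
                while(c.equals("*") && s.hasNext()){
                    c=s.next();
                    if(c.equals("/")){
                        currentState=outsideComment;
                    }
                    else if(c.equals("\n") && currentState==insideblockComment_noNewLineYet){
                        // keep the first line break of a multi-line block
                        endResult+="\n";
                        currentState=insideblockComment;
                    }
                }
                break;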
        }
    }
    s.close();
    return endResult;   
}

The best way to do this is to use regular expressions: first remove the /* */ comments, then remove all // comments. For example:

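private String filterString(String code) {
  // (?s) lets '.' match newlines; the reluctant .*? stops at the first "*/"
  String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
  // without (?s), '.' stops at the line terminator, so this strips to end of line
  return partialFiltered.replaceAll("//.*", "");
}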
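Two regex details decide whether block comments are removed correctly: whether '.' may cross line breaks, and whether the quantifier is greedy. The sketch below compares a greedy single-line pattern with a DOTALL, reluctant one. The class and method names are only for illustration.

```java
class BlockCommentRegexDemo {
    // Greedy and single-line: '.' stops at newlines and .* runs to the LAST "*/".
    static String greedy(String code) {
        return code.replaceAll("/\\*.*\\*/", "");
    }

    // DOTALL plus a reluctant quantifier: spans lines, stops at the FIRST "*/".
    static String reluctant(String code) {
        return code.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        String oneLine = "a /* x */ b /* y */ c";
        String multiLine = "a /* x\n y */ b";
        System.out.println(greedy(oneLine));      // also swallows the code between the comments
        System.out.println(reluctant(oneLine));
        System.out.println(greedy(multiLine));    // comment is left in place
        System.out.println(reluctant(multiLine));
    }
}
```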

Just use the replaceAll method from the String class, combined with a simple regular expression. Here's how to do it:

import java.util.*;
import java.lang.*;

class Main
{
        public static void main (String[] args) throws java.lang.Exception
        {
                String s = "private String filterString(String code) {\n" +
"  // lets say code = \"some code //comment inside\"\n" +
"  // return the string \"some code\" (without the comment)\n}";

                s = s.replaceAll("//.*?\n","\n");
                System.out.println("s=" + s);

        }
}

The key is the line:

s = s.replaceAll("//.*?\n","\n");

The regex //.*?\n matches text starting with // up to and including the end of the line; the replacement puts the newline back.

If you want to see this code in action, go here: http://www.ideone.com/e26Ve

Hope it helps!
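One detail of the //.*?\n pattern is worth checking: it needs a newline after the comment, so a comment on the last line with no trailing newline is left alone. A minimal sketch comparing it with //.* (class and method names are only for illustration):

```java
class LineCommentRegexDemo {
    // Pattern from the answer: requires a newline after the comment.
    static String stripWithNewline(String code) {
        return code.replaceAll("//.*?\n", "\n");
    }

    // Variant: by default '.' does not match a line terminator,
    // so no explicit "\n" is needed in the pattern.
    static String stripToEndOfLine(String code) {
        return code.replaceAll("//.*", "");
    }

    public static void main(String[] args) {
        String code = "a(); // first\nb(); // last line, no newline";
        System.out.println(stripWithNewline(code));
        System.out.println(stripToEndOfLine(code));
    }
}
```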

Using a regular expression replacement just to get the part of a string before a constant substring is a bit much.

You can do it with indexOf() to find the position where the comment starts and substring() to get the first part, something like:

String code = "some code // comment";
int    offset = code.indexOf("//");

if (-1 != offset) {
    code = code.substring(0, offset);
}
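The indexOf("//") lookup above also fires on a // inside a string literal (a URL, for instance). Here is a small sketch of a quote-aware variant, assuming a single line of input and only double-quoted strings with backslash escapes; char literals are not handled, and the name stripLineComment is made up here.

```java
class LineCommentStripper {
    /** Returns the line without a trailing // comment, ignoring // inside "...". */
    static String stripLineComment(String line) {
        boolean inString = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;               // skip the escaped character
                } else if (c == '"') {
                    inString = false;  // closing quote
                }
            } else if (c == '"') {
                inString = true;       // opening quote
            } else if (c == '/' && i + 1 < line.length() && line.charAt(i + 1) == '/') {
                return line.substring(0, i);
            }
        }
        return line; // no comment found
    }

    public static void main(String[] args) {
        System.out.println(stripLineComment("some code //comment inside"));
        System.out.println(stripLineComment("String u = \"http://x\"; // real comment"));
    }
}
```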

I made an open source library (on GitHub) for this purpose. It's called CommentRemover, and it can remove single-line and multi-line Java comments.

It can either remove TODOs or leave them in place.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.

A little code snippet showing how to use it (there are two types of usage):

First way: InternalPath

 public static void main(String[] args) throws CommentRemoverException {

 // root dir is: /Users/user/Projects/MyProject
 // example for startInternalPath

 CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
        .removeJava(true) // Remove Java file Comments....
        .removeJavaScript(true) // Remove JavaScript file Comments....
        .removeJSP(true) // etc.. goes like that
        .removeTodos(false) //  Do Not Touch Todos (leave them alone)
        .removeSingleLines(true) // Remove single line type comments
        .removeMultiLines(true) // Remove multiple type comments
        .startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
        .setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
        .build();

 CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
                  commentProcessor.start();        
  }

Second way: ExternalPath

 public static void main(String[] args) throws CommentRemoverException {

 // example for externalPath

 CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
        .removeJava(true) // Remove Java file Comments....
        .removeJavaScript(true) // Remove JavaScript file Comments....
        .removeJSP(true) // etc..
        .removeTodos(true) // Remove todos
        .removeSingleLines(false) // Do not remove single line type comments
        .removeMultiLines(true) // Remove multiple type comments
        .startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
        .setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
        .build();

 CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
                  commentProcessor.start();        
  }

@Christian Hujer has correctly pointed out that many or all of the solutions posted fail if the comment markers occur within a string.

@Loïc Gammaitoni suggests that his automaton approach could easily be extended to handle that case. Here is that extension.

enum State { outsideComment, insideLineComment, insideblockComment, insideblockComment_noNewLineYet, insideString };

public static String removeComments(String code) {
  State state = State.outsideComment;
  StringBuilder result = new StringBuilder();
  Scanner s = new Scanner(code);
  s.useDelimiter("");
  while (s.hasNext()) {
    String c = s.next();
    switch (state) {
      case outsideComment:
        if (c.equals("/") && s.hasNext()) {
          String c2 = s.next();
          if (c2.equals("/"))
            state = State.insideLineComment;
          else if (c2.equals("*")) {
            state = State.insideblockComment_noNewLineYet;
          } else {
            result.append(c).append(c2);
          }
        } else {
          result.append(c);
          if (c.equals("\"")) {
            state = State.insideString;
          }
        }
        break;
      case insideString:
        result.append(c);
        if (c.equals("\"")) {
          state = State.outsideComment;
        } else if (c.equals("\\") && s.hasNext()) {
          result.append(s.next());
        }
        break;
      case insideLineComment:
        if (c.equals("\n")) {
          state = State.outsideComment;
          result.append("\n");
        }
        break;
      case insideblockComment_noNewLineYet:
        if (c.equals("\n")) {
          result.append("\n");
          state = State.insideblockComment;
        }
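      case insideblockComment:
        // Consume a run of '*'; the comment ends only when '/' directly follows one.
        while (c.equals("*") && s.hasNext()) {
          c = s.next();
          if (c.equals("/")) {
            state = State.outsideComment;
          } else if (c.equals("\n") && state == State.insideblockComment_noNewLineYet) {
            // keep the first line break of a multi-line block
            result.append("\n");
            state = State.insideblockComment;
          }
        }
        break;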
    }
  }
  s.close();
  return result.toString();
}

It would be better if the code handled single-line comments and multi-line comments separately. Any suggestions?

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class RemovingCommentsFromFile {

    public static void main(String[] args) throws IOException {

        BufferedReader fin = new BufferedReader(new FileReader("/home/pathtofilewithcomments/File"));
        BufferedWriter fout = new BufferedWriter(new FileWriter("/home/result/File1"));

        // Carries over from one line to the next while inside /* ... */
        boolean multilinecomment = false;

        String s = null;
        while ((s = fin.readLine()) != null) {
            int len = s.length();
            for (int i = 0; i < len; i++) {
                char c = s.charAt(i);
                // Look ahead one character, guarding against the end of the line
                char next = (i + 1 < len) ? s.charAt(i + 1) : '\0';

                if (multilinecomment) {
                    if (c == '*' && next == '/') {
                        multilinecomment = false;
                        i++; // skip the '/' of "*/"
                    }
                    continue; // drop everything inside a block comment
                }
                if (c == '/' && next == '*') {
                    multilinecomment = true;
                    i++; // skip the '*' of "/*"
                    continue;
                }
                if (c == '/' && next == '/') {
                    break; // the rest of the line is a single-line comment
                }
                System.out.print(c);
                fout.write(c);
            }
            System.out.println();
            fout.write((char) 10);
        }
        fin.close();
        fout.close();
    }
}

An easy solution that doesn't remove extra parts of the code (like those above). It works for any reader; you can also iterate over a list of strings instead.

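        StringBuilder sb = new StringBuilder();
        String s;
        while ((s = reader.readLine()) != null)
        {
            // readLine() strips the line terminator, so add it back for every line
            sb.append(s.replaceAll("//.*", "")).append("\n");
        }
        // (?s) lets '.' span lines; the reluctant .*? stops at the first "*/"
        String str = sb.toString().replaceAll("(?s)/\\*.*?\\*/", " ");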

For a Scanner, use a delimiter.

Delimiter example:

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

public class MainClass {
  public static void main(String args[]) throws IOException {
    FileWriter fout = new FileWriter("test.txt");
    fout.write("2, 3.4,    5,6, 7.4, 9.1, 10.5, done");
    fout.close();

    // Read back the file written above.
    FileReader fin = new FileReader("test.txt");
    Scanner src = new Scanner(fin);
    // Set delimiters to space and comma.
    // ", *" tells Scanner to match a comma and zero or more spaces as
    // delimiters.

    src.useDelimiter(", *");

    // Read and print numbers until a non-numeric token is found.
    while (src.hasNext()) {
      if (src.hasNextDouble()) {
        System.out.println(src.nextDouble());
      } else {
        break;
      }
    }
    fin.close();
  }
}
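Tying this back to the question: rather than a custom delimiter, a Scanner can simply hand over the input one line at a time, and each line can be cut at the first //. This is a minimal sketch, assuming // never appears inside a string literal; the class name is illustrative.

```java
import java.util.Scanner;

class ScannerCommentFilter {
    static String filterString(String code) {
        StringBuilder result = new StringBuilder();
        try (Scanner scanner = new Scanner(code)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                int offset = line.indexOf("//");
                // Keep only the part before the comment, if there is one.
                result.append(offset >= 0 ? line.substring(0, offset) : line);
                if (scanner.hasNextLine()) {
                    result.append('\n');
                }
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(filterString("some code //comment inside\nmore code"));
    }
}
```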

Use a tokenizer for a normal string.

Tokenizer:

// start with a String of space-separated words
String tags = "pizza pepperoni food cheese";

// convert each tag to a token
StringTokenizer st = new StringTokenizer(tags," ");

while ( st.hasMoreTokens() )
{
  String token = (String)st.nextToken();
  System.out.println(token);
}

http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example
