
Replace starting comment in Java source file

I'm writing a C# program to update the starting comment (usually the license header) of Java source files. The following snippet does the job.

    foreach (string r in allfiles)
    {
        // GC.Collect();
        string thefile = System.IO.File.ReadAllText(r);
        var pattern = @"/\*(?s:.*?)\*/[\s\S]*?package";
        Regex regex1 = new Regex(pattern /*,RegexOptions.Compiled */);
        var replaced = regex1.Replace(thefile, newheader + "package");
        System.IO.File.WriteAllText(r, replaced);
    }

The problem is that after hundreds of source files have been processed, the process hangs at .Replace.

It is not a garbage collection issue, since forcing a collection doesn't solve it. It also makes no difference whether RegexOptions.Compiled is used or not.

I'm fairly sure the problem lies in the pattern: the hang occurs on certain files, and if I exclude them from processing, the job runs to the end across about a thousand source files. However, if I process those files on their own, it works, and it also works in online testing tools such as http://regexstorm.net/tester and https://www.myregextester.com/index.php

Please let me know if there is a way to better optimize the search pattern for finding the first Java comment in a file.

Thank you in advance.

Your regex contains two bottlenecks related to lazy dot matching ( . in singleline mode and [\s\S] are synonyms, so (?s:.*?) and [\s\S]*? behave the same way). The backtracking buffer can quickly be overrun when the regex runs against big files.
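To check which files actually trigger the slow path, one option is to run the original pattern with a match timeout and note where it gives up. The sketch below is only a diagnostic aid; the class name BacktrackingProbe, the method name CompletesWithin and the idea of probing with a time limit are assumptions added here, not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class BacktrackingProbe
{
    // The original pattern from the question, unchanged.
    const string OriginalPattern = @"/\*(?s:.*?)\*/[\s\S]*?package";

    // Returns true if matching finished within the limit,
    // false if the regex engine gave up with a timeout.
    public static bool CompletesWithin(string text, TimeSpan limit)
    {
        var regex = new Regex(OriginalPattern, RegexOptions.None, limit);
        try
        {
            regex.IsMatch(text);   // the result is irrelevant, only the duration matters
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }
}
```

Calling BacktrackingProbe.CompletesWithin(System.IO.File.ReadAllText(r), TimeSpan.FromSeconds(5)) inside the loop would then flag the problematic files instead of hanging on them; the five-second limit is an arbitrary choice.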

The common technique is to unroll/unwrap the construct with the negated character class and a quantified group.

You may use

@"/\*[^*]*(?:\*(?!/)[^*]*)*\*/\s*package"


The regex breakdown:

  • /\* - literal /*
  • [^*]* - 0 or more characters other than *
  • (?:\*(?!/)[^*]*)* - the unrolled variant of (?s:.*?) , matching 0 or more sequences of...
    • \*(?!/) - a * symbol not followed by a /
    • [^*]* - 0 or more symbols other than *
  • \*/ - a literal sequence of */
  • \s* - 0 or more whitespace characters
  • package - literal letter sequence package
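Put together, the unrolled pattern could be dropped into the original loop roughly as follows. This is a minimal sketch rather than a definitive implementation: the HeaderReplacer class and ReplaceHeader method are hypothetical names, and the 5-second timeout, the MatchEvaluator and the count of 1 are additions made here, not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderReplacer
{
    // Unrolled pattern from the answer, built once and reused for every file.
    // The timeout is an arbitrary safety net (an assumption of this sketch).
    static readonly Regex FirstComment = new Regex(
        @"/\*[^*]*(?:\*(?!/)[^*]*)*\*/\s*package",
        RegexOptions.None,
        TimeSpan.FromSeconds(5));

    // Replaces the first block comment that is followed by "package"
    // (with only whitespace in between) with newHeader.
    public static string ReplaceHeader(string source, string newHeader)
    {
        // A MatchEvaluator keeps any '$' in newHeader from being treated as a
        // substitution token; the count of 1 stops after the first match.
        return FirstComment.Replace(source, m => newHeader + "package", 1);
    }
}
```

Inside the loop this would be called as System.IO.File.WriteAllText(r, HeaderReplacer.ReplaceHeader(System.IO.File.ReadAllText(r), newheader)). Note that, unlike the original pattern, this one only matches when nothing but whitespace sits between the closing */ and package, so a file with other content in between would be left unchanged.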
