Replacing the starting comment in Java source files

Question

I'm writing a C# program to update the starting comment (commonly the license header) of Java source files. The following snippet does the job.

                foreach (string r in allfiles)
                {
                    // GC.Collect();
                    string thefile = System.IO.File.ReadAllText(r);
                    var pattern = @"/\*(?s:.*?)\*/[\s\S]*?package";
                    Regex regex1 = new Regex(pattern /*,RegexOptions.Compiled */) ;
                    var replaced = regex1.Replace(thefile, newheader + "package");
                    System.IO.File.WriteAllText(r, replaced);
                }

The problem is that after hundreds of source files have been processed, the process hangs at .Replace.

It's not a matter of garbage collection, as forcing it doesn't solve the issue, and it doesn't matter whether RegexOptions.Compiled is set or not.

I'm quite sure it depends on an issue in the pattern, as the hang appears on certain files that, if removed from processing, let the job continue to the end of about a thousand source files. But if I process these files alone, it works, and it also works if I use an online testing tool such as http://regexstorm.net/tester or https://www.myregextester.com/index.php.

Please let me know if there is any way to better optimize the search pattern for finding the first Java comment in a file.

Thank you in advance.

Answer 1

Your regex contains 2 bottlenecks related to lazy dot matching (. in singleline mode and [\s\S]*? are synonyms). The backtracking buffer may get easily and quickly overrun when running a regex against big files.
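One way to confirm which files trigger the excessive backtracking, rather than letting the process hang, is to give the regex a match timeout. The following is only a sketch, assuming .NET Framework 4.5 or later (where the Regex constructor accepts a timeout); the class name TimedReplace and the method TryReplace are hypothetical helpers, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimedReplace
{
    // Hypothetical helper: attempts the replacement and reports whether it
    // finished within the given timeout. On timeout, result is the unchanged input.
    public static bool TryReplace(string pattern, string input, string replacement,
                                  TimeSpan timeout, out string result)
    {
        // The three-argument constructor sets a per-operation match timeout.
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            result = regex.Replace(input, replacement);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up; the caller can log the file name and move on.
            result = input;
            return false;
        }
    }
}
```

Files for which TryReplace returns false are the ones worth inspecting by hand.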

The common technique is to unroll the construct with a negated character class and a quantified group.

You may use

@"/\*[^*]*(?:\*(?!/)[^*]*)*\*/\s*package"

See the regex demo.

The regex breakdown:

/\* - a literal /*
[^*]* - 0 or more characters other than *
(?:\*(?!/)[^*]*)* - the unrolled variant of (?s:.*?), matching 0 or more sequences of...
- \*(?!/) - a * symbol not followed by a /
- [^*]* - 0 or more symbols other than *
\*/ - a literal sequence of */
\s* - 0 or more whitespace characters
package - the literal letter sequence package
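Plugged into the loop from the question, the unrolled pattern could be used roughly as follows. This is a sketch under a few assumptions that go beyond the answer: the pattern is anchored with \A\s* so that only a comment at the very start of the file is considered, the replacement is limited to one match, and a MatchEvaluator is used so that a $ in the new header is inserted literally. The names HeaderReplacer and ReplaceHeader are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderReplacer
{
    // Unrolled pattern from the answer, prefixed with \A\s* (an added assumption)
    // so the match can only begin at the start of the file.
    private static readonly Regex HeaderRegex = new Regex(
        @"\A\s*/\*[^*]*(?:\*(?!/)[^*]*)*\*/\s*package",
        RegexOptions.Compiled);

    // Returns the source with its leading block comment replaced by newHeader.
    // Input with no leading block comment before "package" is returned unchanged.
    public static string ReplaceHeader(string source, string newHeader)
    {
        // The lambda is a MatchEvaluator, so newHeader is inserted as-is and
        // "$1"-style tokens in the license text are not treated as substitutions.
        return HeaderRegex.Replace(source, m => newHeader + "package", 1);
    }
}
```

Inside the original loop this would be called as System.IO.File.WriteAllText(r, HeaderReplacer.ReplaceHeader(System.IO.File.ReadAllText(r), newheader)); building the Regex once, outside the loop, also avoids re-parsing the pattern for every file.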

Answer 1
0 Accepted 2015-11-09 15:35:02
