I need to improve my PowerShell regular expression to find Java code with specific System.out.println patterns

We are trying to scan a large library of HTML, XML, and Java files, any of which can contain Java code that calls System.out.println. The issue is that I need to find only a specific set of those calls.

Example 1: System.out.println("my job code is: " var.jobcode);

Example 2: System.out.println("my jc is: " var.jc);

Example 3: System.out.println("my jbc is: " var.jbc);

I have tried to get this with the following:

Get-ChildItem C:\my\folder\path -Recurse | Where-Object FullName -Match ".*C:\\my\\folder\\path*" | Where-Object FullName -Match ".*." | Select-String -Pattern '(System\.out\.println+(.*?job)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jc)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jbc)\/?[^)]+[)]\s*;){99}' -List | Select Path,Line

I get the files I want, but I also get false positives: files containing lines like the following end up in the results by mistake.

System.out.println ("component printout: item"); System.out.println ("");                 <td style="word-break: break-all;word-wrap:break-word;font-size:12px;" class="FONTSTYLE" align="left">Job Codes</td><td style="word-break: break-all;word-wrap:break-word;font-size:12px;" class="FONTSTYLE" align="left">

So any time a file has a System.out.println(); statement followed by the word "job", that file gets picked up too when it shouldn't.

I have to run this over several thousand files on a semi-regular basis and need to output the file path/name and the line that contains the offending code.

How can I clean up this regex so that it only picks up files with lines like my examples above, and not the other files?

Some notes about the pattern that you tried:

  • You have 3 alternatives where the only difference is the word that should be present. You can instead use a single pattern with an alternation for those words in a non-capturing group.
  • println+ matches printl followed by one or more n characters.
  • The non-greedy .*? can overmatch, because the dot can also match " and ).
  • The quantifier {99} applies only to the group in the last alternative and requires it to repeat exactly 99 times, which looks unintended.
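Two of these points can be checked quickly in PowerShell. This is only a sketch: both test strings are made up for illustration and are not lines from the real files.

```powershell
# 'println+' applies '+' to the final 'n', so extra n characters still match.
$extraN = 'System.out.printlnnn("x");' -match 'System\.out\.println+\('
$extraN      # True

# '.*?job' can run past the quote and parenthesis that close the first call,
# so an unrelated "Job" later on the same line still produces a match.
# (-match is case-insensitive by default, so "Job" satisfies 'job'.)
$line = 'System.out.println (""); out.write("<td>Job Codes</td>");'
$overmatch = $line -match 'System\.out\.println+(.*?job)\/?[^)]+[)]\s*;'
$overmatch   # True
```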

You might make the pattern a bit more specific:

System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);

Explanation

  • System\.out\.println\( Match System.out.println(
  • "[^":]* Match " and then optional chars other than " and :
  • \s(?:job|jb?c)\s Match either job, jbc or jc between whitespace chars (or use word boundaries: \b(?:job|jb?c)\b)
  • [^":]*:[^"]*" Match optional chars other than " and :, then match :, then optional chars other than ", then the closing "
  • [^)]*\); Match optional chars other than ), then match ) and ;
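As a rough check, the pattern can be run against the three examples from the question and against two shortened versions of the false-positive lines. The variable names ($pattern, $samples, $results) are only illustrative.

```powershell
$pattern = 'System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);'

$samples = @(
    'System.out.println("my job code is: " var.jobcode);'
    'System.out.println("my jc is: " var.jc);'
    'System.out.println("my jbc is: " var.jbc);'
    'System.out.println ("component printout: item"); System.out.println ("");'
    '<td class="FONTSTYLE" align="left">Job Codes</td>'
)

# One row per sample: whether the pattern matched, and the line itself.
$results = foreach ($s in $samples) {
    [pscustomobject]@{ Matched = ($s -match $pattern); Line = $s }
}
$results
```

The first three rows should report a match and the last two should not. Note that the pattern expects the ( directly after println, so a call written as println ("...") with a space is not matched.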

See a regex demo.

An alternative without a mandatory : and with word boundaries:

System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);

See another regex demo.
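To get the file path and the offending line for every hit, either pattern can be dropped into a Select-String pipeline. Below is a minimal sketch using the word-boundary variant. The function name Find-JobCodePrintln is hypothetical, and the two Where-Object filters from the question are left out on the assumption that they are not needed (the second one, ".*.", matches any non-empty path).

```powershell
# Hypothetical helper: reports every line under $Root that matches the pattern.
function Find-JobCodePrintln {
    param([Parameter(Mandatory)][string]$Root)

    $pattern = 'System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);'

    # -File skips directories. Without -List, Select-String returns every
    # matching line in each file instead of only the first one per file.
    Get-ChildItem -Path $Root -Recurse -File |
        Select-String -Pattern $pattern |
        Select-Object Path, LineNumber, Line
}

# Example call, using the folder from the question:
# Find-JobCodePrintln -Root 'C:\my\folder\path'
```

Select-String matches case-insensitively by default; adding -CaseSensitive would stop text such as "Job" inside a println string from counting as a hit, if that is what you want.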
