
How can I improve my PowerShell regular expression to find specific System.out.println patterns in Java code?

We are trying to scan a large library of HTML, XML, and Java files, any of which can contain Java code that calls System.out.println. I only need to find a specific subset of those calls, like the examples below.

Example 1: System.out.println("my job code is: " var.jobcode);

Example 2: System.out.println("my jc is: " var.jc);

Example 3: System.out.println("my jbc is: " var.jbc);

I have tried to get this with the following:

Get-ChildItem C:\my\folder\path -Recurse | Where-Object FullName -Match ".*C:\\my\\folder\\path*" | Where-Object FullName -Match ".*." | Select-String -Pattern '(System\.out\.println+(.*?job)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jc)\/?[^)]+[)]\s*;)|(System\.out\.println+(.*?jbc)\/?[^)]+[)]\s*;){99}' -List | Select Path,Line

This finds the files I want, but it also returns false positives: files containing lines like the following show up in the results by mistake.

System.out.println ("component printout: item"); System.out.println ("");                 <td style="word-break: break-all;word-wrap:break-word;font-size:12px;" class="FONTSTYLE" align="left">Job Codes</td><td style="word-break: break-all;word-wrap:break-word;font-size:12px;" class="FONTSTYLE" align="left">

So any time a file has a System.out.println(); call followed anywhere by the word "job", that file gets picked up too, when it shouldn't.

I have to run this over several thousand files on a semi-regular basis and need to output the file path/name and line the offending code is in.

How can I clean up this regex so that it only matches lines like my examples above and does not pick up the other files?

Some notes about the pattern that you tried:

  • You have 3 alternatives whose only difference is the word that must be present. You can use a single pattern instead, with those words as an alternation inside a non-capturing group
  • println+ matches printl followed by one or more n characters, so the + is not needed
  • The non-greedy .*? can still over-match, because the dot also matches " and ) and can therefore run past the end of the println call
  • The quantifier {99} applies only to the group in the last alternative and requires it to repeat exactly 99 times, which is probably not what you intended
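To make the second and third notes concrete, here is a small check written in Python, used only as a convenient way to run the regex (the constructs involved behave the same way in .NET). The test line is hypothetical and not taken from the question, and re.IGNORECASE is passed on the assumption that Select-String is being run without -CaseSensitive.

```python
import re

# Select-String ignores case unless -CaseSensitive is given; IGNORECASE mimics that.
FLAGS = re.IGNORECASE

# First alternative of the original pattern, copied unchanged.
original = re.compile(r'(System\.out\.println+(.*?job)\/?[^)]+[)]\s*;)', FLAGS)

# Hypothetical line: an empty println followed by unrelated code mentioning "job".
line = 'System.out.println(""); String job = next(x);'

m = original.search(line)
# Text consumed by ".*?job": it runs through the quotes and the closing parenthesis.
overmatch = m.group(2) if m else None

# "println+" also accepts extra n characters.
extra_n = re.fullmatch(r'println+', 'printlnnn', FLAGS) is not None

print(repr(overmatch))
print(extra_n)
```

The captured text starts inside one statement and ends in the next one, which is the kind of over-match that lets unrelated lines into the results.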

You might make the pattern a bit more specific:

System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);

Explanation

  • System\.out\.println\( Match System.out.println(
  • "[^":]* Match " and then optional chars other than " and :
  • \s(?:job|jb?c)\s Match job, jbc, or jc between whitespace chars (or use word boundaries: \b(?:job|jb?c)\b )
  • [^":]*:[^"]*" Match optional chars other than " and : , then match : followed by optional chars other than " , then the closing "
  • [^)]*\); Match optional chars other than ) , then match ) and ;
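The breakdown above can be checked against the lines from the question. This is a minimal sketch in Python, used only to run the regex; the false-positive line is abbreviated from the question, and re.IGNORECASE is an assumption that mirrors Select-String's default case-insensitive matching.

```python
import re

pattern = re.compile(
    r'System\.out\.println\("[^":]*\s(?:job|jb?c)\s[^":]*:[^"]*"[^)]*\);',
    re.IGNORECASE,  # assumption: Select-String is run without -CaseSensitive
)

# The three examples from the question, unchanged.
examples = [
    'System.out.println("my job code is: " var.jobcode);',
    'System.out.println("my jc is: " var.jc);',
    'System.out.println("my jbc is: " var.jbc);',
]

# The false-positive line from the question, with the HTML attributes abbreviated.
false_positive = (
    'System.out.println ("component printout: item"); System.out.println ("");'
    ' <td class="FONTSTYLE" align="left">Job Codes</td>'
)

example_hits = [bool(pattern.search(s)) for s in examples]
false_hit = bool(pattern.search(false_positive))

print(example_hits, false_hit)
```

Note that the pattern expects the opening parenthesis directly after println. If calls written as println ( with a space should also be found, \s* could be added before \( .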


An alternative that does not require the : and uses word boundaries instead of whitespace around the keyword:

System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);
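A quick check of how this variant behaves, again in Python purely as a regex runner. The two extra test strings are hypothetical and only illustrate the effect of dropping the colon and using \b.

```python
import re

alternative = re.compile(
    r'System\.out\.println\("[^":]*\b(?:job|jb?c)\b[^"]*"[^)]*\);',
    re.IGNORECASE,  # assumption: mirrors Select-String's default behaviour
)

# Example 1 from the question still matches.
from_question = 'System.out.println("my job code is: " var.jobcode);'

# Hypothetical: no colon in the string, keyword directly after the opening quote.
no_colon = 'System.out.println("job started" + id);'

# Hypothetical: "job" is only part of a longer word, so \b rejects it.
part_of_word = 'System.out.println("jobcode: " + id);'

results = {
    "from_question": bool(alternative.search(from_question)),
    "no_colon": bool(alternative.search(no_colon)),
    "part_of_word": bool(alternative.search(part_of_word)),
}
print(results)
```

Because the keyword must stand alone as a word, a string that contains only jobcode is skipped, while a string without a colon is now accepted.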


