I'm writing a batch file that processes a log file of my application.
The log file may contain messages whose start matches the regex ^.{24}\[ERROR
followed by some consecutive lines that I need to find. The end of a log message is denoted by the next match of the regex ^.{24}\[[A-Z]
Currently I'm using the regex (?m)^.{24}\[ERROR(.*\r?\n?.)*?^.{24}\[[A-Z]
to find such messages. But the performance is very poor: it already runs for multiple minutes on a log file of only a few MB.
The complete batch file I'm using is:
@Echo off
powershell -Command "& {[System.Text.RegularExpressions.RegEx]::Matches([System.IO.File]::ReadAllText('application.log'), '(?m)^.{24}\[ERROR(.*\r?\n?.)*?^.{24}\[[A-Z]') | Set-Content result.txt}"
What regex should I use to match the log messages as described above?
The point is that your regex contains a (.*\r?\n?.)*?
section, which nests optional subpatterns (that is, subpatterns that can match an empty string) inside a quantified group. Once such subpatterns are quantified as a group, the regex engine has to try a lot of combinations before admitting there is no match, which leads to catastrophic backtracking or timeout issues.
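To see the effect on a small scale, here is a minimal Python sketch. It does not use the pattern from the question: it uses the classic nested-quantifier pattern (a+)+$ as a stand-in, and Python's re module rather than the .NET engine, so the timings are only illustrative. The helper name time_failed_match is made up for this example.

```python
import re
import time

# Stand-in pattern (an assumption, not the regex from the question):
# a quantified group whose body is itself quantified.
NESTED = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Try to match n 'a' characters followed by '!', which can never match.

    Returns (match_result, elapsed_seconds).
    """
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = NESTED.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    # Keep n small: the work grows very quickly as n increases.
    for n in (10, 14, 18):
        result, elapsed = time_failed_match(n)
        print(f"n={n:2d} matched={result is not None} time={elapsed:.4f}s")
```

The match always fails, but the engine only finds that out after trying every way of splitting the run of "a" characters between iterations of the group.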
One solution is simply to use a lazy dot-matching pattern with the DOTALL modifier (RegexOptions.Singleline in .NET, or inline (?s)):
(?ms)^.{24}\[ERROR(.*?)^.{24}\[[A-Z]
The .NET regex engine handles this subpattern much better than PCRE, Python re, or JavaScript.
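As a rough illustration of what the lazy pattern captures, here is a small Python sketch. Python's re is used only as a convenient stand-in for the .NET engine, and the sample log and the function name find_error_bodies are invented for the example; the only assumption about the real log is the 24-character prefix before the [LEVEL] tag.

```python
import re

# Lazy-dot pattern from the answer; (?s) lets "." cross line breaks.
LAZY = re.compile(r"(?ms)^.{24}\[ERROR(.*?)^.{24}\[[A-Z]")

# Hypothetical sample log: a 24-character timestamp prefix, then [LEVEL].
SAMPLE_LOG = (
    "2016-03-01 12:00:00,123 [INFO] starting\n"
    "2016-03-01 12:00:01,456 [ERROR] something failed\n"
    "  at Foo.Bar()\n"
    "  at Baz.Qux()\n"
    "2016-03-01 12:00:02,789 [INFO] recovered\n"
)

def find_error_bodies(text):
    """Return the text captured between [ERROR and the next message header."""
    return [m.group(1) for m in LAZY.finditer(text)]

if __name__ == "__main__":
    for body in find_error_bodies(SAMPLE_LOG):
        print(repr(body))
```

The captured group holds everything after [ERROR up to, and including, the line break before the next message header.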
However, lazy matching costs performance, and it is usually better to unroll it. I suggest
(?m)^.{24}\[ERROR(.*(?:\n(?!.{24}\[[A-Z]).*)*)\n.{24}\[[A-Z]
Note that these two patterns match the same text on input like this, but differ in how they match. The first tries to match the trailing part of the pattern at each position and, upon failure, expands the lazy group by one character. The unrolled pattern instead grabs the rest of each line in one step, and consumes a newline only if it is not followed by 24 non-newline characters, a [
and an uppercase ASCII letter, which is faster.
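A quick way to check that both patterns select the same text is to run them side by side. The sketch below does that in Python, again only as a stand-in for the .NET engine; the sample log and the helper full_matches are made up for the example.

```python
import re

# Both patterns are copied from the answer.
LAZY_PATTERN = re.compile(r"(?ms)^.{24}\[ERROR(.*?)^.{24}\[[A-Z]")
UNROLLED_PATTERN = re.compile(
    r"(?m)^.{24}\[ERROR(.*(?:\n(?!.{24}\[[A-Z]).*)*)\n.{24}\[[A-Z]"
)

# Hypothetical log: 24-character timestamp prefix, then [LEVEL].
LOG = (
    "2016-03-01 12:00:00,123 [INFO] starting\n"
    "2016-03-01 12:00:01,456 [ERROR] something failed\n"
    "  at Foo.Bar()\n"
    "  at Baz.Qux()\n"
    "2016-03-01 12:00:02,789 [INFO] recovered\n"
)

def full_matches(pattern, text):
    """Return the complete matched text for every match of pattern."""
    return [m.group(0) for m in pattern.finditer(text)]

if __name__ == "__main__":
    lazy = full_matches(LAZY_PATTERN, LOG)
    unrolled = full_matches(UNROLLED_PATTERN, LOG)
    print("same overall matches:", lazy == unrolled)
    # The capture groups differ slightly: the unrolled group stops
    # before the final line break, the lazy group includes it.
    print(repr(UNROLLED_PATTERN.search(LOG).group(1)))
```

On this sample the overall matches are identical; only the captured group differs, by the trailing line break.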