Why can't I capture more than one digit in a substring?
I am creating a regex to extract various fields from log files. I built one with the help of some tools and it is almost complete. The only problem is that, for one field, it extracts only one digit instead of the full number. For better understanding, I have saved it at the link below.
Pattern:
/(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))^(?:).*(?P<ParNew_before_1>\d)K\->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)/
String:
146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] [Times: user=0.32 sys=0.01, real=0.03 secs]
Current output:
Full match `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1` `3`
Group `ParNew_after_1` `88155`
Group `young_heap_size` `419456`
Group `par_new_duration` `0.0313803`
Group `ParNew_before_2` `9893391`
Group `ParNew_after_2` `9602913`
Group `total_heap_size` `12478080`
Expected output:
Full match `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1` `378633`
Group `ParNew_after_1` `88155`
Group `young_heap_size` `419456`
Group `par_new_duration` `0.0313803`
Group `ParNew_before_2` `9893391`
Group `ParNew_after_2` `9602913`
Group `total_heap_size` `12478080`
In the example above, group `ParNew_before_1` captures only one digit.
There are three things I'd like to note here:
1. The lookahead should come after `^` (it makes more sense to check its pattern at the start of the string only).
2. `\d` won't match more than 1 digit; add `+` after it to match 1 or more.
3. `.*` is too greedy: it backtracks only as far as it must, which leaves just the last digit for `\d+`. Use the lazy `.*?` instead.
Use
^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)
See this regex demo.
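For readers who want to check the fix locally, here is a minimal Python sketch. It assumes Python's `re` module, which accepts the same `(?P<name>...)` group syntax; the names `LINE`, `FIXED`, `fields`, and `greedy_value` are illustrative and not part of the original answer. It applies the corrected pattern to the sample line from the question, then swaps the lazy `.*?` back to a greedy `.*` to show what the greedy prefix leaves for `\d+`.

```python
import re

# Sample log line from the question.
LINE = (
    "146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
    "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
    "[Times: user=0.32 sys=0.01, real=0.03 secs]"
)

# Corrected pattern: anchor first, lazy prefix, and \d+ for every number.
FIXED = re.compile(
    r"^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?"
    r"(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K"
    r"\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] "
    r"(?P<ParNew_before_2>\d+)K->(?P<ParNew_after_2>\d+)K"
    r"\((?P<total_heap_size>\d+)"
)

match = FIXED.search(LINE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")

# Same pattern with the lazy prefix made greedy again (illustration only).
greedy_pattern = FIXED.pattern.replace(".*?", ".*")
greedy_value = re.search(greedy_pattern, LINE).group("ParNew_before_1")
print("greedy prefix captures:", greedy_value)  # only the last digit, "3"
```

With the lazy prefix the group holds the whole number; with the greedy prefix it falls back to a single digit even though the group itself is `\d+`.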
Also, you do not need to escape `-` characters that are not inside character classes.
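The point about `-` can be checked with a short sketch, again assuming Python's `re` module (the variable names are illustrative). Outside a character class the escaped and unescaped forms behave the same; inside a class, an unescaped `-` between two characters forms a range, so that is the only place where escaping changes the meaning.

```python
import re

text = "378633K->88155K"

# Outside a character class, "\-" and "-" match the same literal hyphen.
escaped = re.search(r"\d+K\->\d+K", text).group()
unescaped = re.search(r"\d+K->\d+K", text).group()
print(escaped == unescaped)

# Inside a character class, "a-c" is a range, while "a\-c" lists three literals.
as_range = re.findall(r"[a-c]", "a-b-c")
as_literals = re.findall(r"[a\-c]", "a-b-c")
print(as_range)     # ['a', 'b', 'c']
print(as_literals)  # ['a', '-', '-', 'c']
```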
As an aside, when you have a long pattern, do not hesitate to use the x modifier (for the "free-spacing" mode) and possibly the quoting feature `\Q..\E` (to write spaces and special characters without escaping them) to make it more readable:
/
^
(?=
[^PD\n]* (?>[PD][^\nPD]*)*? \b
(?: ParNew | PSYoungGen | DefNew )
)
[^\n\d]* (?>\d+[^\n\d]+)*? \b
(?<ParNew_before_1> \d+ ) K->
(?<ParNew_after_1> \d+ ) \QK(\E
(?<young_heap_size> \d+ ) \QK), \E
(?<par_new_duration> \d+\.\d+ ) \Q secs] \E
(?<ParNew_before_2> \d+ ) K->
(?<ParNew_after_2> \d+ ) \QK(\E
(?<total_heap_size> \d+ )
/x
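The same idea can be sketched in Python, with some caveats. Python's `re` module has no `\Q..\E`, so in this version literal punctuation is escaped by hand and literal spaces are written as `[ ]`. It also uses `(?P<name>...)` instead of `(?<name>...)`, and replaces the unrolled atomic-group prefixes with a plain lazy `.*?`, which is a simplification rather than a faithful port. The names `GC_PATTERN`, `LINE`, and `fields` are illustrative.

```python
import re

# Sample log line from the question.
LINE = (
    "146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
    "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
    "[Times: user=0.32 sys=0.01, real=0.03 secs]"
)

# re.VERBOSE ignores unescaped whitespace and allows "#" comments,
# so every literal space below is written as "[ ]".
GC_PATTERN = re.compile(
    r"""
    ^
    (?= .*? \b (?: ParNew | PSYoungGen | DefNew ) )  # line must name a young-gen collector
    .*? \b                                           # lazy skip to the first size field
    (?P<ParNew_before_1>  \d+ )       K->
    (?P<ParNew_after_1>   \d+ )       K\(
    (?P<young_heap_size>  \d+ )       K\),[ ]
    (?P<par_new_duration> \d+\.\d+ )  [ ]secs\][ ]
    (?P<ParNew_before_2>  \d+ )       K->
    (?P<ParNew_after_2>   \d+ )       K\(
    (?P<total_heap_size>  \d+ )
    """,
    re.VERBOSE,
)

match = GC_PATTERN.search(LINE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")
```

Laying the pattern out one capture group per line makes a missing quantifier, such as the `\d` without `+` in the question, much easier to spot.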