
Why can't I capture more than one digit in substring?

I am creating a regex to extract various fields from log files. I built one with the help of some tools and it is almost complete. The only problem is that, for one field, it extracts only a single digit instead of the full number. For reference, I have saved it at the link below.

My Regex Demo

Pattern:

/(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))^(?:).*(?P<ParNew_before_1>\d)K\->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)/

String:

146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] [Times: user=0.32 sys=0.01, real=0.03 secs]

Current Output:

Full match      `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1`     `3`
Group `ParNew_after_1`      `88155`
Group `young_heap_size`     `419456`
Group `par_new_duration`    `0.0313803`
Group `ParNew_before_2`     `9893391`
Group `ParNew_after_2`      `9602913`
Group `total_heap_size`     `12478080`

Expected Output:

Full match      `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1`     `378633`
Group `ParNew_after_1`      `88155`
Group `young_heap_size`     `419456`
Group `par_new_duration`    `0.0313803`
Group `ParNew_before_2`     `9893391`
Group `ParNew_after_2`      `9602913`
Group `total_heap_size`     `12478080`

In the example above, group ParNew_before_1 captures only one digit.

There are three things I'd like to note here:

  • The lookahead should be placed after ^ (it makes more sense to check its pattern only at the start of the string)
  • \d alone matches exactly one digit; add + after it to match one or more
  • .* is too greedy; use the lazy .*? instead
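
To see why the second and third points go together, here is a minimal sketch in Python's `re` module, run against a shortened piece of the log line. The variable names are only for illustration. Adding `+` while keeping the greedy `.*` still yields a single digit, because `.*` takes everything it can and leaves `\d+` the bare minimum.

```python
import re

# Shortened fragment of the log line from the question.
line = "[ParNew: 378633K->88155K(419456K), 0.0313803 secs]"

# Greedy .* with a single \d: the group gets only the last digit before "K->".
greedy_single = re.search(r".*(\d)K->", line)
print(greedy_single.group(1))  # 3

# Greedy .* with \d+: .* still swallows the leading digits, so \d+ keeps one.
greedy_plus = re.search(r".*(\d+)K->", line)
print(greedy_plus.group(1))  # 3

# Lazy .*? with \d+: .*? stops as early as possible, so \d+ gets the full number.
lazy_plus = re.search(r".*?(\d+)K->", line)
print(lazy_plus.group(1))  # 378633
```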

Use

^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)
 ^^^                                           ^  ^                      ^

See this regex demo.
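
If you want to check this outside the online tester, the following sketch runs the same pattern with Python's `re` module, which accepts the `(?P<name>...)` group syntax unchanged. The names `LOG_LINE`, `FIXED` and `fields` are my own, and the pattern is only split across several string literals for readability.

```python
import re

# Sample line from the question.
LOG_LINE = (
    "146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
    "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
    "[Times: user=0.32 sys=0.01, real=0.03 secs]"
)

# The corrected pattern: ^ first, lazy .*?, and \d+ in the first group.
FIXED = re.compile(
    r"^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?"
    r"(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\("
    r"(?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] "
    r"(?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\("
    r"(?P<total_heap_size>\d+)"
)

match = FIXED.search(LOG_LINE)
fields = match.groupdict()  # group name -> captured text

for name, value in fields.items():
    print(f"{name:20} {value}")
```

Every value comes back as a string, so convert with `int()` or `float()` if you need numbers.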

Also, you do not need to escape - when it is not inside a character class.
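
A quick sketch of that point, again using Python's `re` as a stand-in for the tester (the sample strings are made up for illustration): outside a character class the escaped and unescaped hyphen behave the same, while inside one an unescaped hyphen between two characters defines a range.

```python
import re

sample = "378633K->88155K"

# Outside a character class, "\-" and "-" match the same literal hyphen.
escaped = re.search(r"(\d+)K\->(\d+)K", sample)
plain = re.search(r"(\d+)K->(\d+)K", sample)
print(escaped.groups() == plain.groups())  # True

# Inside a character class the hyphen is special: [a-c] is the range a..c.
as_range = re.findall(r"[a-c]", "a-c")
print(as_range)  # ['a', 'c']

# Escaping it there makes it a literal hyphen again.
as_literal = re.findall(r"[a\-c]", "a-c")
print(as_literal)  # ['a', '-', 'c']
```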

As an aside, when you have a long pattern, do not hesitate to use the x modifier (for "free-spacing" mode) and possibly the quoting feature \Q..\E (to write spaces and special characters without escaping them) to make it more readable:

/
^
(?=
    [^PD\n]* (?>[PD][^\nPD]*)*? \b
    (?: ParNew | PSYoungGen | DefNew )
)
[^\n\d]* (?>\d+[^\n\d]+)*? \b
(?<ParNew_before_1>  \d+      ) K->
(?<ParNew_after_1>   \d+      ) \QK(\E
(?<young_heap_size>  \d+      ) \QK), \E
(?<par_new_duration> \d+\.\d+ ) \Q secs] \E
(?<ParNew_before_2>  \d+      ) K->
(?<ParNew_after_2>   \d+      ) \QK(\E
(?<total_heap_size>  \d+      )
/x

demo
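
The same free-spacing idea carries over to other engines, with adjustments. As a rough sketch, here is an adaptation for Python's `re` module using `re.VERBOSE`. It is not a line-for-line port: Python has no `\Q..\E`, so literal spaces are written as `[ ]` and the brackets are escaped by hand; named groups use `(?P<name>...)`; and I have assumed the simpler lazy `.*?` in place of the atomic groups, which older Python versions do not support. `GC_LINE` and `sample` are illustrative names.

```python
import re

# Free-spacing version of the pattern; whitespace in the pattern is ignored
# unless it is escaped or placed inside a character class such as [ ].
GC_LINE = re.compile(
    r"""
    ^
    (?= [^P]* (?: ParNew | P.*ParNew | PSYoungGen | DefNew ) )
    .*?
    (?P<ParNew_before_1>  \d+      ) K->
    (?P<ParNew_after_1>   \d+      ) K\(
    (?P<young_heap_size>  \d+      ) K\),[ ]
    (?P<par_new_duration> \d+\.\d+ ) [ ]secs\][ ]
    (?P<ParNew_before_2>  \d+      ) K->
    (?P<ParNew_after_2>   \d+      ) K\(
    (?P<total_heap_size>  \d+      )
    """,
    re.VERBOSE,
)

sample = (
    "146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
    "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
    "[Times: user=0.32 sys=0.01, real=0.03 secs]"
)

verbose_match = GC_LINE.search(sample)
print(verbose_match.group("ParNew_before_1"))   # 378633
print(verbose_match.group("par_new_duration"))  # 0.0313803
print(verbose_match.group("total_heap_size"))   # 12478080
```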
