
Regex Match Extended Common Log Format with Multiple Hosts

I am trying to write a regular expression to match the extended common log format. My current expression matches most log entries, but it fails when multiple hosts are listed.

Here is my current expression:

([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"

This successfully matches standard log entries. For example:

24.58.227.240 - - [22/Sep/2011:00:00:00 +0000] "GET /rss/merchant/airsoftpost.com HTTP/1.1" 200 1880 "-" "Apple-PubSub/65"

However, some of the log entries contain multiple host IPs separated by commas:

10.64.233.43, 69.171.229.245 - - [22/Sep/2011:00:00:00 +0000] "GET /view/thesanctuary.co.uk HTTP/1.1" 206 7289 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
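The failure can be reproduced with a short Python sketch. Python and the variable names are assumptions for illustration only; the question does not say which regex engine is in use. The match is anchored to the whole line with fullmatch, since an unanchored search would quietly skip the first host and match from the second one.

```python
import re

# Current expression: each of the first three fields stops at the first space.
ORIGINAL = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"'
)

single_host = (
    '24.58.227.240 - - [22/Sep/2011:00:00:00 +0000] '
    '"GET /rss/merchant/airsoftpost.com HTTP/1.1" 200 1880 "-" "Apple-PubSub/65"'
)
multi_host = (
    '10.64.233.43, 69.171.229.245 - - [22/Sep/2011:00:00:00 +0000] '
    '"GET /view/thesanctuary.co.uk HTTP/1.1" 206 7289 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"'
)

# fullmatch requires the pattern to cover the entire line.
single_match = ORIGINAL.fullmatch(single_host)
multi_match = ORIGINAL.fullmatch(multi_host)

print(single_match is not None)  # the single-host entry matches
print(multi_match is None)       # the comma-separated host list does not
```

The second entry fails because the host list contains a space, so the three space-delimited fields are used up before the pattern reaches the opening bracket of the timestamp.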

Could someone help me fix my expression so that it matches any number of hosts in a log entry?

Thanks.

Following your regex, you can change:

([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"

to:

([^-]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"
   ^--- here, match until first dash

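The idea is to change only the first group, so that it reads up to the first hyphen instead of the first space. Note that this relies on the host field itself never containing a hyphen: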

([^ ]*)   ---> matches until the first space (change this)
([^-]*)   ---> matches until the first hyphen
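As a rough check of this change, here is a minimal Python sketch (Python, the variable names, and the hyphenated host entry are assumptions for illustration, not part of the original logs). It applies the modified expression to the multi-host entry from the question and to a made-up entry whose host name contains a hyphen.

```python
import re

# Modified expression: only the first group changed, from [^ ]* to [^-]*.
UNTIL_HYPHEN = re.compile(
    r'([^-]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"'
)

multi_host = (
    '10.64.233.43, 69.171.229.245 - - [22/Sep/2011:00:00:00 +0000] '
    '"GET /view/thesanctuary.co.uk HTTP/1.1" 206 7289 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"'
)

multi_match = UNTIL_HYPHEN.fullmatch(multi_host)
# The first group now holds the whole comma-separated host list.
hosts = multi_match.group(1).split(", ")
print(hosts)

# Hypothetical entry (not from the question) with a hyphen inside the host name.
hyphen_host = (
    'my-host.example.com - - [22/Sep/2011:00:00:00 +0000] '
    '"GET / HTTP/1.1" 200 10 "-" "test-agent"'
)
hyphen_match = UNTIL_HYPHEN.fullmatch(hyphen_host)
print(hyphen_match is None)  # the first group cannot cross the hyphen in "my-host"
```

If host names with hyphens can appear in the log, a first group restricted to the characters allowed in a host list is the safer choice.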

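As an option, try this regex pattern. The backslashes are doubled as they would be inside a string literal (for example in Java); in a raw regex, use a single backslash in each place: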

([\\d.\\s,]*) ([^ ]*) ([^ ]*) \\[([^]]*)\\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"

The first capturing group will now capture any run of digits, periods, commas, and whitespace, so it matches one IP address or a comma-separated list of them.
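The sketch below shows one way this pattern might be used from Python, written as a raw string so each escape needs only a single backslash. The language choice and the variable names are assumptions for illustration. It also splits the captured host list into individual addresses.

```python
import re

# Same pattern in a Python raw string: \d, \s, \[ and \] take single backslashes.
HOST_LIST = re.compile(
    r'([\d.\s,]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*) "([^"]*)" "([^"]*)"'
)

entry = (
    '10.64.233.43, 69.171.229.245 - - [22/Sep/2011:00:00:00 +0000] '
    '"GET /view/thesanctuary.co.uk HTTP/1.1" 206 7289 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"'
)

match = HOST_LIST.fullmatch(entry)

# Group 1 is the raw host list; split on commas and trim the surrounding spaces.
hosts = [h.strip() for h in match.group(1).split(",")]
# Groups 6 and 7 are the status code and the response size.
status, size = match.group(6), match.group(7)

print(hosts)
print(status, size)
```

Because the first group accepts only digits, periods, commas, and whitespace, this variant would not match entries whose host field is a name or an IPv6 address.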

