
Fluentd Apache log format with multiple host IPs

I have a small problem with the Fluentd log parser. I have a Varnish server on which I set the X-Forwarded-For header to contain the IPs of every host an HTTP request passes through. I use this to get that information into the varnishncsa logs. This is an example log line:

"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

On the other hand, I would like to aggregate these logs in Fluentd. Since varnishncsa logs use the Apache format, I use the Fluentd apache2 format for input parsing, as in this configuration:

<source>
  type tail
  format apache2
  path /var/log/varnish/varnishncsa.log
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag "apache2.varnish.mydomain.com.access"
</source>

The problem is that this works when there is only one host IP in the log line, but when there are multiple IPs, the Fluentd aggregator reports a "pattern not match" warning. That is:

This matches :

"192.168.79.16 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

This doesn't match :

"192.168.79.16, 192.22.10.22, 10.2.2.22 - - [13/Aug/2015:09:50:45 +0000] \"GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1\" 401 0 \"-\" \"Java/1.8.0_45\""

The apache2 fluentd regex is :

^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$

With this time format :

%d/%b/%Y:%H:%M:%S %z

I have tried to work out and test the right regex for this, but have not found one yet.

I tried this, but it doesn't work:

 <source>
      type tail
      format format /^(?<host>\,*[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/ 
      time_format %d/%b/%Y:%H:%M:%S %z
      path /var/log/varnish/varnishncsa.log
      pos_file /var/log/td-agent/tmp/access.log.pos
      tag "apache2.varnish.mydomain.com.access"
    </source>

Can someone help? I would also appreciate good documentation on capture patterns in Fluentd parsers, and a good way to test a Fluentd regex. This Fluentd regular expression editor doesn't really help.

It always generates a configuration without giving a test result.

Thanks.

Here is the regex you can use in case you have multiple IPs:

^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
              ^^^^^^^^^^^^^^

See demo on a good Web regex tester

The (?:,\s+[^ ]+)* pattern matches 0 or more (*) sequences of a comma (,), 1 or more whitespace symbols (\s+), and 1 or more characters other than a space ([^ ]+).
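Since Fluentd patterns are Ruby regular expressions, one way to test them locally is a short Ruby script. The following is only a sketch: the constant names (APACHE2, MULTI_HOST, REST_OF_LINE and so on) are made up for illustration, the sample lines are the ones from the question with the outer quoting removed, and it exercises the regex and the time format on their own, not Fluentd itself.

```ruby
require "time"

# Stock apache2 pattern, as quoted in the question.
APACHE2 = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Same pattern, with the host group extended to accept ", "-separated entries.
MULTI_HOST = /^(?<host>[^ ]*(?:,\s+[^ ]+)*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Everything after the host field of the sample log line.
REST_OF_LINE = ' - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'

SINGLE_IP_LINE = "192.168.79.16" + REST_OF_LINE
MULTI_IP_LINE  = "192.168.79.16, 192.22.10.22, 10.2.2.22" + REST_OF_LINE

# The stock pattern rejects the multi-IP line: after the first space it
# expects only the ident and user fields before "[".
p APACHE2.match(MULTI_IP_LINE)   # => nil

# The extended pattern accepts both lines and keeps the whole list in "host".
m = MULTI_HOST.match(MULTI_IP_LINE)
puts m[:host]                    # 192.168.79.16, 192.22.10.22, 10.2.2.22
puts m[:code]                    # 401
puts MULTI_HOST.match(SINGLE_IP_LINE)[:host]   # 192.168.79.16

# The captured time can be checked against the time_format string as well.
puts Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").utc.iso8601
```

The original attempt, \,*[^ ]*, cannot work because [^ ]* still stops at the first space, so the host group never gets past "192.168.79.16,".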

A somewhat safer expression looks like this:

^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$

See Demo 2

The (?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)* part matches number + . + number + . + number + . + number, optionally followed by more groups of the same shape, each preceded by a comma. The |- alternative also accepts a lone - as the host.
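To see what the stricter pattern accepts and rejects, a small Ruby check along these lines may help. It is a sketch under assumptions: STRICT, LOG_TAIL and forwarded_chain are illustrative names, not part of Fluentd, and the host-name and dash lines are invented test inputs rather than real log entries.

```ruby
# Stricter pattern: the host must be a list of dotted quads, or a lone "-".
STRICT = /^(?<host>(?:\d+\.){3}\d+(?:,\s*(?:\d+\.){3}\d+)*|-) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/

# Everything after the host field of the sample log line from the question.
LOG_TAIL = ' - - [13/Aug/2015:09:50:45 +0000] "GET http://poc.mydomain.com/panier/payment/payline?notificationType=WEBTRS&token=1KB01BwKWdUhVj1222301439454223514 HTTP/1.1" 401 0 "-" "Java/1.8.0_45"'

# Hypothetical helper: returns the host field split into its entries,
# or nil when the line does not match the strict pattern.
def forwarded_chain(line)
  m = STRICT.match(line)
  return nil unless m
  m[:host].split(/,\s*/)
end

p forwarded_chain("192.168.79.16, 192.22.10.22, 10.2.2.22" + LOG_TAIL)
# => ["192.168.79.16", "192.22.10.22", "10.2.2.22"]

p forwarded_chain("-" + LOG_TAIL)            # => ["-"]

# A host name instead of an IP is rejected by the strict pattern.
p forwarded_chain("proxy.local" + LOG_TAIL)  # => nil
```

Note that \d+ only checks the dotted shape, not that each number is in the 0 to 255 range, and that IPv6 addresses in X-Forwarded-For would not match this version.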
