
Regex to parse CustomLog format (PHP)

I am trying to parse Apache access log entries written with this CustomLog format:

LogFormat "%v %{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b" MyCustomLog

This is how an entry looks. Note that the IPs passed in the X-Forwarded-For header are comma-delimited.

my.server.com 24.24.24.3, 1.2.3.4 1.2.3.5 - - [18/May/2016:02:57:25 -0400] "GET /veer/eye?params=1&are=2&right=3&here=4 HTTP/1.1" 200 146351

I want to capture the following fields:

  • X-Forwarded-For IPs (comma-delimited)
  • remote hostname
  • remote logname (may be -)
  • remote user (may be -)
  • timestamp in [ ] block
  • the request URL (in the quotes)
  • the response size (the last value)

I am a bit rusty with regex, at least when it comes to negative lookaheads, which is what I think I need to use?

Help is appreciated!

Here is a more complete pattern that should work for you. No lookaheads are needed: each field gets its own capture group, and I added names for the groups so you can read the results by key. It matches both the example in your question and the one from the comments.
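The comma-delimited list is the only tricky field, so here is a minimal sketch that isolates it. The helper name `splitForwardedFor()` and the shortened sample lines are illustrative assumptions, not part of your log or of any library. The list is matched as zero or more "IP, comma, space" items followed by one final IP, so the first space that is not preceded by a comma ends the field:

```php
<?php
// Illustrative sketch: splitForwardedFor() is a hypothetical helper name.
// Assumes IPv4 addresses separated by ", " and "-" when the header is absent.
function splitForwardedFor(string $line): ?array
{
    // Zero or more "IP, " items, then one final IP; or a lone "-".
    $pattern = '/^\S+ (?P<forwardedFor>(?:[\d.]+, )*[\d.]+|-) (?P<remoteHostname>\S+)/';

    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not start with "vhost list remote-host"
    }

    return [
        'ips'    => $m['forwardedFor'] === '-' ? [] : explode(', ', $m['forwardedFor']),
        'remote' => $m['remoteHostname'],
    ];
}

print_r(splitForwardedFor('my.server.com 24.24.24.3, 1.2.3.4 1.2.3.5 - -'));
print_r(splitForwardedFor('qa-test.test.com - 80.82.65.120 - -'));
```

The full pattern applies the same idea to every field in the line.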

Demo: https://3v4l.org/jMKFL

<?php
// Each named group corresponds to one LogFormat directive, in order:
// %v %{X-Forwarded-For}i %h %l %u %t "%r" %>s %b
// The hostname class includes "-" so names such as qa-test.test.com are
// captured whole, and ^ anchors the match at the start of the line.
$pattern = '/^(?P<hostname>[\w.-]+) '
         . '(?P<forwardedFor>(?:[\d\.]+, )*(?:[\d\.]+)|-) '
         . '(?P<remoteHostname>[\d\.]+) '
         . '(?P<remoteLogname>[^\s]+) '
         . '(?P<remoteUsername>[^\s]+) '
         . '\['
            . '(?P<requestDate>[^\]]+)'
         . '\] '
         . '"'
            . '(?P<method>\w+) '
            . '(?P<uri>[^\s]+) '
            . '(?P<httpVersion>[^"]+)'
         . '" '
         . '(?P<responseStatus>\d+) '
         . '(?P<responseSize>\d+)/';

$test = 'my.server.com 24.24.24.3, 1.2.3.4 1.2.3.5 - - [18/May/2016:02:57:25 -0400] "GET /veer/eye?params=1&are=2&right=3&here=4 HTTP/1.1" 200 146351';
$test2 = 'qa-test.test.com - 80.82.65.120 - - [18/May/2016:00:30:20 -0400] "GET // HTTP/1.1" 404 198';

preg_match($pattern, $test, $matches);
print_r($matches);

preg_match($pattern, $test2, $matches);
print_r($matches);

Outputs:

Array
(
    [0] => my.server.com 24.24.24.3, 1.2.3.4 1.2.3.5 - - [18/May/2016:02:57:25 -0400] "GET /veer/eye?params=1&are=2&right=3&here=4 HTTP/1.1" 200 146351
    [hostname] => my.server.com
    [1] => my.server.com
    [forwardedFor] => 24.24.24.3, 1.2.3.4
    [2] => 24.24.24.3, 1.2.3.4
    [remoteHostname] => 1.2.3.5
    [3] => 1.2.3.5
    [remoteLogname] => -
    [4] => -
    [remoteUsername] => -
    [5] => -
    [requestDate] => 18/May/2016:02:57:25 -0400
    [6] => 18/May/2016:02:57:25 -0400
    [method] => GET
    [7] => GET
    [uri] => /veer/eye?params=1&are=2&right=3&here=4
    [8] => /veer/eye?params=1&are=2&right=3&here=4
    [httpVersion] => HTTP/1.1
    [9] => HTTP/1.1
    [responseStatus] => 200
    [10] => 200
    [responseSize] => 146351
    [11] => 146351
)
Array
(
    [0] => qa-test.test.com - 80.82.65.120 - - [18/May/2016:00:30:20 -0400] "GET // HTTP/1.1" 404 198
    [hostname] => qa-test.test.com
    [1] => qa-test.test.com
    [forwardedFor] => -
    [2] => -
    [remoteHostname] => 80.82.65.120
    [3] => 80.82.65.120
    [remoteLogname] => -
    [4] => -
    [remoteUsername] => -
    [5] => -
    [requestDate] => 18/May/2016:00:30:20 -0400
    [6] => 18/May/2016:00:30:20 -0400
    [method] => GET
    [7] => GET
    [uri] => //
    [8] => //
    [httpVersion] => HTTP/1.1
    [9] => HTTP/1.1
    [responseStatus] => 404
    [10] => 404
    [responseSize] => 198
    [11] => 198
)
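Once a line has matched, you will probably want the list as an array and the timestamp as a date object. The following is a rough sketch of that post-processing step; `normalizeLogEntry()` is a hypothetical helper name, and the `$matches` array is a shortened, hand-written stand-in for what `preg_match()` returns with the named groups:

```php
<?php
// Hypothetical helper (an assumption, not part of the pattern itself):
// tidies up a $matches array produced by preg_match() with named groups.
function normalizeLogEntry(array $matches): array
{
    // preg_match() returns every named group twice (by name and by index);
    // keep only the string keys, which also drops [0], the full match.
    $entry = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

    // "24.24.24.3, 1.2.3.4" -> ['24.24.24.3', '1.2.3.4']; "-" -> [].
    $entry['forwardedFor'] = $entry['forwardedFor'] === '-'
        ? []
        : explode(', ', $entry['forwardedFor']);

    // Apache's %t looks like 18/May/2016:02:57:25 -0400.
    $date = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $entry['requestDate']);
    $entry['requestDate'] = $date ?: null; // null when the timestamp cannot be parsed

    $entry['responseStatus'] = (int) $entry['responseStatus'];
    $entry['responseSize']   = (int) $entry['responseSize'];

    return $entry;
}

// Shortened stand-in for a real $matches array.
$matches = [
    0 => '(full line)',
    'hostname' => 'my.server.com', 1 => 'my.server.com',
    'forwardedFor' => '24.24.24.3, 1.2.3.4', 2 => '24.24.24.3, 1.2.3.4',
    'requestDate' => '18/May/2016:02:57:25 -0400', 3 => '18/May/2016:02:57:25 -0400',
    'responseStatus' => '200', 4 => '200',
    'responseSize' => '146351', 5 => '146351',
];

$entry = normalizeLogEntry($matches);
echo $entry['requestDate']->format(DATE_ATOM), PHP_EOL;
print_r($entry['forwardedFor']);
```

One caveat: Apache's %b writes - instead of 0 when no bytes are sent, so the responseSize group would need to be widened (for example to \d+|-) before such lines match at all; the (int) cast then turns the - into 0.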


 