
Logstash Grok Pattern vs Python Regex?

I am trying to configure logstash to manage my various log sources, one of which is Mongrel2. Mongrel2 logs in the tnetstring format, where a log message takes the form:

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]
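For reference, a tnetstring is a length prefix, a colon, that many bytes of payload, and a one-character type tag (`,` for a string, `#` for an integer, `]` for a list). The minimal Python sketch below decodes the message above. `parse_tnetstring` is a hypothetical helper, not part of Mongrel2 or Logstash; it handles only those three tags and assumes ASCII text, so character counts equal byte counts.

```python
MESSAGE = "86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]"


def parse_tnetstring(data: str):
    """Decode one tnetstring at the start of `data` and return (value, remainder).

    Hypothetical helper: supports only ',' (string), '#' (integer) and ']' (list),
    and assumes ASCII input so that the length prefix counts characters.
    """
    length_text, colon, rest = data.partition(":")
    if not colon:
        raise ValueError("missing ':' after the length prefix")
    length = int(length_text)
    payload, tag, remainder = rest[:length], rest[length:length + 1], rest[length + 1:]
    if tag == ",":
        return payload, remainder
    if tag == "#":
        return int(payload), remainder
    if tag == "]":
        items = []
        while payload:  # a list payload is a concatenation of tnetstrings
            item, payload = parse_tnetstring(payload)
            items.append(item)
        return items, remainder
    raise ValueError(f"unsupported type tag {tag!r}")


fields, leftover = parse_tnetstring(MESSAGE)
print(fields)
# ['localhost', '192.168.33.1', 57089, 1411396297, 'GET', '/', 'HTTP/1.1', 200, 145978]
```

The host name is the first element of the decoded list, which is the field the regex below is trying to reach.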

I want to write my own grok patterns to extract certain fields from the above format. I started by testing my regex against the above message in an online regex tester. The regex is

^(?:[^:]*\:){2}([^,]*)

This matches localhost. When I use the same regex as a grok pattern in the form

TEST ^(?:[^:]*\:){2}([^,]*)
MONGREL %{TEST:test}

and configure logstash with

filter {
  grok {
    match => [ "message", "%{MONGREL}" ]
  }
}

the same regex results in the match 86:9:localhost. I can't figure out where I am going wrong. Is it because the regex engine I was using to test is based on Python, while the grok filter's regex engine is based on Oniguruma?
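The engine is probably not the culprit. A quick check with Python's `re` module (a sketch run outside Logstash) shows that this regex matches `86:9:localhost` as a whole, and only the unnamed capture group 1 holds `localhost`. A tester that highlights group 1 will therefore show `localhost`, while grok reports whatever text its named capture encloses.

```python
import re

MESSAGE = "86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]"

# The regex from the question, unchanged.
question_regex = re.compile(r"^(?:[^:]*\:){2}([^,]*)")
m = question_regex.match(MESSAGE)

whole_match = m.group(0)  # everything the regex consumed
first_group = m.group(1)  # only the text inside the capturing parentheses
print(whole_match)  # 86:9:localhost
print(first_group)  # localhost
```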

I am currently testing it in grokdebug with the following input:

86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]

and the following pattern

(?<hostname>^(?:[^:]*\:){2}([^,]*))

resulting in

{
  "hostname": [
    [
      "86:9:localhost"
    ]
  ]
}

where I want

{
  "hostname": [
    [
      "localhost"
    ]
  ]
}
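To see why both grokdebug and Logstash return `86:9:localhost`, here is a deliberately simplified model of grok pattern expansion in Python. `expand_grok` is a hypothetical helper, not Logstash's implementation: it replaces `%{NAME:field}` with the named pattern wrapped in a named group (spelled `(?P<field>...)` in Python and `(?<field>...)` in Oniguruma), and that wrapping is what puts the name around the entire `TEST` regex.

```python
import re

MESSAGE = "86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]"

# The custom pattern definitions from the question.
DEFINITIONS = {
    "TEST": r"^(?:[^:]*\:){2}([^,]*)",
    "MONGREL": "%{TEST:test}",
}


def expand_grok(pattern: str, definitions: dict) -> str:
    """Simplified expansion: %{NAME} -> (?:body), %{NAME:field} -> (?P<field>body)."""

    def substitute(ref: re.Match) -> str:
        name, field = ref.group(1), ref.group(2)
        body = expand_grok(definitions[name], definitions)  # definitions may nest
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.sub(r"%\{(\w+)(?::(\w+))?\}", substitute, pattern)


regex = expand_grok("%{MONGREL}", DEFINITIONS)
print(regex)  # (?:(?P<test>^(?:[^:]*\:){2}([^,]*)))
print(re.search(regex, MESSAGE).group("test"))  # 86:9:localhost
```

Because the named group `test` encloses the whole `TEST` regex, including the two skipped `NN:` prefixes, the reported field is `86:9:localhost` regardless of which engine runs it.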

Give http://grokdebug.herokuapp.com/ a try. It's the best way to debug grok patterns without losing your hair.

A pattern like this will extract the host name:

^(\d+)?:(\d+)?:(?<hostname>[^,]+),
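This pattern can be sanity-checked outside grok with Python's `re`. The only change needed is the named-group syntax, since Python spells Oniguruma's `(?<hostname>...)` as `(?P<hostname>...)`.

```python
import re

MESSAGE = "86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]"

# Same pattern as above, rewritten with Python's (?P<name>...) named-group syntax.
hostname_regex = re.compile(r"^(\d+)?:(\d+)?:(?P<hostname>[^,]+),")
m = hostname_regex.match(MESSAGE)
print(m.group("hostname"))  # localhost
print(m.group(1), m.group(2))  # 86 9
```

The two leading `(\d+)?` groups capture the length prefixes `86` and `9`. They are unnamed, so grok, which by default keeps only named captures, should not turn them into fields.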

Or, written in a similar manner to the pattern you already have:

^(?:[^:]*\:){2}(?<hostname>[^,]*)

The capture name needs to be inside the parentheses around the part you want to capture. In your pattern, the name wrapped the whole expression, so it captured everything from the start of the line up to and including localhost.
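The effect of where the name goes can be shown side by side in Python (again using `(?P<name>...)` in place of Oniguruma's `(?<name>...)`):

```python
import re

MESSAGE = "86:9:localhost,12:192.168.33.1,5:57089#10:1411396297#3:GET,1:/,8:HTTP/1.1,3:200#6:145978#]"

# Name wraps the whole expression: the length prefixes are captured too.
wrapped = re.compile(r"(?P<hostname>^(?:[^:]*\:){2}([^,]*))")
# Name wraps only the host-name part: just the field is captured.
inside = re.compile(r"^(?:[^:]*\:){2}(?P<hostname>[^,]*)")

print(wrapped.match(MESSAGE).group("hostname"))  # 86:9:localhost
print(inside.match(MESSAGE).group("hostname"))   # localhost
```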
