
Logstash Filtering and Parsing Deis Output

Environment

  • Ubuntu 16.04
  • Logstash 5.2.1
  • ElasticSearch 5.1

I've configured our Deis platform to send logs to our Logstash node with no issues. However, I'm still new to Ruby, and regexes are not my strong suit.

Log Example:

2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n

Logstash Configuration:

input {
    tcp {
        port => 5000
        type => syslog
        codec => plain
    }
    udp {
        port => 5000
        type => syslog
        codec => plain
    }
}

filter {
    json {
        source => "syslog_message"
    }
}

output {
    elasticsearch { hosts => ["foo.somehost"] }
}

Elasticsearch output:

"@timestamp" => 2017-02-15T14:55:24.408Z,
"@version" => "1",
"host" => "x.x.x.x",
"message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n",
"type" => "json"

Desired outcome:

"@timestamp" => 2017-02-15T14:55:24.408Z,
"@version" => "1",
"host" => "x.x.x.x",
"type" => "json"
"container" => "deis-logspout"
"severity level" => "Info"
"message" => "routing all to udp://x.x.x.x:xxxx\n"

How can I extract the information out of the message into their individual fields?

Unfortunately, your assumptions about what you are trying to do are slightly off, but we can fix that!

You configured a json filter, but you are not parsing JSON. You are simply parsing a log that is bastardized syslog (see syslogStreamer in the source), but is not in fact syslog format (neither RFC 5424 nor RFC 3164). Logstash afterwards provides JSON output of its own.

Let's break down the message, which becomes the source that you parse. The key is that you have to parse the message from front to back.

Message:

2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx\n
  • 2017-02-15T14:55:24UTC : A timestamp. This mostly follows the common grok pattern TIMESTAMP_ISO8601, but not quite, because the time zone abbreviation is glued directly onto the seconds.
  • deis-logspout[1] : This is your logsource, which you can name container. You can use the grok pattern URIHOST.
  • 2017/02/15 14:55:24 : Another timestamp (why?) that doesn't match common grok patterns.
  • routing all to udp://x.x.x.x:xxxx\n : Since the message of most logs sits at the end of the line, you can use the grok pattern GREEDYDATA, which is the equivalent of .* in a regular expression.

With grok filters, you map a syntax (an abstraction over regular expressions) to a semantic (a name for the value you extract). For example, %{URIHOST:container} stores whatever URIHOST matches in a field named container.

You'll see I did some hacking together of the grok filters to make the formatting work. You have to match every part of the text, even the parts you don't intend to capture. If you can't change the formatting of the timestamps to match a standard, create a custom pattern.
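For example, a custom pattern for the leading timestamp can be defined inline with the grok filter's pattern_definitions option (available in recent versions of the grok plugin; on older versions, put the pattern in a file and point patterns_dir at it). The DEIS_TS name here is just an illustrative choice, and note that the captured timestamp field will then include the time zone abbreviation:

filter {
    grok {
        # DEIS_TS: ISO8601-like timestamp with the zone abbreviation glued on
        pattern_definitions => {
            "DEIS_TS" => "%{TIMESTAMP_ISO8601}(?:UTC|CST|EST|PST)"
        }
        match => { "message" => "%{DEIS_TS:timestamp} %{URIHOST:container}\[%{NUMBER}\]: %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME} %{GREEDYDATA:msg}" }
    }
}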

Configuration:

input {
    tcp {
        port => 5000
        type => deis
    }
    udp {
        port => 5000
        type => deis
    }
}

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}(UTC|CST|EST|PST) %{URIHOST:container}\[%{NUMBER}\]: %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME} %{GREEDYDATA:msg}" }
    }
}

output {
    elasticsearch { hosts => ["foo.somehost"] }
}

Output:

{
    "container" => "deis-logspout",
    "msg" => "routing all to udp://x.x.x.x:xxxx",
    "@timestamp" => 2017-02-22T23:55:28.319Z,
    "port" => 62886,
    "@version" => "1",
    "host" => "10.0.2.2",
    "message" => "2017-02-15T14:55:24UTC deis-logspout[1]: 2017/02/15 14:55:24 routing all to udp://x.x.x.x:xxxx",
    "timestamp" => "2017-02-15T14:55:24"
    "type" => "deis"
}

You can additionally use the mutate filter to drop fields such as host, port, etc., as these are added by Logstash by default. Another suggestion is to use the date filter to convert any timestamps found into a usable format (better for searching).
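A sketch of that cleanup, assuming the grok filter above has populated a timestamp field as in the example output (adjust field names and formats to your data):

filter {
    # Parse the extracted timestamp and use it as @timestamp,
    # so Elasticsearch can run proper range queries on it.
    date {
        match => ["timestamp", "ISO8601"]
    }
    # Drop fields you no longer need after parsing.
    mutate {
        remove_field => ["timestamp", "port"]
    }
}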

Depending on the log formatting, you may have to slightly alter the pattern; I only had one example to go off of. This approach also preserves the original full message, because field operations in Logstash are destructive (a capture overwrites any existing field with the same name).

