
How to write grok pattern in Logstash

I am trying to get started with Logstash, and my application produces the following type of logs. Here the "5" indicates that five more lines will follow, containing stats collected for different related things.

These are basically application stats, with each line reporting on one of the resources.

Is there a way to properly parse this using Logstash so that it can be used for Elasticsearch?

[20170502 01:57:26.209 EDT (thread-name) package-name.classname#MethodName INFO] Some info line (5 stats):
[fieldA: strvalue1| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
[fieldA: strvalue2| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
[fieldA: strvalue3| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
[fieldA: strvalue4| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
[fieldA: strvalue5| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]

EDIT:

This is the configuration I am using. With it, the first set of stats gets parsed properly, but after that the pipeline gets stuck. Please note there are 150 such logs, but if I keep only 2-3 of them then it works fine. Can you please help me identify the issue here?

# [20170513 06:08:29.734 EDT (StatsCollector-1) deshaw.tools.jms.ActiveMQLoggingPlugin$ActiveMQDestinationStatsCollector#logPerDestinationStats INFO] ActiveMQ Destination Stats (97 destinations):
# [destName: topic://darts.metaDataChangeTopic | enqueueCount: 1 | dequeueCount: 1 | dispatchCount: 1 | expiredCount: 0 | inflightCount: 0 | msgsHeld: 0 | msgsCached: 0 | memoryPercentUsage: 0 | memoryUsage: 0 | memoryLimit: 536870912 | avgEnqueueTimeMs: 0.0 | maxEnqueueTimeMs: 0 | minEnqueueTimeMs: 0 | currentConsumers: 1 | currentProducers: 0 | blockedSendsCount: 0 | blockedSendsTimeMs: 0 | minMsgSize: 2392 | maxMsgSize: 2392 | avgMsgSize: 2392.0 | totalMsgSize: 2392]

input {
  file {
    path => "/u/bansalp/activemq_primary_plugin.stats.log.1"
### For testing and continual processing of the same file; remove these before production
    start_position => "beginning"
    sincedb_path => "/dev/null"
### Let's read the logfile and recombine multi-line details
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^\[destName:"
      negate => false
      what => "previous"
    }
  }
}

filter {
    if ([message] =~ /^\s*$/ ){
        drop{}
    }
    if ([message] =~ /^[^\[]/) {
            drop{}
    }

    if ([message] =~ /logMemoryInfo|logProcessInfo|logSystemInfo|logThreadBreakdown|logBrokerStats/) {
            drop{}
    }
    if [message] =~ "logPerDestinationStats" {
        grok {
                match => { "message" => "^\[%{YEAR:yr}%{MONTHNUM:mnt}%{MONTHDAY:daynum}\s*%{TIME:time}\s*%{TZ:timezone}\s*(%{DATA:thread_name})\s*%{JAVACLASS:javaclass}#%{WORD:method}\s*%{LOGLEVEL}\]\s*"
                }
        }
        split { 
            field => "message"
        }
        grok {
                match => { "message" => "^\[%{DATA}:\s*%{DATA:destName}\s*\|\s*%{DATA}:\s*%{NUMBER:enqueueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dequeueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dispatchCount}\s*\|\s*%{DATA}:\s*%{NUMBER:expiredCount}\s*\|\s*%{DATA}:\s*%{NUMBER:inflightCount}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsHeld}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsCached}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryPercentUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryLimit}\s*\|\s*%{DATA}:\s*%{NUMBER:avgEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:maxEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:currentConsumers}\s*\|\s*%{DATA}:\s*%{NUMBER:currentProducers}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsCount}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:maxMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:avgMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:totalMsgSize}\]$" }
        }
        mutate {
            convert => { "message" => "string" }
            add_field => {
                "session_timestamp" => "%{yr}-%{mnt}-%{daynum} %{time} %{timezone}"
                "load_timestamp" => "%{@timestamp}"
            }
            remove_field => ["yr","mnt", "daynum", "time", "timezone"]
        }
    }
}
output {
  stdout {codec => rubydebug}
}

There certainly is.

What you will need to do is utilise the multiline codec on your input.

As per the example:

input {
  file {
    path => "/var/log/someapp.log"
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{TIME}"
      negate => true
      what => previous
    }
  }
}

This basically states that any line that doesn't start with YYYYMMDD HH:mi:ss.000 will be merged with the previous line.
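
For illustration, once the codec has merged a block, a single event's message field holds the header line plus its stats lines joined together, roughly like this (the codec joins merged lines with \n):

[20170502 01:57:26.209 EDT (thread-name) package-name.classname#MethodName INFO] Some info line (5 stats):
[fieldA: strvalue1| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
[fieldA: strvalue2| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]
...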

From there you can now apply Grok patterns to the first line (to get high level data).

Once you're happy you have all the data you require from the first line, you can then split on \r or \n and get the individual stats data using a single grok pattern (based on the examples you gave above).
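
As a minimal sketch of that split step (assuming the multiline codec joined the lines with \n, its default):

filter {
  split {
    field => "message"
    terminator => "\n"
  }
}

Each stats line then becomes its own event, ready for a per-line grok.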

Hope this helps.

D

Update 2017-05-08 11:54:

A full Logstash conf could possibly look like this; you will need to consider changing the grok patterns to better suit your requirements (only you know your data).

Note: this hasn't been tested; I leave that up to you.

input {
  file {
    path => "/var/log/someapp.log"
### For testing and continual processing of the same file; remove these before production
    start_position => "beginning"
    sincedb_path => "/dev/null"
### Let's read the logfile and recombine multi-line details
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{TIME}"
      negate => true
      what => previous
    }
  }
}
filter {
### Let's get some high level data before we split the line (note: anything you grab before the split gets copied)
    grok {
        match => { "message" => "^\[%{YEAR:yr}%{MONTHNUM:mnt}%{MONTHDAY:daynum}\s*%{TIME:time}\s*%{TZ:timezone}\s*(%{DATA:thread_name})\s*%{JAVACLASS:javaclass}#%{WORD:method}\s*%{LOGLEVEL}\]"
        }
    }
### Split the lines back out to being a single line now. (this may be a \r or \n, test which one)
    split {
        field => "message"
        terminator => "\r"
    }
### Ok, the lines should now be independent, lets add another grok here to get the patterns as dictated by your example [fieldA: str | field2: 0...] etc.
### Note: you should look to change the grok pattern to better suit your requirements, I used DATA here to quickly capture your content
    grok {
        break_on_match => false
        match => { "message" => "^\[%{DATA}:\s*%{DATA:fieldA}\|%{DATA}:\s*%{DATA:field2}\|%{DATA}:\s*%{DATA:field3}\|%{DATA}:\s*%{DATA:field4}\|%{DATA}:\s*%{DATA:field5}\|%{DATA}:\s*%{DATA:field6}\|%{DATA}:\s*%{DATA:field7}\]$" }
    }
    mutate {
        convert => { "message" => "string" }
        add_field => {
            "session_timestamp" => "%{yr}-%{mnt}-%{daynum} %{time} %{timezone}"
            "load_timestamp" => "%{@timestamp}"
        }
        remove_field => ["yr","mnt", "daynum", "time", "timezone"]
    }
}
output {
  stdout { codec => rubydebug }
}
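
For reference, a per-line event from this pipeline might look roughly like the following in the rubydebug output (values invented for illustration, some fields omitted):

{
          "message" => "[fieldA: strvalue1| field2: 0 | field3: 0 | field4: 0 | field5: 0 | field6: 0 | field7: 0]",
           "fieldA" => "strvalue1",
           "field2" => "0",
"session_timestamp" => "2017-05-02 01:57:26.209 EDT",
   "load_timestamp" => "2017-05-02T05:57:26.209Z",
               ...
}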

EDIT 2017-05-15

Logstash is a complex parser; it expects to stay up as a process and continuously monitor the log files (hence why you have to crash it out).
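
If you only want a one-shot test run rather than a resident process, one option (a sketch, not part of the original answer) is to swap the file input for stdin and pipe the log through once; Logstash exits when stdin is closed:

input {
  stdin {
    codec => multiline {
      pattern => "^\[destName:"
      negate => false
      what => "previous"
      # consider auto_flush_interval so the last buffered block is emitted
      # auto_flush_interval => 2
    }
  }
}

You can also syntax-check a config without running it: bin/logstash -f yourfile.conf --config.test_and_exit.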

break_on_match means you can have multiple match requirements for the same line; if grok doesn't find a match with one pattern it tries the next in the list (always go from complex to simple).
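
For example (a sketch, with placeholder patterns), grok accepts an array of patterns for one field and, with the default break_on_match => true, stops at the first one that matches:

grok {
  # list patterns from most specific to most generic
  match => { "message" => [
    "^\[%{DATA}:\s*%{DATA:destName}\s*\|%{GREEDYDATA:rest}\]$",
    "^\[%{GREEDYDATA:raw_line}\]$"
  ] }
}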

For your file input, change the path to end with .log*. Also, as per your original example, shouldn't the pattern match the required date format (in order to bring all the associated lines onto a single line)?

I believe your split filter should specify what the terminator character is (otherwise the default is a newline, \n, which may not match your line endings).

input {
  file {
    path => "/u/bansalp/activemq_primary_plugin.stats.log*"
### For testing and continual processing of the same file; remove these before production
    start_position => "beginning"
    sincedb_path => "/dev/null"
### Let's read the logfile and recombine multi-line details
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^\[destName:"
      negate => false
      what => "previous"
    }
  }
}

filter {
    if "logPerDestinationStats" in [message] {
        grok {
                match => { "message" => "^\[%{YEAR:yr}%{MONTHNUM:mnt}%{MONTHDAY:daynum}\s*%{TIME:time}\s*%{TZ:timezone}\s*(%{DATA:thread_name})\s*%{JAVACLASS:javaclass}#%{WORD:method}\s*%{LOGLEVEL}\]\s*"
                }
        }
        split {
            field => "message"
            terminator => "\r"
        }
        grok {
                match => { "message" => "^\[%{DATA}:\s*%{DATA:destName}\s*\|\s*%{DATA}:\s*%{NUMBER:enqueueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dequeueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dispatchCount}\s*\|\s*%{DATA}:\s*%{NUMBER:expiredCount}\s*\|\s*%{DATA}:\s*%{NUMBER:inflightCount}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsHeld}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsCached}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryPercentUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryLimit}\s*\|\s*%{DATA}:\s*%{NUMBER:avgEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:maxEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:currentConsumers}\s*\|\s*%{DATA}:\s*%{NUMBER:currentProducers}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsCount}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:maxMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:avgMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:totalMsgSize}\]$" }
        }
        mutate {
            convert => { "message" => "string" }
            add_field => {
                "session_timestamp" => "%{yr}-%{mnt}-%{daynum} %{time} %{timezone}"
                "load_timestamp" => "%{@timestamp}"
            }
            remove_field => ["yr","mnt", "daynum", "time", "timezone"]
        }
    }
   else {
      drop{}
    }
}

Please excuse the formatting; I'm currently updating this from a mobile. I am happy for someone to update the formatting in my stead.
