Logstash: Reading multiline data from optional lines

I have a log file which contains lines which begin with a timestamp. An uncertain number of extra lines might follow each such timestamped line:

SOMETIMESTAMP some data
extra line 1 2
extra line 3 4

The extra lines would provide supplementary information for the timestamped line. I want to extract the 1, 2, 3, and 4 and save them as variables. I can parse the extra lines into variables if I know how many of them there are. For example, if I know there are two extra lines, the grok filter below will work. But what should I do if I don't know, in advance, how many extra lines will exist? Is there some way to parse these lines one-by-one, before applying the multiline filter? That might help.

Also, even if I know I will only have 2 extra lines, is the filter below the best way to access them?

filter {
    multiline {
        pattern => "^%{SOMETIMESTAMP}"
        negate => "true"
        what => "previous"
    }

    if "multiline" in [tags] {
        grok {
            match => { "message" => "(?m)^%{SOMETIMESTAMP} %{DATA:firstline}(?<newline>[\r\n]+)%{DATA:secondline}(?<newline>[\r\n]+)%{DATA:thirdline}$" }
        }
    }
    # After this would be grok filters to process the contents of
    # 'firstline', 'secondline', and 'thirdline'. I would then remove
    # these three temporary fields from the final output.
}

(I separated the lines into separate variables since this allows me to do additional pattern matching on the contents of the lines separately, without having to refer to the entire pattern all over again. For example, based on the contents of the first line, I might want to present branching behavior for the other lines.)
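To see how the `(?m)` grok pattern above splits a two-extra-line event into named captures, here is a plain-Ruby sketch of the same match. The literal ISO-style timestamp stands in for the asker's `%{SOMETIMESTAMP}` pattern, which is not shown in the question, and the regex is a hand-rolled approximation of grok's `%{DATA:...}` captures:

```ruby
# A two-extra-line event as a single multiline message
message = "2015-01-01T00:00:00 some data\nextra line 1 2\nextra line 3 4"

# Named captures roughly matching the grok pattern:
# timestamp, then one captured segment per line
pattern = /\A(?<timestamp>\S+) (?<firstline>[^\r\n]*)[\r\n]+(?<secondline>[^\r\n]*)[\r\n]+(?<thirdline>[^\r\n]*)\z/

m = message.match(pattern)
puts m[:firstline]   # "some data"
puts m[:secondline]  # "extra line 1 2"
puts m[:thirdline]   # "extra line 3 4"
```

Because the pattern hard-codes exactly two `[\r\n]+` separators, it fails on any other number of extra lines, which is precisely the asker's problem.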

Why do you need this?

Are you going to be inserting one single event with all of the values or are they really separate events that just need to share the same time stamp?

If they all need to appear in the same event, you'll likely need to resort to a ruby filter to separate out the extra lines into fields on the event that you can then work on further.

For example:

if "multiline" in [tags] {
    grok {
        match => { "message" => "(?m)^%{SOMETIMESTAMP} %{DATA:firstline}(?<newline>[\r\n]+)" }
    }
    ruby {
       code => '
         event["lines"] = event["message"].scan(/[^\r\n]+[\r\n]*/);
       '
    }
}
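As a sanity check, here is what that `scan` call produces, demonstrated on a plain Ruby string rather than a Logstash event:

```ruby
# Same regex as the ruby filter above: each match is one line,
# including its trailing newline characters, if any.
message = "SOMETIMESTAMP some data\nextra line 1 2\nextra line 3 4"
lines = message.scan(/[^\r\n]+[\r\n]*/)
# => ["SOMETIMESTAMP some data\n", "extra line 1 2\n", "extra line 3 4"]
```

Note that each element keeps its trailing newline; call `.map(&:strip)` on the result if you want them removed before further grok matching.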

If they are really separate events, you could use the memorize plugin for logstash 1.5 and later.

This has changed over versions of ELK. Direct event field references (i.e. event['field']) have been disabled in favor of the event get and set methods (e.g. event.get('field')).

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:level}%{DATA:firstline}" }
    }
    ruby { code => "event.set('message', event.get('message').scan(/[^\r\n]+[\r\n]*/))" }
}
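Once the lines are split apart, pulling out the numbers the asker wants (1, 2, 3, 4) could be done in the same ruby filter. A sketch in plain Ruby, under the assumption that every run of digits on the extra lines is a value of interest:

```ruby
message = "SOMETIMESTAMP some data\nextra line 1 2\nextra line 3 4"
lines = message.scan(/[^\r\n]+/)    # split into lines, newlines dropped
extras = lines.drop(1)              # skip the timestamped first line
values = extras.flat_map { |l| l.scan(/\d+/).map(&:to_i) }
# values == [1, 2, 3, 4]
```

Inside a Logstash ruby filter the final step would store the result back on the event, e.g. `event.set('values', values)`, with 'values' being whatever field name you choose.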
