Logstash [aggregate filter]: passing data between events

Question

I am currently working on a log monitoring project with the Elastic Stack. The logs I have to load are in a specific format, so I have to write my own Logstash configuration to read them. In one type of log, the date appears only on the first line of the file, and the timestamp on every other line has no date. My goal is to extract the date from the first line and add it to all the following lines. After some research I found that the aggregate filter might help, but I can't get it to work. Here is my config file:

input
{
    file {
        path => "F:/ELK/data/testFile.txt"
        #path => "F:/ELK/data/*/request/*"
        start_position => "beginning"
        sincedb_path => "NUL"
    }
}
filter
{
    mutate {
        add_field => { "taskId" => "all" }
    }

    grok
    {
        match => {"message" => "-- %{NOTSPACE} %{NOTSPACE}: %{DAY}, %{MONTH:month} %{MONTHDAY:day}, %{YEAR:year}%{GREEDYDATA}"}
        tag_on_failure => ["not_date_line"]
    }

    if "not_date_line" not in [tags]
    {
        mutate{
            replace => {'taskId' => "%{day}/%{month}/%{year}"}
            remove_field => ["day","month","year"]
        }

        aggregate
        {
            task_id => "%{taskId}"
            code => "map['taskId'] = event.get('taskId')"
            map_action => "create"
        }
    }
    else
    {
        dissect
        {
            mapping => { message => "%{sequence_index}  %{time} %{pid}  %{puid} %{stack_level}  %{operation}    %{params}   %{op_type}  %{form_event}   %{op_duration}"}
        }

        aggregate {
            task_id => "%{taskId}"
            code => "event.set('taskId', map['taskId'])"
            map_action => "update"
            timeout => 0
        }
        mutate
        {
            strip => ["op_duration"]
            replace => {"time" => "%{taskId}-%{time}"}
        }
    }
    
    mutate
    {
        remove_field => ['@timestamp','host','@version','path','message','tags']
    }
}
output 
{
    stdout{}
}

The config reads the date correctly, but it does not replace the value in the other events:


{
    "taskId" => "22/October/2020"
}
{
               "pid" => "45",
    "sequence_index" => "10853799",
           "op_type" => "1",
              "time" => "all-16:23:29:629",
            "params" => "90",
       "stack_level" => "0",
       "op_duration" => "",
         "operation" => "10",
        "form_event" => "0",
            "taskId" => "all",
              "puid" => "1724"
}

I am using only one worker to ensure that the order of the events is preserved. If you know of any other way to achieve this, I'm open to suggestions. Thank you!

Answer 1

For the lines that have a date, you are setting taskId to "%{day}/%{month}/%{year}"; for the rest of the lines, you are setting it to "all". The aggregate filter does not aggregate across events with different task ids.
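
To illustrate, here is a minimal plain-Ruby sketch of that behavior. It is a simplified model, not the plugin's actual code: it assumes one map per task id, and the `maps` hash and the event hashes are hypothetical stand-ins. It also models the documented behavior of map_action "update", which only processes an event when a map already exists for its task id.

```ruby
# Simplified model of the aggregate filter's per-task maps.
# `maps` and the event hashes are hypothetical stand-ins for illustration.
maps = {}

# map_action => "create": the date line starts a map under its own task id.
date_event = { "taskId" => "22/October/2020" }
maps[date_event["taskId"]] = { "taskId" => date_event["taskId"] }

# map_action => "update": the code only runs if a map exists for this task id.
data_event = { "taskId" => "all", "time" => "16:23:29:629" }
map = maps[data_event["taskId"]]
data_event["taskId"] = map["taskId"] if map

# No map was ever created under "all", so the event is left unchanged.
puts data_event["taskId"]
```

This matches the output in the question, where taskId is still "all" on the data lines.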

I suggest you use a constant taskId and store the date in some other field. Then, in a single aggregate filter, you can use something like

code => '
    date = event.get("date")
    if date
        @date = date
    else
        event.set("date", @date)
    end
'

@date is an instance variable, so its scope is limited to that aggregate filter, but it is preserved across events. It is not shared with other aggregate filters (that would require a class variable or a global variable).
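
The scoping difference can be seen in plain Ruby. The sketch below is only an analogy: `FilterSketch` is a hypothetical class standing in for one filter instance, and events are modeled as hashes rather than Logstash events.

```ruby
# FilterSketch is a hypothetical stand-in for one aggregate filter instance.
class FilterSketch
  @@shared_date = nil            # class variable: visible to every instance

  def handle(event)
    if event["date"]
      @date = event["date"]      # instance variable: private to this object
      @@shared_date = event["date"]
    else
      event["date"] = @date      # nil if this instance never saw a date
    end
    event
  end

  def shared_date
    @@shared_date
  end
end

first = FilterSketch.new
second = FilterSketch.new

first.handle({ "date" => "22/October/2020" })
carried = first.handle({ "time" => "16:23:29:629" })
not_carried = second.handle({ "time" => "16:23:30:001" })

puts carried["date"].inspect       # date remembered by the same instance
puts not_carried["date"].inspect   # the other instance has no @date
puts second.shared_date.inspect    # the class variable is shared
```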

Note that this approach requires event order to be preserved, so you should set pipeline.workers to 1.

Answer 2

Thanks to @Badger and another post he answered on the Elastic forum, I found a solution using a single ruby filter and an instance variable. I couldn't get it to work with the aggregate filter, but that is not an issue for me.

ruby
{
    init => '@date = ""'
    code => "
        event.set('date',@date) unless @date.empty?
        @date = event.get('date') unless event.get('date').empty?
    "
}
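
As written, the filter above appears to rely on the date line arriving first and on there being a single date per file: an event without a date field that arrives before any date has been seen would call empty? on nil, and a later date line would be overwritten by the remembered date. If either case can occur, the logic could be made more defensive. The following is a plain-Ruby simulation of that idea, not Logstash code; events are modeled as hashes and `carry_date` is a hypothetical helper name.

```ruby
# Plain-Ruby simulation of carrying a date forward across ordered events.
# `carry_date` is a hypothetical helper, not a Logstash API.
def carry_date(events)
  current = nil
  events.each do |event|
    date = event["date"]
    if date.nil? || date.empty?
      event["date"] = current if current   # data line: copy the remembered date
    else
      current = date                       # date line: remember the new date
    end
  end
end

events = [
  { "time" => "16:00:00:000" },            # arrives before any date line
  { "date" => "22/October/2020" },
  { "time" => "16:23:29:629" },
  { "date" => "23/October/2020" },
  { "time" => "09:00:00:000" },
]

carry_date(events)
events.each { |event| puts event.inspect }
```

In a ruby filter, the same branching would go in the code option, with `current` replaced by the @date instance variable and the hash accesses replaced by event.get and event.set.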
