
aggregate logstash filter with "multiple pipelines"

I would like to let httpd access_log entries be processed by two different logstash filters.

One of them is the "aggregate" filter, which is known to work properly only with a single worker thread. The other filter (let's call it "otherfilter"), however, should be allowed to run with several worker threads so that there is no loss of performance.

To accomplish this I would like to use the "multiple pipelines" feature of Logstash. Basically, one pipeline should read the data (the "input pipeline") and distribute it to two other pipelines on which the two mentioned filters operate (let's call them the "aggregate pipeline" and the "otherfilter pipeline").
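For reference, the layout described above can be wired together with Logstash's pipeline-to-pipeline communication (the `pipeline` input and output plugins). The pipeline ids and config paths below are hypothetical; only the sketch of the structure matters:

```yaml
# pipelines.yml -- hypothetical ids and paths
- pipeline.id: input
  path.config: "/etc/logstash/conf.d/input.conf"
- pipeline.id: aggregate
  path.config: "/etc/logstash/conf.d/aggregate.conf"
  pipeline.workers: 1            # aggregate filter requires a single worker
- pipeline.id: otherfilter
  path.config: "/etc/logstash/conf.d/otherfilter.conf"
  pipeline.workers: 4            # free to parallelize

# input.conf -- distributes each event to both downstream pipelines
#   output { pipeline { send_to => ["aggregate", "otherfilter"] } }
# aggregate.conf / otherfilter.conf -- receive via virtual addresses
#   input  { pipeline { address => "aggregate" } }
#   input  { pipeline { address => "otherfilter" } }
```

Note that `pipeline.workers` on the downstream pipelines controls their filter concurrency; the ordering problem discussed below concerns what happens upstream of the aggregate filter.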

First tests have shown that the results of the aggregate filter are not correct if the input pipeline is set up to work with more than one worker thread: when aggregating over 60-second intervals, an event counter sometimes shows more and sometimes fewer events than actually occurred. The problem seems to be that events arrive out of order at the aggregate filter, so the intervals (whose start and end are determined from the timestamp field) are incorrect.

So I ask myself: is what I want to achieve feasible at all with "multiple pipelines"?

You can break up a single pipeline into multiple pipelines, but since you want to use the aggregate filter, you need to make sure that everything that happens before an event enters the aggregate filter runs with only one worker.

For example, suppose you break up your pipeline into pipeline A (your input), pipeline B (your aggregate filter), and pipeline C (your other filter).

This will only work if:

  • Pipeline A runs with only one worker.
  • Pipeline B runs with only one worker.
  • Pipeline C runs after pipeline B and does not rely on the order of events.

If your input pipeline runs with more than one worker, you can't guarantee the order of the events when they enter your aggregate pipeline. So, basically, your input and your aggregate filter should be in the same pipeline, and you can then direct its output to the other filter pipeline, which can run with more than one worker.
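That recommended layout could be sketched as follows. The pipeline ids, file paths, and the aggregate `task_id`/`code` are placeholders, not a definitive configuration:

```yaml
# pipelines.yml -- hypothetical ids and paths
- pipeline.id: input_and_aggregate
  path.config: "/etc/logstash/conf.d/input_and_aggregate.conf"
  pipeline.workers: 1            # single worker keeps events ordered for aggregate
- pipeline.id: otherfilter
  path.config: "/etc/logstash/conf.d/otherfilter.conf"
  pipeline.workers: 4            # order no longer matters downstream

# input_and_aggregate.conf
#   input  { file { path => "/var/log/httpd/access_log" } }
#   filter {
#     aggregate {
#       task_id => "%{some_task_id}"                       # placeholder
#       code    => "map['count'] ||= 0; map['count'] += 1" # placeholder
#     }
#   }
#   output { pipeline { send_to => ["otherfilter"] } }

# otherfilter.conf
#   input  { pipeline { address => "otherfilter" } }
#   filter { ... }   # the "otherfilter" logic
#   output { ... }   # e.g. elasticsearch
```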
