
Logstash - input file plugin to keep data in memory

I have 1) a single CSV file and 2) a live Kafka stream. The Kafka stream brings in live streaming logs, and the CSV file contains metadata records that I need to join with those logs before sending them to Elasticsearch.

Example of a Kafka stream log and a CSV record:

Kafka log: MachineID: 2424, MachineType: 1, MessageType: 9
CSV record: MachineID: 2424, MachineOwner: JohnDuo

Record I need to build in Logstash before sending to Elasticsearch:

MachineID: 2424
MachineOwner: JohnDuo
MachineType: 1
MessageType: 9

I want a solution, whether a Ruby script, a Logstash plugin, or anything else, that reads this CSV file once, brings it in, and joins it with the events in the Logstash conf file. I need to keep the content of the CSV file in memory; otherwise, a CSV lookup on every live Kafka log kills my Logstash performance.

Try the translate filter.

You would need something like this:

filter {
    translate {
        dictionary_path => "/path/to/your/csv/file.csv"
        field => "[MachineID]"
        destination => "[MachineOwner]"
        fallback => "not found"
    }
}

Then, in your file.csv, you would have the following:

2424,JohnDuo
2425,AnotherUser

For every event that has the field MachineID, this filter looks up that id in the dictionary. If it finds a match, it creates a field named MachineOwner with the matched value; if it does not, it creates MachineOwner with the value not found. If you do not want the field created when there is no match, remove the fallback option.
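The lookup semantics can be sketched in plain Ruby (Logstash itself runs on JRuby). This is an illustrative simulation, not Logstash's internal API; the `enrich` helper and the inline CSV data are assumptions for the example:

```ruby
require 'csv'

# Load "2424,JohnDuo"-style rows into an in-memory Hash, the same shape
# the translate filter builds from dictionary_path (simulated inline here).
dictionary = CSV.parse("2424,JohnDuo\n2425,AnotherUser\n").to_h

# Mimic the filter: look up MachineID, write MachineOwner, fall back if absent.
def enrich(event, dictionary, fallback = 'not found')
  id = event['MachineID']
  return event unless id
  event['MachineOwner'] = dictionary.fetch(id.to_s, fallback)
  event
end

event = { 'MachineID' => 2424, 'MachineType' => 1, 'MessageType' => 9 }
enrich(event, dictionary)
# event now also carries the MachineOwner field joined from the CSV
```

Because the hash lives in memory, each event costs a single hash lookup rather than a file scan, which is what makes this approach fast enough for a live Kafka stream.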

The dictionary is loaded into memory when Logstash starts and is reloaded every 300 seconds; you can also change that behaviour.
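The reload period is controlled by the filter's refresh_interval option (in seconds). For example, if the CSV rarely changes, you could reload once per hour:

```
filter {
    translate {
        dictionary_path => "/path/to/your/csv/file.csv"
        field => "[MachineID]"
        destination => "[MachineOwner]"
        fallback => "not found"
        refresh_interval => 3600
    }
}
```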
