
How to add a sequence id when using Logstash to parse logs

I want to index Hadoop logs with Logstash and Elasticsearch. Here is my problem: I load logs into Elasticsearch through Logstash, and when I search for events in Elasticsearch I want to keep them in the same order as in the original log files. But it doesn't work. For example, the events in the original log file may look like:

2013-12-25 23:10:19,022 INFO A..
2013-12-25 23:10:19,022 INFO B..
2013-12-25 23:10:19,022 INFO C..

But when I search in Elasticsearch sorted by the "@timestamp" field, the result may look like this:

2013-12-25 23:10:19,022 INFO B..
2013-12-25 23:10:19,022 INFO A..
2013-12-25 23:10:19,022 INFO C..

Because the timestamps of these three events are identical, the search result cannot preserve the original order.

Here is my solution: I think I can add an id to each event; the id is assigned while Logstash parses the data and increases along with the timestamp. Then when I search for events, I can sort by id instead of timestamp, and the events will stay in the right order even when their timestamps are the same.
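Sorting on such a field could, for instance, look like this in an Elasticsearch search request (a sketch; "seq" stands for the hypothetical auto-incrementing field, which does not exist yet):

```json
{
  "query": { "match_all": {} },
  "sort": [
    { "seq": { "order": "asc" } }
  ]
}
```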

But I don't know how to add this extra auto-incrementing 'id' field with Logstash; I looked through the Logstash conf file options and didn't find a solution. Please give me some advice on how I can implement this, thanks a lot!

You can try using the timestamp to insert a new field, seq. Here is the configuration (note that the event['...'] syntax is the pre-5.x event API; on Logstash 5 and later, use event.set('seq', ...) instead):

ruby {
    code => "
        event['seq'] = Time.now.strftime('%Y%m%d%H%M%S%L').to_i
    "
}

With this solution you do not need to write any plugin. In this example we use the millisecond timestamp as the value of the seq field. However, if your CPU is powerful and your logs are processed fast enough, two events may still end up with the same value. Please give it a try.
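The collision risk mentioned above can be avoided by combining the millisecond timestamp with an in-process counter. Below is a minimal sketch in plain Ruby (the SeqGenerator class is illustrative, not part of the original answer; in a real pipeline its logic would live inside the ruby filter's code block):

```ruby
# Sketch: a sequence id that stays unique even when several events share
# the same millisecond timestamp, by appending an in-process counter.
class SeqGenerator
  def initialize
    @last_ms = 0
    @counter = 0
  end

  # Returns a strictly increasing integer across calls.
  def next_id
    ms = (Time.now.to_f * 1000).to_i
    if ms == @last_ms
      @counter += 1        # same millisecond: bump the counter
    else
      @last_ms = ms
      @counter = 0         # new millisecond: reset the counter
    end
    ms * 10_000 + @counter # leaves room for 10,000 events per millisecond
  end
end
```

In the filter itself, the generated value would then be assigned to the event, e.g. event['seq'] = gen.next_id (or event.set('seq', gen.next_id) on Logstash 5+).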
