
fluentd: log aggregation from multiple sources using Elasticsearch, Fluentd and Kibana

I have logs coming from various sources, and the format of the logs is:

[2018-11-20 11:27:41,187] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:355} INFO - Inside poll job status

[2018-11-20 11:27:41,187] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:357} DEBUG - Poll time out has been set to: 6 hr(s)

[2018-11-20 11:27:41,188] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:369} DEBUG - Batch_id of the running job is = 123456

[2018-11-20 11:27:41,188] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:377} DEBUG - Getting cluster ID for the cluster: 

I want to push these logs to Elasticsearch with batch_id as the index. How can I achieve this? The issue is that batch_id appears only in some of the lines, not in all of them. I have written a custom parser to convert the logs into JSON.

My td-agent.conf is:

<source>
  @type tail
  path /tmp/logs/airflow.logs
  pos_file /tmp/logs/airflow1.pos
  format /^\[(?<logtime>[^\]]*)\] \{(?<parent_script>[^ ]*)\} (?<parent_script_log_level>[^ ]*) - (?<subtask_name>[^ ]*): \[(?<subtask_log_time>[^\]]*)\] \{(?<script_name>[^ ]*)\} (?<script_log_info>[^ ]*) - (?<message>[^*]*)/
  time_key logtime
  tag airflow_123
  read_from_head true
  include_tag_key true
  tag_key event_tag
  @log_level debug
</source>

<match airflow_123>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type elasticsearch
    host es_host
    port es_port
    index_name fluentd.${tag}.%Y%m%d
    <buffer tag, time>
      timekey 1h # one chunk per hour ("3600" also works)
    </buffer>
    type_name log
    with_transporter_log true
    @log_level debug
  </store>
</match>
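For the batch_id requirement described above, one possible approach (a hedged sketch, not part of the original question) is to promote batch_id into the record with Fluentd's built-in record_transformer filter and then list it as a buffer chunk key, so that fluent-plugin-elasticsearch can reference it as a placeholder in index_name. The regex, the field name batch_id, and the "unknown" fallback are assumptions:

# Sketch only: place this filter before the <match airflow_123> block above.
<filter airflow_123>
  @type record_transformer
  enable_ruby true
  <record>
    # Pull the numeric batch id out of the parsed message; lines without one fall back to "unknown".
    batch_id ${record["message"].to_s[/Batch_id of the running job is = (\d+)/, 1] || "unknown"}
  </record>
</filter>

The elasticsearch <store> could then build its index from the new field, for example index_name airflow-${batch_id}-%Y%m%d together with <buffer tag, time, batch_id>, so that events sharing a batch_id land in the same index.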

Also, what would be the best practice for log aggregation using the EFK stack?

If you want to stick to the components of the Elastic stack, the logs can be read, parsed, and persisted as below:

  1. Filebeat: reads the events (every logical line of the logs) and pushes them to Logstash.
  2. Logstash: parses the logs to break the strings into meaningful fields as per your requirement. Strings can be parsed using grok filters, which is preferable to building custom parsers (see the sketch after this list). The parsed information is sent to Elasticsearch to be persisted and indexed, preferably based on the timestamp.
  3. Kibana: visualizes the parsed information using single searches or aggregations.
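As a rough illustration of steps 1 and 2, a minimal Logstash pipeline for the sample lines above might look like the following; the beats port, grok field names, and index name are assumptions rather than part of the answer:

input {
  beats { port => 5044 }   # Filebeat ships the raw log lines to this port
}

filter {
  # Split each line into named fields using the stock TIMESTAMP_ISO8601 and LOGLEVEL patterns.
  grok {
    match => {
      "message" => "\[%{TIMESTAMP_ISO8601:logtime}\] \{%{DATA:parent_script}\} %{LOGLEVEL:parent_log_level} - %{DATA:subtask_name}: \[%{TIMESTAMP_ISO8601:subtask_log_time}\] \{%{DATA:script_name}\} %{LOGLEVEL:script_log_level} - %{GREEDYDATA:log_message}"
    }
  }
  # Use the timestamp from the log line instead of the ingest time.
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}

output {
  elasticsearch {
    hosts => ["http://es_host:9200"]   # placeholder host and port
    index => "airflow-%{+YYYY.MM.dd}"
  }
}

Kibana then only needs an index pattern such as airflow-* to search and aggregate on the parsed fields.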
