
Indexing logs with Elasticsearch Logstash (using preprocessing Python script)

I have an issue with Elasticsearch and Logstash. My objective is to automatically send logs into Elasticsearch with Logstash.

My raw logs look like this:

2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise1 email1@gmail.com    POST    /application/controller/action Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko    {"getid":"1"}   86rkt2dqsdze5if1bqldfl1
2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise2 email2@gmail.com    POST    /application/controller2/action2    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   {"getid":"2"}   86rkt2rgdgdfgdfgeqldfl1
2016-09-01T10:58:41+02:00 INFO (6):     165.225.76.76   entreprise3 email3@gmail.com    POST    /application/controller2/action2    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko   {"getid":"2"}

The problem is that I don't want to insert my logs in this form. I want to use a preprocessing script in Python to transform my data before injecting it into Elasticsearch with Logstash.
At first, I wanted to index into Elasticsearch using only a Python script. But I have a huge quantity of logs spread across many folders and files, constantly updated, so I think it is much more robust to use Logstash or Filebeat. I tried Filebeat with a grok filter (not enough for my case), but I think it is impossible to run a preprocessing script before indexing.

Logs should look like this at the end of the Python script:

{"page": "/application/controller/action", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action", "client": "entreprise1", "email": "email1@gmail.com", "feature": "application_controller_action", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller", "application": "application"} 
{"page": "/application/controller2/action2", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action2", "client": "entreprise2", "email": "email2@gmail.com", "feature": "application_controller2_action2", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller2", "application": "application"} 
{"page": "/application3/controller/action3", "ip": "165.225.76.76", "browser": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", "action": "action3", "client": "entreprise3", "email": "email3@gmail.com", "feature": "application_controller3_action3", "time": "2016-09-01 10:58:41", "method": "POST", "controller": "controller3", "application": "application"}

I'm struggling with the implementation of a Python script in a Logstash filter. I know something like this can be implemented, but as far as I can tell it is done with a Ruby script (cf: https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html ).

1) Do you think it is possible to solve my problem using Logstash?

2) If yes, should my Python script take a raw log line as input and produce the JSON-formatted line as output?

3) When a line is added to a log file, the whole file is reinserted each time; how can I handle this?

4) Do you think it is possible to do this with Filebeat? And in your opinion, what would be best for my case?

For now, my Logstash configuration file looks like this:

input {
  file {
    path => "/logpath/logs/*/*.txt"
    start_position => "beginning"
  }
}

filter {
  # Here is where I would use my script to transform the logs into the JSON format I need
  date {
    match => ["time", "YYYY-MM-dd HH:mm:ss" ]
  }

  geoip {
    source => "ip"
    target => "geoip"
  }


}

output {
  stdout  {
    codec => dots {}
  }

  elasticsearch {
    index => "logs_index"
    document_type => "logs"
    template => "./logs_template.json"
    template_name => "logs_test"
    template_overwrite => true
  }

}
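
From what I understand, Logstash's scripted filter is Ruby (the ruby filter linked above), not Python, so one option would be to run my Python preprocessing outside Logstash and let it rewrite each line as a JSON document (the target format shown earlier). The file input could then decode those lines directly with the json codec, with no extra parsing filter. A minimal sketch, keeping the same path as above:

input {
  file {
    path => "/logpath/logs/*/*.txt"
    start_position => "beginning"
    # Each line is already a JSON document produced by the Python script,
    # so the codec turns it straight into event fields (time, ip, client, ...).
    codec => "json"
  }
}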

Thanks in advance to anyone who takes the time to help me out with this.

Dimitri

PS: Sorry for the syntax, English is not my first language.

The standard way to convert logs to JSON format is to use the grok and json filters in the Logstash configuration, and to reduce the processing load on Logstash, Filebeat can be used alongside it.

Hence, the best configuration to solve this problem is the Filebeat -> Logstash -> Elasticsearch stack.

You do not need the Python script; instead, use Filebeat to pick up all the logs from a specific place and forward them to Logstash.

Install Filebeat on the server where all the logs accumulate; it helps if you direct all the logs into a specific folder. Install Filebeat first and then set up its configuration to forward the logs to Logstash.

Here is the Filebeat configuration:

filebeat:
  prospectors:
    -
      paths:
        - "*log_path_of_all_your_log_files*"
      input_type: log
      json.message_key: statement
      json.keys_under_root: true

  idle_timeout: 1s
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["*logstash-host-ip:5044*"]
    worker: 4
    bulk_max_size: 1024

shipper:

logging:
  level: debug
  files:
    rotateeverybytes: 10485760 # = 10MB

Now, along with your Logstash configuration, you need a grok filter to convert your logs into JSON format (make the changes in the Logstash configuration file), and then forward them to Elasticsearch, Kibana, or wherever you want.
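
For example, the Logstash side could receive the Filebeat stream with a beats input on the port configured above (5044) and parse the raw line with a grok pattern roughly like this. This is only a sketch: it assumes whitespace-separated fields as in the sample lines in the question, and the field names (client, payload, session, ...) are illustrative and would need to be adjusted to the real format:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    # Sketch of a pattern for the raw lines; the trailing JSON payload and
    # session token are treated as optional, matching the sample lines.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:time}\s+%{LOGLEVEL:level} \(%{INT}\):\s+%{IP:ip}\s+%{NOTSPACE:client}\s+%{NOTSPACE:email}\s+%{WORD:method}\s+%{URIPATH:page}\s+%{DATA:browser}(\s+(?<payload>\{.*?\}))?(\s+%{NOTSPACE:session})?\s*$"
    }
  }

  # The payload field (e.g. {"getid":"1"}) can then be expanded with the json filter.
  json {
    source => "payload"
    target => "payload"
  }
}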
