
Continuously extract new Elasticsearch updates into Kafka using Logstash

I have an ES cluster with multiple indices that all receive updates at random intervals. A Logstash instance extracts data from ES and passes it into Kafka.

What would be a good way to run this every minute and pick up any new updates in ES?

Conf:

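 input {
   elasticsearch {
     hosts => [ "hostname1.com:5432", "hostname2.com" ]
     index => "myindex-*"
     # query DSL as a JSON string; match_all returns every document
     query => '{ "query": { "match_all": {} } }'
     size => 10000
     scroll => "5m"
   }
 }
 output {
   kafka {
     # comma-separated list of Kafka brokers
     bootstrap_servers => "abc-kafka.com:1234"
     topic_id => "my.topic.test"
     # send each event as JSON; the default plain codec would not serialize the whole document
     codec => json
   }
 }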

I would like to use the documents' @timestamp in the query, save the latest value in a temp file, then rerun the query on a schedule and get only the latest updates/inserts (something like what Logstash's jdbc input plugin supports).

Any ideas?

Thank you in advance

Someone asked the same thing a few months ago, but that issue didn't get much traffic. You could +1 it, maybe.

In the meantime, you could modify the query in your elasticsearch input to be like this:

query => '{"query":{"range":{"timestamp":{"gt": "now-1m"}}}}'

i.e. you query all documents whose timestamp field (an arbitrary name; change it to match yours) falls within the past minute.
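To see what that query matches before wiring it into Logstash, you can build it in the shell and send it to the _count API. A minimal sketch: range_query is a hypothetical helper, and ES_URL (for example http://hostname2.com:9200) is an assumption; the curl check is skipped when ES_URL is unset.

```shell
#!/bin/sh
# Hypothetical helper: print the range query used in the elasticsearch input.
# FIELD is your timestamp field ("timestamp" in the answer, possibly "@timestamp"
# in your indices); SINCE is any Elasticsearch date-math value such as now-1m.
range_query() {
  field="${1:-timestamp}"
  since="${2:-now-1m}"
  printf '{"query":{"range":{"%s":{"gt":"%s"}}}}\n' "$field" "$since"
}

# Optional sanity check: count the documents the query matches right now.
# ES_URL is an assumption; nothing is sent when it is not set.
if [ -n "${ES_URL:-}" ]; then
  curl -s -H 'Content-Type: application/json' \
    "$ES_URL/myindex-*/_count" \
    -d "$(range_query @timestamp now-1m)"
fi
```

The printed JSON is what goes, single-quoted, into the query option of the elasticsearch input.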

Then you need to set up a cron job that starts your Logstash process every minute. Because of the latency between the moment cron fires, the moment Logstash starts running and the moment the query reaches the ES server, a 1m window might not be sufficient and you risk missing some docs. You need to test this to find the window that works best for you.
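Sizing the window is simple arithmetic: it has to cover the cron interval plus the delay between cron firing and the query reaching Elasticsearch. A minimal sketch, assuming you have measured that delay yourself; lookback is a hypothetical helper and the 30-second margin is only a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: build an Elasticsearch date-math lower bound that covers
# the cron interval plus a safety margin for startup and network latency.
# Usage: lookback INTERVAL_SECONDS MARGIN_SECONDS
lookback() {
  interval_s="${1:-60}"  # how often cron starts Logstash
  margin_s="${2:-30}"    # placeholder; replace with your measured latency
  echo "now-$((interval_s + margin_s))s"
}

lookback 60 30   # prints now-90s, usable as the "gt" value in the range query
```

The trade-off is that documents falling in the overlap between consecutive windows are sent to Kafka twice, so consumers need to tolerate duplicates.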

According to this recent blog post, another way could be to record the last time Logstash ran in an environment variable LAST_RUN and use that variable in the query:

query => '{"query":{"range":{"timestamp":{"gt": "${LAST_RUN}"}}}}'

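In this scenario, you'd create a shell script that is run by cron and that basically does this:

  1. run logstash -f your_config_file.conf, with LAST_RUN set to the value saved by the previous run
  2. when done, save the time at which this run started, e.g. $(date -u +"%FT%T"), as the LAST_RUN for the next run

Because each cron run starts in a fresh shell, the value has to be persisted between runs (e.g. in a temp file, as the question suggests) rather than just set as a variable. Saving the start time rather than the end time avoids skipping documents indexed while Logstash was running, and date -u keeps it in UTC, which is how Elasticsearch reads dates that carry no timezone.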
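A minimal sketch of such a wrapper in POSIX sh. The state-file path, the now-1m fallback for the very first run, the function name run_once and passing the Logstash command as an argument are all assumptions of this sketch, not part of the answer. It records the start time in UTC before launching Logstash and only overwrites the saved value when Logstash exits successfully, so a failed run is retried from the same point.

```shell
#!/bin/sh
# Example crontab entry (all paths hypothetical; flock is from util-linux and
# makes a run exit immediately if the previous one is still going):
#   * * * * * flock -n /tmp/es-to-kafka.lock /opt/scripts/es-to-kafka.sh
# where es-to-kafka.sh defines run_once and ends with:
#   run_once /var/lib/es-to-kafka/last_run /etc/logstash/es-to-kafka.conf /usr/share/logstash/bin/logstash

# run_once STATE_FILE CONFIG [LOGSTASH_CMD]
run_once() {
  state_file="$1"
  config="$2"
  logstash_cmd="${3:-logstash}"

  # Take the new watermark before starting Logstash, in UTC, so documents
  # indexed while Logstash is running are picked up by the next run.
  now=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

  # Read the watermark saved by the previous run; fall back to the last
  # minute on the very first run (hypothetical default).
  if [ -s "$state_file" ]; then
    last_run=$(cat "$state_file")
  else
    last_run="now-1m"
  fi

  # The config refers to ${LAST_RUN} in its range query; Logstash resolves it
  # from the environment passed here.
  if LAST_RUN="$last_run" "$logstash_cmd" -f "$config"; then
    # Write to a temp file and rename it, so an interrupted run never leaves
    # a half-written watermark behind.
    printf '%s\n' "$now" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
  else
    echo "logstash failed; keeping LAST_RUN=$last_run" >&2
    return 1
  fi
}
```

This still assumes the timestamp field reflects when a document was written: a document indexed late with an older timestamp, after the watermark has moved past it, will be missed.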
