
Importing a large amount of data into Elasticsearch by dropping the existing data each time

Currently, there's a denormalized table inside a MySQL database that contains hundreds of columns and millions of records.

The original source of the data does not have any way to track changes, so the entire table is dropped and rebuilt every day by a cron job.

Now, I would like to import this data into Elasticsearch. What is the best way to approach this? Should I use Logstash to connect directly to the table and import it, or is there a better way? Exporting the data to JSON or a similar format is an expensive process, since we're talking about gigabytes of data every time.

Also, should I drop the index in Elasticsearch as well, or is there a way to make it recognize the changes?

In any case, I'd recommend using index templates to simplify index creation.

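As a concrete illustration, here is a minimal sketch of a composable index template using the elasticsearch-py client (assuming Elasticsearch 7.8+ and the 7.x Python client; the template name, index pattern, and the two mapped fields are placeholders, not your actual schema):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Every index whose name matches the pattern (e.g. one index per daily import)
# automatically picks up these settings and mappings on creation.
es.indices.put_index_template(
    name="mysql-denormalized-template",          # hypothetical template name
    body={
        "index_patterns": ["mysql-denormalized-*"],
        "template": {
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                # Bulk loads run faster with refresh disabled; re-enable it after the import.
                "refresh_interval": "-1",
            },
            "mappings": {
                "properties": {
                    # Replace with the real columns of the denormalized table.
                    "id": {"type": "long"},
                    "imported_at": {"type": "date"},
                }
            },
        },
    },
)
```

That way each daily index is created with consistent settings and mappings without defining them by hand every time.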
Now for the ingestion strategy, I see two possible options:

  • Rework your ETL process to do a merge instead of dropping and recreating the entire table. This would definitely be slower, but it would let you ship only the deltas to ES or any other downstream store (a rough sketch of one way to compute those deltas follows this list).
  • As you've imagined yourself, you should probably be fine with Logstash running daily jobs. Create a daily index and drop the old one during the daily migration (see the alias-swap sketch below).
  • You could introduce a buffer, such as Kafka, into your infrastructure, but I feel that might be overkill for your current use case.
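
Since the source table can't report what changed, one way to approximate a merge is to fingerprint each row and diff it against the previous run's fingerprints. The sketch below is only an illustration of that idea, not a drop-in ETL: it assumes the table exposes a usable primary key column (`id`), that the previous snapshot of hashes fits in memory, and it emits actions in the format expected by `elasticsearch.helpers.bulk`.

```python
import hashlib
import json

INDEX = "mysql-denormalized"   # hypothetical target index / alias

def row_hash(row: dict) -> str:
    # Stable fingerprint of one denormalized row (column order doesn't matter).
    return hashlib.sha1(json.dumps(row, sort_keys=True, default=str).encode()).hexdigest()

def compute_delta(previous: dict, current_rows):
    """Compare this run's rows against the {primary_key: hash} snapshot of the
    previous run; return (bulk_actions, new_snapshot)."""
    snapshot, actions = {}, []
    for row in current_rows:                      # stream rows from MySQL (server-side cursor)
        pk, h = row["id"], row_hash(row)
        snapshot[pk] = h
        if previous.get(pk) != h:                 # new or changed row -> (re)index it
            actions.append({"_op_type": "index", "_index": INDEX,
                            "_id": pk, "_source": row})
    for pk in previous.keys() - snapshot.keys():  # rows gone since the last run -> delete
        actions.append({"_op_type": "delete", "_index": INDEX, "_id": pk})
    return actions, snapshot
```

The actions can then be passed to `elasticsearch.helpers.bulk(es, actions)`, and the new snapshot persisted (to disk or a side table) for the next run. Whether this beats a full reload depends on how much of the table actually changes each day.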

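For the daily-index option, a common way to keep queries working during the swap is to load into a fresh index and repoint an alias only once the load has finished. This is a rough sketch assuming the 7.x elasticsearch-py client; the alias name and date-based index naming are illustrative, and the bulk load itself could just as well be done by Logstash:

```python
from datetime import date
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

ALIAS = "mysql-denormalized"                             # hypothetical read alias
new_index = f"mysql-denormalized-{date.today():%Y.%m.%d}"

# 1. Create today's index (it inherits settings/mappings from the index template),
#    then run the bulk load into it (Logstash jdbc input, helpers.bulk, etc.).
es.indices.create(index=new_index, ignore=400)           # 400 = index already exists

# 2. Once the load is verified, atomically repoint the alias to the new index.
old_indices = []
if es.indices.exists_alias(name=ALIAS):
    old_indices = [i for i in es.indices.get_alias(name=ALIAS) if i != new_index]
actions = [{"add": {"index": new_index, "alias": ALIAS}}]
actions += [{"remove": {"index": idx, "alias": ALIAS}} for idx in old_indices]
es.indices.update_aliases(body={"actions": actions})

# 3. Only after the alias points at the new index, drop yesterday's index.
for idx in old_indices:
    es.indices.delete(index=idx, ignore=404)
```

Queries should always go through the alias, so consumers never notice the swap and a failed import simply leaves the previous day's index in place.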