
Importing a large amount of data into Elasticsearch by dropping the existing data each time

Currently, there's a denormalized table inside a MySQL database that contains hundreds of columns and millions of records.

The original source of the data does not have any way to track changes, so the entire table is dropped and rebuilt every day by a cron job.

Now, I would like to import this data into Elasticsearch. What is the best way to approach this? Should I use Logstash to connect directly to the table and import it, or is there a better way? Exporting the data to JSON or a similar format is an expensive process, since we're talking about gigabytes of data every time.

Also, should I drop the index in Elasticsearch as well, or is there a way to make it recognize the changes?

In any case, I'd recommend using index templates to simplify index creation.

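As a concrete illustration, here is a minimal sketch of a composable index template using the elasticsearch-py client (assuming Elasticsearch 7.8+ and the 7.x Python client; the template name, index pattern, and the two mapped fields are placeholders, not your actual schema):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Every index whose name matches the pattern (e.g. one index per daily import)
# automatically picks up these settings and mappings on creation.
es.indices.put_index_template(
    name="mysql-denormalized-template",          # hypothetical template name
    body={
        "index_patterns": ["mysql-denormalized-*"],
        "template": {
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 1,
                # Bulk loads run faster with refresh disabled; re-enable it after the import.
                "refresh_interval": "-1",
            },
            "mappings": {
                "properties": {
                    # Replace with the real columns of the denormalized table.
                    "id": {"type": "long"},
                    "imported_at": {"type": "date"},
                }
            },
        },
    },
)
```

That way each daily index is created with consistent settings and mappings without defining them by hand every time.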
Now for the ingestion strategy, I see two possible options:

  • Rework your ETL process to do a merge instead of dropping and recreating the entire table. This would definitely be slower, but it would let you ship only the deltas to ES or any other downstream store (a rough sketch of one way to compute those deltas follows this list).
  • As you've imagined yourself, you should probably be fine with Logstash running daily jobs. Create a daily index and drop the old one during the daily migration (see the alias-swap sketch below).
  • You could introduce a buffer, such as Kafka, into your infrastructure, but I feel that might be overkill for your current use case.
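
Since the source table can't report what changed, one way to approximate a merge is to fingerprint each row and diff it against the previous run's fingerprints. The sketch below is only an illustration of that idea, not a drop-in ETL: it assumes the table exposes a usable primary key column (`id`), that the previous snapshot of hashes fits in memory, and it emits actions in the format expected by `elasticsearch.helpers.bulk`.

```python
import hashlib
import json

INDEX = "mysql-denormalized"   # hypothetical target index / alias

def row_hash(row: dict) -> str:
    # Stable fingerprint of one denormalized row (column order doesn't matter).
    return hashlib.sha1(json.dumps(row, sort_keys=True, default=str).encode()).hexdigest()

def compute_delta(previous: dict, current_rows):
    """Compare this run's rows against the {primary_key: hash} snapshot of the
    previous run; return (bulk_actions, new_snapshot)."""
    snapshot, actions = {}, []
    for row in current_rows:                      # stream rows from MySQL (server-side cursor)
        pk, h = row["id"], row_hash(row)
        snapshot[pk] = h
        if previous.get(pk) != h:                 # new or changed row -> (re)index it
            actions.append({"_op_type": "index", "_index": INDEX,
                            "_id": pk, "_source": row})
    for pk in previous.keys() - snapshot.keys():  # rows gone since the last run -> delete
        actions.append({"_op_type": "delete", "_index": INDEX, "_id": pk})
    return actions, snapshot
```

The actions can then be passed to `elasticsearch.helpers.bulk(es, actions)`, and the new snapshot persisted (to disk or a side table) for the next run. Whether this beats a full reload depends on how much of the table actually changes each day.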

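For the daily-index option, a common way to keep queries working during the swap is to load into a fresh index and repoint an alias only once the load has finished. This is a rough sketch assuming the 7.x elasticsearch-py client; the alias name and date-based index naming are illustrative, and the bulk load itself could just as well be done by Logstash:

```python
from datetime import date
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

ALIAS = "mysql-denormalized"                             # hypothetical read alias
new_index = f"mysql-denormalized-{date.today():%Y.%m.%d}"

# 1. Create today's index (it inherits settings/mappings from the index template),
#    then run the bulk load into it (Logstash jdbc input, helpers.bulk, etc.).
es.indices.create(index=new_index, ignore=400)           # 400 = index already exists

# 2. Once the load is verified, atomically repoint the alias to the new index.
old_indices = []
if es.indices.exists_alias(name=ALIAS):
    old_indices = [i for i in es.indices.get_alias(name=ALIAS) if i != new_index]
actions = [{"add": {"index": new_index, "alias": ALIAS}}]
actions += [{"remove": {"index": idx, "alias": ALIAS}} for idx in old_indices]
es.indices.update_aliases(body={"actions": actions})

# 3. Only after the alias points at the new index, drop yesterday's index.
for idx in old_indices:
    es.indices.delete(index=idx, ignore=404)
```

Queries should always go through the alias, so consumers never notice the swap and a failed import simply leaves the previous day's index in place.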