What happens if you start an Elasticsearch indexing job while another one is still running?

I have an hourly job that indexes data from a database into Elasticsearch, but it seems like the indexing is taking more than an hour.

What happens if a second indexing run starts while the previous one is still running? Are there any problems that might occur?

I think this question is a little bit hazy...

If your job does not specify _id when indexing, Elasticsearch generates a new ID for every document, so repeated or overlapping runs will create duplicates - that is the worst case.
If you do specify _id, the same documents simply get re-indexed several times - that is not as bad, but it is extra, needless work for your server.
And if your job consumes a lot of resources (CPU, memory, etc.), two runs at once might overload your server...
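
To illustrate the _id point, here is a minimal sketch of building bulk actions whose _id comes from the database primary key, so that a second run overwrites documents instead of duplicating them. The function name `build_bulk_actions`, the index name "products", and the `id` column are assumptions made up for this example; the action dictionaries follow the shape accepted by `elasticsearch.helpers.bulk` in the Python client, which is not called here so the snippet runs without a cluster.

```python
def build_bulk_actions(rows, index_name, id_field="id"):
    """Turn database rows into bulk actions with a deterministic _id.

    Hypothetical helper: rows is a list of dicts, id_field is the
    primary-key column (an assumption about the schema).
    """
    actions = []
    for row in rows:
        actions.append({
            "_op_type": "index",          # index = create or overwrite
            "_index": index_name,
            "_id": str(row[id_field]),    # same row -> same _id on every run
            "_source": row,
        })
    return actions


# Two overlapping runs over the same rows produce identical actions,
# so the second run re-indexes documents rather than adding new ones.
rows = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
run_a = build_bulk_actions(rows, "products")
run_b = build_bulk_actions(rows, "products")
```

With a real client, the resulting list could be passed to `elasticsearch.helpers.bulk(client, run_a)`. Without the "_id" key, each run would get auto-generated IDs and the documents would pile up.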

Nothing should happen. Elasticsearch can handle this easily. My advice would be to look at the problem from the other side: maybe it would be better to improve the sync instead, e.g. push inserts onto a queue and then scale out to multiple workers. By the way, do you use the bulk API for inserts?
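
The queue-and-workers idea could look roughly like the sketch below. It uses only the Python standard library; `send_bulk` is a hypothetical placeholder that just records each batch where a real job would issue a bulk request, and `BATCH_SIZE` and the worker count are arbitrary values chosen for illustration.

```python
import queue
import threading

BATCH_SIZE = 3      # arbitrary; real bulk batches are usually much larger
_SENTINEL = None    # tells a worker that no more documents are coming


def send_bulk(batch, sent, lock):
    # Placeholder: a real job would send this batch in one bulk request.
    with lock:
        sent.append(list(batch))


def worker(q, sent, lock):
    batch = []
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            send_bulk(batch, sent, lock)
            batch = []
    if batch:                       # flush the final partial batch
        send_bulk(batch, sent, lock)


def run_workers(docs, num_workers=2):
    q = queue.Queue()
    sent, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, sent, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for doc in docs:                # producer side: register inserts
        q.put(doc)
    for _ in threads:               # one sentinel per worker
        q.put(_SENTINEL)
    for t in threads:
        t.join()
    return sent


docs = [{"id": i} for i in range(10)]
batches = run_workers(docs)
```

Each document is taken from the queue by exactly one worker, so adding workers spreads the load without two of them indexing the same change.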
