
Reindexing documents in Elasticsearch with proper updates

I need to re-index all my documents into a new index with updated mappings and different index settings, such as the number of shards.

The events are published to a Kafka topic and then consumed by a service which pushes each event to Elasticsearch. So, I don't want to stop consuming events while re-indexing.

To achieve this, I have kept primaryIndex (the name of the old index) and secondaryIndex (the name of the new index) in the application.properties of a Spring app. While indexing a document, the application writes the event to both indices (primary and secondary) and reads from the primary index only. Now I will run the _reindex API to copy documents from the old index to the new index. As re-indexing will take about 4-5 days, an event that has already been written to the new index may get overwritten by the older copy that the _reindex API brings over from the old index, which I want to avoid.

How can I ensure my documents are not overwritten by the _reindex API?

Once re-indexing is done, I can remove the secondary index from my application properties and replace primaryIndex with the new index name, so that reads are also served from the new index.

Or is there a better approach to achieve the same?

You can instruct the _reindex API to copy a document to the new index only when it is not already present there. A document that already exists in the new index came from either a new event or an update event, and that is exactly what you don't want overwritten.

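You can set op_type to create in the dest section of the reindex request, so that only missing documents are created in the new index. By default a version conflict aborts the reindex, so also set conflicts to proceed to skip documents that already exist and keep going. For more info, please follow the link https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html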
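As a rough sketch of what such a request could look like, the Python below builds the _reindex body and, optionally, posts it using only the standard library. The index names "old-index" and "new-index", the base URL, and the helper names build_reindex_body and start_reindex are assumptions for illustration, and the sketch assumes a cluster without authentication. Adapt it to your own setup rather than treating it as a definitive implementation.

```python
import json
import urllib.request


def build_reindex_body(source_index: str, dest_index: str) -> dict:
    """Build a _reindex body that only creates documents missing from dest."""
    return {
        # Count version conflicts and continue instead of aborting the run.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # op_type=create: documents already in dest_index are left untouched.
        "dest": {"index": dest_index, "op_type": "create"},
    }


def start_reindex(base_url: str, body: dict) -> dict:
    """POST the body to _reindex as a background task.

    Assumes an unauthenticated cluster reachable at base_url (hypothetical).
    """
    request = urllib.request.Request(
        f"{base_url}/_reindex?wait_for_completion=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical index names; replace with primaryIndex / secondaryIndex.
    body = build_reindex_body("old-index", "new-index")
    print(json.dumps(body, indent=2))
    # To actually start the reindex against a local cluster (assumption):
    # print(start_reindex("http://localhost:9200", body))
```

Running with wait_for_completion=false makes sense for a job that lasts days, because the call returns a task reference right away instead of holding the HTTP connection open.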

Hope this answers your question:)
