
Elasticsearch Reindexing while updating documents?

What if I've changed the mapping for my index and want to reindex?

I'm currently using the Java API, which does not yet have reindex functionality, so using bulk would solve my problem. The solution would look something like this:

Ref: How to reindex in ElasticSearch via Java API

A long time ago

  • create index MY_INDEX_1
  • create mapping for MY_INDEX_1
  • create alias MY_INDEX_1 -> MY_INDEX
  • create documents in MY_INDEX

Time to reindex!

  • create index MY_INDEX_2
  • create mapping for MY_INDEX_2
  • scroll search + bulk all documents from MY_INDEX_1 to MY_INDEX_2
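The "scroll search + bulk" step boils down to reading the source index page by page and writing each page to the target with one bulk request. Below is a minimal in-memory sketch of that loop, not code against the Elasticsearch Java client: both indices are plain maps, and the names ScrollBulkCopy, copyAll and flush are hypothetical. In real code the iteration would be driven by a scroll cursor and each flush would be one bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for "scroll search + bulk"; NOT the Elasticsearch client.
// Each index is modeled as a map from document id to JSON source.
class ScrollBulkCopy {

    /** Copies every document from source to target, batchSize docs per "bulk"; returns the number of bulks. */
    static int copyAll(Map<String, String> source, Map<String, String> target, int batchSize) {
        List<Map.Entry<String, String>> page = new ArrayList<>();
        int bulkCalls = 0;
        for (Map.Entry<String, String> doc : source.entrySet()) { // plays the role of the scroll cursor
            page.add(doc);
            if (page.size() == batchSize) {
                bulkCalls += flush(page, target);
            }
        }
        if (!page.isEmpty()) {
            bulkCalls += flush(page, target); // last, partial page
        }
        return bulkCalls;
    }

    // One "bulk request": write the whole page to the target index, then start a new page.
    private static int flush(List<Map.Entry<String, String>> page, Map<String, String> target) {
        for (Map.Entry<String, String> doc : page) {
            target.put(doc.getKey(), doc.getValue());
        }
        page.clear();
        return 1;
    }

    public static void main(String[] args) {
        Map<String, String> myIndex1 = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            myIndex1.put("doc" + i, "{\"n\":" + i + "}");
        }
        Map<String, String> myIndex2 = new TreeMap<>();
        int bulks = copyAll(myIndex1, myIndex2, 2);
        System.out.println(bulks + " bulk requests, " + myIndex2.size() + " documents copied");
    }
}
```

The page size is the main knob: larger pages mean fewer round trips but bigger bulk requests.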

Renaming and deletion of the old index

  • create alias MY_INDEX_2 -> MY_INDEX
  • delete alias MY_INDEX_1 -> MY_INDEX
  • delete index MY_INDEX_1
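The first two steps of this phase are best sent as a single request: the `_aliases` endpoint applies all of its `actions` atomically, so readers never see MY_INDEX missing or briefly pointing at both indices. This sketch only builds that request body (for `POST /_aliases`) and models the alias table as a map with one index per alias; how you send the request (HTTP client, Java client) is left out, and the class name AliasSwap is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: builds the body of one POST /_aliases request that removes
// the old alias target and adds the new one atomically. Sending it is left out.
class AliasSwap {

    static String swapBody(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
                + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
                + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
                + "]}";
    }

    // Simplified in-memory alias table (alias -> single index), for illustration only.
    private final Map<String, String> aliases = new HashMap<>();

    synchronized void add(String alias, String index) {
        aliases.put(alias, index);
    }

    // Remove + add happen under one lock, so a reader never sees the alias missing.
    synchronized void swap(String alias, String oldIndex, String newIndex) {
        if (!oldIndex.equals(aliases.get(alias))) {
            throw new IllegalStateException(alias + " does not point at " + oldIndex);
        }
        aliases.put(alias, newIndex);
    }

    synchronized String resolve(String alias) {
        return aliases.get(alias);
    }

    public static void main(String[] args) {
        System.out.println("POST /_aliases " + swapBody("MY_INDEX", "MY_INDEX_1", "MY_INDEX_2"));
        AliasSwap cluster = new AliasSwap();
        cluster.add("MY_INDEX", "MY_INDEX_1");
        cluster.swap("MY_INDEX", "MY_INDEX_1", "MY_INDEX_2");
        System.out.println("MY_INDEX -> " + cluster.resolve("MY_INDEX"));
    }
}
```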

But what happens if, while all documents are being reindexed, a user updates a document that was already copied at the beginning? Or if the same thing happens between the end of reindexing and the alias rename?
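To make the race concrete, here is a tiny simulation in which plain maps stand in for the two indices and for the MY_INDEX alias (LostUpdateDemo and run are hypothetical names). The user's update reaches MY_INDEX_1 after doc1 has already been copied, so once the alias is switched the update is gone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the lost-update race, using maps as stand-in indices.
class LostUpdateDemo {

    /** Returns what a reader sees for doc1 through MY_INDEX after the alias switch. */
    static String run() {
        Map<String, String> myIndex1 = new HashMap<>();
        Map<String, String> myIndex2 = new HashMap<>();
        myIndex1.put("doc1", "v1");
        myIndex1.put("doc2", "v1");
        Map<String, Map<String, String>> alias = new HashMap<>();
        alias.put("MY_INDEX", myIndex1);

        // The reindexer copies doc1 early on.
        myIndex2.put("doc1", myIndex1.get("doc1"));
        // A user updates doc1 through the alias, which still points at MY_INDEX_1.
        alias.get("MY_INDEX").put("doc1", "v2");
        // The reindexer finishes the remaining documents.
        myIndex2.put("doc2", myIndex1.get("doc2"));
        // Alias switch: reads and writes now go to MY_INDEX_2.
        alias.put("MY_INDEX", myIndex2);

        return alias.get("MY_INDEX").get("doc1");
    }

    public static void main(String[] args) {
        System.out.println("doc1 after switch: " + run()); // prints v1: the user's "v2" is gone
    }
}
```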

Possible solutions?

  • One way would be to use external versioning, so that a write does not overwrite a document that already has a higher version
  • Or could it be solved in another way?
  • Or, between renaming the aliases and deleting my_index_1, reindex all documents that have been indexed since the reindexing started? But then a document could still be updated between the alias rename and the second reindexing
  • Or should we lock writes while reindexing? That seems like a bad solution.
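On the external-version idea: with `version_type=external`, Elasticsearch only accepts an index request if the document does not exist yet or the supplied version is strictly higher than the stored one; otherwise it rejects the write with a version conflict. The model below mimics that rule in memory (it is not the client API, and ExternalVersionIndex is a hypothetical name). It assumes your application already keeps a monotonically increasing version per document, for example in its source of truth, and sends it with every write, from users and from the reindexer alike. Under that assumption the order in which copies arrive stops mattering: the highest version wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the version_type=external rule, not the real client.
class ExternalVersionIndex {

    static final class Doc {
        final long version;
        final String source;

        Doc(long version, String source) {
            this.version = version;
            this.source = source;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    /** Applies the write and returns true, or returns false where Elasticsearch would report a version conflict. */
    boolean index(String id, long version, String source) {
        Doc current = docs.get(id);
        if (current != null && version <= current.version) {
            return false; // stored copy is as new or newer: keep it
        }
        docs.put(id, new Doc(version, source));
        return true;
    }

    String get(String id) {
        Doc doc = docs.get(id);
        return doc == null ? null : doc.source;
    }

    public static void main(String[] args) {
        ExternalVersionIndex myIndex2 = new ExternalVersionIndex();
        myIndex2.index("doc1", 1, "v1"); // first reindex pass copies version 1
        myIndex2.index("doc1", 2, "v2"); // the user's update (or a catch-up pass) brings version 2
        boolean staleApplied = myIndex2.index("doc1", 1, "v1"); // a late, stale copy of version 1
        System.out.println("stale copy applied: " + staleApplied + ", doc1 = " + myIndex2.get("doc1"));
    }
}
```

This is also what makes the catch-up pass from the third bullet safe to repeat: re-copying a document can never roll it back.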

I think this is your real question:

But what happens if, while all documents are being reindexed, a user updates a document that was already copied at the beginning? Or if the same thing happens between the end of reindexing and the alias rename?

I just asked a closely related question, but it still has parts that need to be resolved separately. However, my research lets me answer this one. See that question for details and references.

To answer your question: create a second alias just before reindexing. I call it duplicate_write_alias. Whenever your application sees this second alias, it writes first to the old index and then to the new one, via the two aliases (the order matters, to avoid a potential race). When the reindexing is done, your indexing process deletes duplicate_write_alias and moves your MY_INDEX alias to the new MY_INDEX_2, as noted above. Do the alias switch in one atomic command.
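A minimal sketch of the application-side routing, with maps standing in for the cluster's indices and aliases. Only the alias names come from the answer; DualWriteRouter and its methods are hypothetical. It shows just the write path: while duplicate_write_alias exists, each write goes to MY_INDEX first and then to the duplicate alias. It does not model the copy step itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of the duplicate-write routing; not the Elasticsearch client.
class DualWriteRouter {

    final Map<String, Map<String, String>> indices = new HashMap<>(); // index name -> documents
    final Map<String, String> aliases = new HashMap<>();              // alias -> index name

    void createIndex(String name) {
        indices.put(name, new HashMap<>());
    }

    /** Writes through the aliases and returns the indices written, in order. */
    List<String> write(String id, String source) {
        List<String> targets = new ArrayList<>();
        targets.add(aliases.get("MY_INDEX"));                 // old index first ...
        String duplicate = aliases.get("duplicate_write_alias");
        if (duplicate != null) {
            targets.add(duplicate);                           // ... then the new one
        }
        for (String index : targets) {
            indices.get(index).put(id, source);
        }
        return targets;
    }

    public static void main(String[] args) {
        DualWriteRouter cluster = new DualWriteRouter();
        cluster.createIndex("MY_INDEX_1");
        cluster.createIndex("MY_INDEX_2");
        cluster.aliases.put("MY_INDEX", "MY_INDEX_1");
        cluster.aliases.put("duplicate_write_alias", "MY_INDEX_2"); // created just before reindexing
        System.out.println("written to " + cluster.write("doc1", "v2"));
    }
}
```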

As I noted in my question, you still have to deal with potential 'index does not exist' errors, because of a remaining race between your application checking whether the alias exists and the alias being deleted. I'm hoping there's a better answer than 'always write twice and ignore errors' or 'check and hope for the best'...
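The "always write twice and ignore errors" option could look like the sketch below. IndexMissingException and the alias-to-documents map are hypothetical stand-ins for whatever error your client raises when the alias is gone. Rather than hiding the failure completely, the method reports whether the duplicate write landed, because an ignored failure can mean the document was written only to the old index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "always write twice and ignore errors"; not the Elasticsearch client.
class WriteTwice {

    // Stand-in for whatever "index does not exist" error your client raises.
    static class IndexMissingException extends RuntimeException {
        IndexMissingException(String name) {
            super("no such index or alias: " + name);
        }
    }

    final Map<String, Map<String, String>> targets = new HashMap<>(); // alias -> documents

    void write(String target, String id, String source) {
        Map<String, String> docs = targets.get(target);
        if (docs == null) {
            throw new IndexMissingException(target);
        }
        docs.put(id, source);
    }

    /** Writes via MY_INDEX, then via duplicate_write_alias; returns false if the second alias was already gone. */
    boolean writeTwice(String id, String source) {
        write("MY_INDEX", id, source); // errors here are real failures and propagate
        try {
            write("duplicate_write_alias", id, source); // may race with the alias being deleted
            return true;
        } catch (IndexMissingException e) {
            return false; // ignored, but reported so the caller can decide what to do
        }
    }

    public static void main(String[] args) {
        WriteTwice cluster = new WriteTwice();
        cluster.targets.put("MY_INDEX", new HashMap<>());
        cluster.targets.put("duplicate_write_alias", new HashMap<>());
        System.out.println("duplicate landed: " + cluster.writeTwice("doc1", "v2"));
        cluster.targets.remove("duplicate_write_alias"); // the reindex process deletes the alias
        System.out.println("duplicate landed: " + cluster.writeTwice("doc1", "v3"));
    }
}
```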

I think there is also another (uglier) way: you can disable write operations on the source index while reindexing. This makes the write APIs temporarily unusable, but:

  • You don't have to maintain a second storage to hold the truth
  • You don't have to deal with inconsistency
  • You don't have to flag documents that should be deleted after the migration
  • You can use Elasticsearch's own storage to create snapshots between indices
  • You can signal users of your API to send their changes again later (when the reindexing is done)

Downsides:

  • You have downtime, at least for write operations
  • You need extra error-handling logic in case the index is not switched back to allow writes (automatic recovery, etc.)
  • Holding more than one index uses more storage space
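A sketch of the write-block variant. `index.blocks.write` is a real dynamic index setting that can be changed through `PUT /MY_INDEX_1/_settings`; everything else here is an in-memory stand-in with hypothetical names (WriteBlockedReindex, WriteBlockedException, reindexWithWriteBlock). If the copy or alias switch fails, the block is lifted again so the API works (the second downside above); on success the old index stays blocked until it is deleted, so no write can slip into it after the copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the source index MY_INDEX_1; not the Elasticsearch client.
class WriteBlockedReindex {

    // Body for PUT /MY_INDEX_1/_settings; index.blocks.write is a real index setting.
    static String blockWritesBody(boolean blocked) {
        return "{\"index.blocks.write\":" + blocked + "}";
    }

    static class WriteBlockedException extends RuntimeException {
        WriteBlockedException() {
            super("writes are disabled while reindexing, please retry later");
        }
    }

    final Map<String, String> docs = new HashMap<>();
    boolean writeBlocked = false;

    void write(String id, String source) {
        if (writeBlocked) {
            throw new WriteBlockedException(); // your API would tell the user to resend later
        }
        docs.put(id, source);
    }

    /**
     * Blocks writes, then runs the copy and alias switch. On failure the block is
     * lifted so the old index is usable again; on success it stays blocked until
     * the old index is deleted.
     */
    void reindexWithWriteBlock(Runnable copyAndSwitchAlias) {
        writeBlocked = true;
        try {
            copyAndSwitchAlias.run();
        } catch (RuntimeException e) {
            writeBlocked = false;
            throw e;
        }
    }

    public static void main(String[] args) {
        WriteBlockedReindex myIndex1 = new WriteBlockedReindex();
        System.out.println("PUT /MY_INDEX_1/_settings " + blockWritesBody(true));
        myIndex1.reindexWithWriteBlock(() -> {
            try {
                myIndex1.write("doc1", "v2"); // a user write arriving mid-reindex
            } catch (WriteBlockedException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        });
    }
}
```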

For more information, see: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules.html
