Elasticsearch updates are not immediate; how do you wait for Elasticsearch to finish updating its index?

I'm attempting to improve the performance of a test suite that runs against Elasticsearch.

The tests take a long time because Elasticsearch does not update its indexes immediately after a write. For instance, the following code runs without raising an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        ....
    }
)

results = elasticsearch.search()
# search() returns a response dict; the matching documents are under ['hits']['hits']
assert not results['hits']['hits']
# results are not populated

Currently, our hacked-together solution to this issue is to drop a time.sleep call into the code to give Elasticsearch some time to update its indexes.

from time import sleep
from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        ....
    }
)

# Don't want to use sleep functions
sleep(1)

results = elasticsearch.search()
# The matching documents are under ['hits']['hits'] in the response
assert len(results['hits']['hits']) == 1
# results are now populated

Obviously this isn't great, as it's rather failure-prone: if Elasticsearch takes longer than a second to update its indexes, however unlikely that is, the test will fail. It's also extremely slow when you're running hundreds of tests like this.

My attempt to solve the issue has been to query the pending cluster tasks to see whether there is any work left to be done. However, this doesn't work, and this code also runs without an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        ....
    }
)

# Busy-wait until the cluster reports no pending tasks
while elasticsearch.cluster.pending_tasks()['tasks']:
    pass

results = elasticsearch.search()
# The matching documents are under ['hits']['hits'] in the response
assert not results['hits']['hits']
# results are not populated

So basically, back to my original question: Elasticsearch updates are not immediate, so how do you wait for Elasticsearch to finish updating its index?

As of version 5.0.0, Elasticsearch has an option:

 ?refresh=wait_for

on the Index, Update, Delete, and Bulk APIs. This way, the request won't receive a response until the result is visible to search. (Yay!)

See https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-refresh.html for more information.

Edit: It seems that this functionality is already part of the latest Python elasticsearch API: https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.index

Change your elasticsearch.update to:

elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    refresh='wait_for',
    body={
        ....
    }
)

and you shouldn't need any sleep or polling.
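If many tests issue writes, one way to avoid repeating the argument everywhere might be to pre-bind it. This is only a sketch: `with_wait_for` is a hypothetical helper name, not part of the elasticsearch client, and it assumes the wrapped method accepts a `refresh` keyword, as the Index, Update, Delete, and Bulk APIs mentioned above do.

```python
from functools import partial


def with_wait_for(write_call):
    """Return write_call with refresh='wait_for' pre-filled.

    Hypothetical helper: write_call is assumed to be a client method
    that accepts a `refresh` keyword, e.g. elasticsearch.update.
    """
    return partial(write_call, refresh="wait_for")
```

A test module could then bind `update = with_wait_for(elasticsearch.update)` once and call `update(...)` with the usual arguments.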

This seems to work for me:

els.indices.refresh(index=index)
els.cluster.health(wait_for_no_relocating_shards=True, wait_for_active_shards='all')

Elasticsearch provides near real-time search. An updated or indexed document is not searchable immediately, only after the next refresh operation. By default, a refresh is scheduled every 1 second.

To retrieve a document after updating or indexing it, you should use the GET API instead. By default, the GET API is realtime and is not affected by the refresh rate of the index. That means that if the update/index succeeded, you should see the modifications in the response to the GET request.
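As a rough illustration of the GET approach with the Python client, a test could fetch the document by ID instead of searching for it. `get_source` is a hypothetical helper name, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance; client versions older than 7.x may also need a `doc_type` argument.

```python
def get_source(client, index, doc_id):
    """Fetch one document by ID through the realtime GET API.

    Assumes `client` is an elasticsearch.Elasticsearch instance and that
    the document exists (the client raises an error if it does not).
    """
    response = client.get(index=index, id=doc_id)
    # The stored document body is returned under the '_source' key.
    return response["_source"]
```

For the example in the question, that would be something like `get_source(elasticsearch, 'blog', 1)` right after the update, with no sleep in between.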

If you insist on using the Search API to retrieve a document after updating or indexing it, the documentation offers three solutions:

  • Waiting for the refresh interval
  • Setting the ?refresh option in an index/update/delete request
  • Using the Refresh API to explicitly trigger a refresh (POST _refresh) after an index/update request. However, please note that refreshes are resource-intensive.
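For the first option, waiting does not have to mean a fixed sleep. A generic alternative, not taken from the Elasticsearch documentation, is to poll with a deadline so that the test proceeds as soon as the document shows up and fails clearly if it never does. The helper name `wait_until` and its default timings are assumptions chosen for illustration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll condition() until it returns a truthy value or the timeout elapses.

    Returns the truthy value; raises TimeoutError if the deadline passes.
    The default timeout and interval are illustrative, not recommendations.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A test might use it as `hits = wait_until(lambda: elasticsearch.search(index='blog')['hits']['hits'])`. This still costs up to one refresh interval per test, so the ?refresh option is likely the better fit when it is available.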

If you use the bulk helpers, you can do it like this:

from elasticsearch.helpers import bulk    
bulk(client=self.es, actions=data, refresh='wait_for')
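A slightly fuller sketch of the same idea, showing one possible shape for the `actions` argument. The helper names `make_actions` and `bulk_index_and_wait` are hypothetical, and the `(doc_id, source)` input format is an assumption; clusters older than 7.x may also expect a `_type` key in each action.

```python
def make_actions(index, docs):
    """Yield one bulk action per (doc_id, source) pair."""
    for doc_id, source in docs:
        yield {"_index": index, "_id": doc_id, "_source": source}


def bulk_index_and_wait(client, index, docs):
    """Bulk-index docs, returning only once they are visible to search."""
    # Imported here so make_actions can be used without the client installed.
    from elasticsearch.helpers import bulk

    # Extra keyword arguments such as refresh are forwarded to the Bulk API.
    return bulk(client=client, actions=make_actions(index, docs), refresh="wait_for")
```

Calling `bulk_index_and_wait(es, 'blog', [(1, {'title': 'first post'})])` would then index the fixture documents and block until a refresh has made them searchable.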

If you don't want to wait for the cluster's refresh interval, you can also call elasticsearch.Refresh('blog').
