
Elasticsearch: Sorted scroll in python inconsistent

I am a little confused by the results I am getting. I have a simple query that should return the most recently added document, sorted by created date (a timestamp) in descending order:

query = {
    "query": {"match_all": {}},
    "sort": [
        {"created_date": "desc"}
    ],
    "size": 1
}

When I use the helpers.scan() abstraction over the Scroll API, I get a different hit each time. My Elasticsearch cluster is static (no new documents are being added), so the inconsistency is strange: the query sorts all entries and asks for only the first hit (size 1). What am I missing here?

For future reference, for anyone who stumbles upon this: the documentation on the Elasticsearch homepage may not clear this up, but the Python client is very well documented. As per the documentation for helpers.scan():

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan

So, for use cases like this, where only the top hit of a sorted query is needed, it is better to use search() than scan().
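A minimal sketch of the search() approach, reusing the query from the question. It assumes `client` is an `elasticsearch.Elasticsearch` instance and that the client version still accepts the `body` argument (newer clients prefer passing `query`, `sort` and `size` as keyword arguments). The function name `latest_document` and the index name in the usage comment are hypothetical.

```python
from typing import Any, Optional

# Same query as in the question: newest document first, one hit only.
LATEST_QUERY = {
    "query": {"match_all": {}},
    "sort": [{"created_date": "desc"}],
    "size": 1,
}


def latest_document(client: Any, index: str) -> Optional[dict]:
    """Return the newest hit via a plain search, or None if nothing matches."""
    # Assumption: `client` behaves like elasticsearch.Elasticsearch and
    # accepts the request as `body` (7.x-style call).
    response = client.search(index=index, body=LATEST_QUERY)
    hits = response["hits"]["hits"]
    return hits[0] if hits else None


# Usage (assumes a reachable cluster; "my-index" is a placeholder name):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   print(latest_document(es, "my-index"))
```

Because search() honours the sort in the request, the same document should come back on every call against a static index. If scrolling through the whole result set in sorted order is really required, the quoted documentation points to passing preserve_order=True to helpers.scan() instead, at the cost it describes.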
