
Scroll in python Elasticsearch not working

When I query Elasticsearch, I try to scroll through all the documents with Python so that I can get more than 10K results:

from elasticsearch import Elasticsearch
es = Elasticsearch(ADDRESS, port=PORT)


result = es.search(
    index="INDEX",
    body=es_query,
    size=10000,
    scroll="3m")


scroll_id = result['_scroll_id']
scroll_size = result["hits"]["total"]
counter = 0
print('total items= ' + scroll_size)

while(scroll_size > 0):
    counter +=len(result['hits']['hits'])
   

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']

    
print('found = ' +counter)

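The problem is that sometimes counter (the sum of the results when the program ends) is smaller than result["hits"]["total"]. Why is that? Why doesn't scroll iterate over all the results?

Elasticsearch version: 5.6
Lucene version: 6.6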

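If I'm not mistaken, in the first iteration of your while loop you are adding the initial result["hits"]["total"] to your counter, but you should only add the length of the hits actually retrieved: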

scroll_id = result['_scroll_id']
total = result["hits"]["total"]

print('total = %d' % total)

scroll_size = len(result["hits"]["hits"])  # this is the current 'page' size
counter = 0

while(scroll_size > 0):
    counter += scroll_size

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
    scroll_size = len(result['hits']['hits'])

print('counter = %d' % counter)
assert counter == total

In fact, you don't need to store the scroll size separately; a more concise while loop would be:

while len(result['hits']['hits']):
    counter += len(result['hits']['hits'])

    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
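
For reference, here is a minimal sketch of the same loop wrapped in a function. It assumes a 5.x-era elasticsearch-py client like the one in the question (search with body=, and an integer result["hits"]["total"]). The function name count_scrolled_hits, the page_size and keep_alive parameters, and the FakeScrollClient stand-in are all hypothetical names made up for this illustration. The longer keep-alive and the clear_scroll call are my own additions, not something the snippets above do.

```python
def count_scrolled_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Walk every scroll page and return (counter, total)."""
    result = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    total = result["hits"]["total"]  # assumption: an int, as in Elasticsearch 5.x
    scroll_id = result["_scroll_id"]
    counter = 0
    try:
        # The search response already carries the first page, so count it
        # before asking for the next one.
        while result["hits"]["hits"]:
            counter += len(result["hits"]["hits"])
            result = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = result["_scroll_id"]
    finally:
        # Release the server-side search context once we are done.
        es.clear_scroll(scroll_id=scroll_id)
    return counter, total


class FakeScrollClient:
    """Hypothetical in-memory stand-in for the client, for a dry run only."""

    def __init__(self, docs):
        self.docs = docs
        self.pos = 0
        self.size = 0
        self.cleared = False

    def _page(self):
        hits = self.docs[self.pos:self.pos + self.size]
        self.pos += len(hits)
        return {"_scroll_id": "fake-id",
                "hits": {"total": len(self.docs), "hits": hits}}

    def search(self, index, body, size, scroll):
        self.pos, self.size = 0, size
        return self._page()

    def scroll(self, scroll_id, scroll):
        return self._page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


if __name__ == "__main__":
    fake = FakeScrollClient([{"_id": i} for i in range(25)])
    print(count_scrolled_hits(fake, "INDEX", {"query": {"match_all": {}}}, page_size=10))
    # prints (25, 25)
```

Against a real cluster you would pass your own Elasticsearch instance instead of the fake one; the fake only exists so the loop can be checked without a server.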

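Because the first iteration already holds 10K hits (10K is usually the default limit), as is the case here, you missed the first result["hits"]["hits"] chunk.

You should try: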

counter +=len(result['hits']['hits'])
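
To make the effect concrete, here is a tiny sketch with made-up data and no Elasticsearch involved. The pages list is hypothetical: its first entry stands for the hits returned by the initial search call, and the remaining entries stand for what the later scroll calls return.

```python
# Hypothetical pages: pages[0] comes from search(), the rest from scroll().
pages = [["a", "b", "c"], ["d", "e", "f"], ["g"]]
total = sum(len(page) for page in pages)

# Counting every page, including the one returned by search().
counted_all = sum(len(page) for page in pages)

# Counting only the pages returned by scroll(), i.e. skipping the first chunk.
skipped_first = sum(len(page) for page in pages[1:])

print(total, counted_all, skipped_first)  # prints: 7 7 4
```

The shortfall equals the size of the first chunk, which is why the counter can end up below the reported total.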
