Scroll in Python Elasticsearch not working
When I query Elasticsearch, I try to scroll through all the documents with Python so that I can get more than 10K results:
from elasticsearch import Elasticsearch

# ADDRESS, PORT and es_query are defined elsewhere
es = Elasticsearch(ADDRESS, port=PORT)
result = es.search(
    index="INDEX",
    body=es_query,
    size=10000,
    scroll="3m")
scroll_id = result['_scroll_id']
scroll_size = result["hits"]["total"]
counter = 0
print('total items= %d' % scroll_size)
# note: scroll_size is never updated inside this loop
while scroll_size > 0:
    counter += len(result['hits']['hits'])
    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
print('found = %d' % counter)
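The problem is that sometimes counter (the sum of the results when the program ends) is smaller than result["hits"]["total"]. Why does this happen? Why doesn't scroll iterate over all the results?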
Elasticsearch version: 5.6
Lucene version: 6.6
If I'm not mistaken, in the first iteration of the while loop you add the initial result["hits"]["total"] to your counter, but you should only add the number of hits actually retrieved:
# continues from the es.search(..., scroll="3m") call in the question
scroll_id = result['_scroll_id']
total = result["hits"]["total"]
print('total = %d' % total)
scroll_size = len(result["hits"]["hits"])  # this is the current 'page' size
counter = 0
while scroll_size > 0:
    counter += scroll_size
    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
    scroll_size = len(result['hits']['hits'])
print('counter = %d' % counter)
assert counter == total
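To check the corrected counting logic without a live cluster, here is a minimal sketch that wraps it in a function. The names count_all_hits and FakeScrollClient are hypothetical; the fake only mimics the search, scroll and clear_scroll calls used here. The sketch assumes the Elasticsearch 5.x response shape, where hits.total is a plain integer (newer versions return an object instead), and it uses the same keep-alive ("3m") on every request, which is a choice made for this sketch rather than something the answer prescribes.

```python
from typing import Any, Dict, Tuple


def count_all_hits(es: Any, index: str, query: Dict[str, Any],
                   page_size: int = 10000, keep_alive: str = "3m") -> Tuple[int, int]:
    """Scroll through all matching documents; return (retrieved, hits.total).

    Assumes the Elasticsearch 5.x response shape, where hits.total is an int.
    """
    result = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    scroll_id = result["_scroll_id"]
    total = result["hits"]["total"]
    counter = 0
    page = result["hits"]["hits"]
    while page:  # an empty page means the scroll is exhausted
        counter += len(page)  # count only the hits actually returned
        # keep_alive is renewed on every call; a very short value such as "1s"
        # risks the context expiring between requests.
        result = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
        scroll_id = result["_scroll_id"]  # always continue from the newest id
        page = result["hits"]["hits"]
    es.clear_scroll(scroll_id=scroll_id)  # free the server-side scroll context
    return counter, total


class FakeScrollClient:
    """Hypothetical in-memory stand-in with search/scroll/clear_scroll methods."""

    def __init__(self, n_docs: int) -> None:
        self.docs = [{"_id": str(i)} for i in range(n_docs)]
        self.pos = 0
        self.size = 0

    def _next_page(self) -> Dict[str, Any]:
        page = self.docs[self.pos:self.pos + self.size]
        self.pos += len(page)
        return {"_scroll_id": "fake-scroll-id",
                "hits": {"total": len(self.docs), "hits": page}}

    def search(self, index: str, body: Dict[str, Any], size: int,
               scroll: str) -> Dict[str, Any]:
        self.pos, self.size = 0, size
        return self._next_page()

    def scroll(self, scroll_id: str, scroll: str) -> Dict[str, Any]:
        return self._next_page()

    def clear_scroll(self, scroll_id: str) -> Dict[str, Any]:
        return {"succeeded": True}


print(count_all_hits(FakeScrollClient(25), "INDEX", {"query": {"match_all": {}}},
                     page_size=10))  # (25, 25)
```

With a real client you would pass the Elasticsearch instance (es in the question) instead of the fake; comparing the two returned numbers plays the same role as the assert in the answer.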
In fact, you don't need to store the scroll size separately; a more concise while loop is:
# counter and scroll_id are initialized as above
while len(result['hits']['hits']):
    counter += len(result['hits']['hits'])
    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
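The same stopping rule (stop at the first empty page, always continue from the newest _scroll_id) can also be written as a generator, which lets you process hits as they arrive instead of only counting them. This is a minimal sketch: iter_scroll_hits and the canned pages are hypothetical, and fetch_next stands in for a call such as es.scroll(scroll_id=sid, scroll="3m").

```python
from typing import Any, Callable, Dict, Iterator, List

Response = Dict[str, Any]


def iter_scroll_hits(first: Response,
                     fetch_next: Callable[[str], Response]) -> Iterator[Response]:
    """Yield every hit of a scroll, one page at a time.

    `first` is the response of the initial search (made with scroll=...);
    `fetch_next` is called with the latest _scroll_id and returns the next page.
    """
    result = first
    while result["hits"]["hits"]:  # stop at the first empty page
        for hit in result["hits"]["hits"]:
            yield hit
        result = fetch_next(result["_scroll_id"])


# Canned pages standing in for real responses (hypothetical data).
pages: List[Response] = [
    {"_scroll_id": "a", "hits": {"total": 3, "hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "b", "hits": {"total": 3, "hits": [{"_id": "3"}]}},
    {"_scroll_id": "c", "hits": {"total": 3, "hits": []}},
]
remaining = iter(pages[1:])
ids = [hit["_id"] for hit in iter_scroll_hits(pages[0], lambda sid: next(remaining))]
print(ids)  # ['1', '2', '3']
```

With a real client, the count would be sum(1 for _ in iter_scroll_hits(result, lambda sid: es.scroll(scroll_id=sid, scroll="3m"))), where result is the response of the initial es.search(..., scroll=...) call. The elasticsearch-py package also ships elasticsearch.helpers.scan, which wraps this scroll loop for you.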
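Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.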