Sort a Huge Number of Documents in Elasticsearch
Whenever I need to retrieve a large number of documents from an Elasticsearch index, I use Elasticsearch's scan-and-scroll technique (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html) as follows:
from elasticsearch import Elasticsearch

# HOSTS, TARGET_INDEX and TARGET_DOC_TYPE are defined elsewhere
conn = Elasticsearch(hosts=HOSTS)
# I would like to sort the documents by their 'created_at' date
the_query = {'query': {'match_all': {}}, 'sort': {'created_at': {'order': 'asc'}}}
# With search_type='scan' the initial response carries no hits, only a scroll ID
scanResp = conn.search(index=TARGET_INDEX, doc_type=TARGET_DOC_TYPE, body=the_query, search_type='scan', scroll='10m')
scrollId = scanResp['_scroll_id']
doc_num = 1
response = conn.scroll(scroll_id=scrollId, scroll='10m')
while len(response['hits']['hits']) > 0:
    for item in response['hits']['hits']:
        print('\tDocument ' + str(doc_num) + ' of ' + str(response['hits']['total']))
        doc_num += 1
        # ====================
        # Process the item
        # ====================
        the_doc = item['_source']
    # end for item
    scrollId = response['_scroll_id']
    # doc_num is one past the number of documents processed so far
    if doc_num > response['hits']['total']:
        break
    response = conn.scroll(scroll_id=scrollId, scroll='10m')
# end of while
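However, as the Elasticsearch documentation points out, documents retrieved with the scan search type are not sorted, so the results are not what I want.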
My question: how can I sort a huge number of documents in Elasticsearch?
Thanks :)
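Scrolling through a sorted result set is very expensive, but if you insist, remove the 'scan' search_type from your search call. The scan search type disables sorting while scrolling; a plain scroll keeps the sort order.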
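The answer's suggestion can be sketched as below, assuming the older elasticsearch-py call style used in the question (`client.search(..., scroll=...)`, `client.scroll(scroll_id=..., scroll=...)`, `client.clear_scroll(scroll_id=...)`). One detail matters when dropping `search_type='scan'`: a plain scrolled search already returns the first page of sorted hits in the initial `search` response, while the question's loop ignores that response and starts with `conn.scroll`, so reusing the loop unchanged would skip the first page. The function name `scroll_sorted` and the in-memory `FakeClient` (there only so the sketch runs without a cluster) are hypothetical and not part of elasticsearch-py.

```python
from typing import Any, Dict, Iterator, List


def scroll_sorted(client: Any, index: str, body: Dict[str, Any],
                  scroll: str = "10m", **search_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every hit in the order given by body['sort'] (hypothetical helper).

    Uses a plain scroll (no search_type='scan'), so the sort is honoured.
    The initial search response already holds the first page of hits,
    so that page is yielded before scroll() is ever called.
    """
    response = client.search(index=index, body=body, scroll=scroll, **search_kwargs)
    scroll_id = response["_scroll_id"]
    try:
        hits = response["hits"]["hits"]
        while hits:  # an empty page means the scroll is exhausted
            for hit in hits:
                yield hit
            response = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = response["_scroll_id"]
            hits = response["hits"]["hits"]
    finally:
        # Release the server-side scroll context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for an Elasticsearch client (demo only)."""

    def __init__(self, docs: List[Dict[str, Any]], page_size: int = 2) -> None:
        self._docs = docs
        self._page_size = page_size
        self._pending: List[Dict[str, Any]] = []
        self.cleared = False

    def _page(self) -> Dict[str, Any]:
        # Hand out the next page_size hits, keeping the rest for later scroll() calls.
        page = self._pending[:self._page_size]
        self._pending = self._pending[self._page_size:]
        return {"_scroll_id": "fake-id", "hits": {"total": len(self._docs), "hits": page}}

    def search(self, index: str, body: Dict[str, Any], scroll: str, **_ignored: Any) -> Dict[str, Any]:
        # Supports only a single-field sort such as {'created_at': {'order': 'asc'}}.
        field, spec = next(iter(body["sort"].items()))
        reverse = spec.get("order", "asc") == "desc"
        ordered = sorted(self._docs, key=lambda d: d[field], reverse=reverse)
        self._pending = [{"_source": d} for d in ordered]
        return self._page()

    def scroll(self, scroll_id: str, scroll: str) -> Dict[str, Any]:
        return self._page()

    def clear_scroll(self, scroll_id: str) -> None:
        self.cleared = True


docs = [{"created_at": "2014-03-02"}, {"created_at": "2014-01-15"},
        {"created_at": "2014-02-20"}, {"created_at": "2014-01-01"},
        {"created_at": "2014-02-01"}]
query = {"query": {"match_all": {}}, "sort": {"created_at": {"order": "asc"}}}
client = FakeClient(docs)
dates = [hit["_source"]["created_at"] for hit in scroll_sorted(client, "my_index", query)]
print(dates)
# → ['2014-01-01', '2014-01-15', '2014-02-01', '2014-02-20', '2014-03-02']
```

Against a real cluster you would pass the question's client instead of the fake, e.g. `scroll_sorted(conn, TARGET_INDEX, the_query, doc_type=TARGET_DOC_TYPE)`; a larger `size` in the query body means fewer round trips per scroll. The answer's caveat about cost still applies, since every sorted page has to be produced in sort order.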