How to get all documents under an Elasticsearch index with the Python client?

I'm trying to get all the documents in an index using the Python client, but the result shows me only the first document. This is my Python code:

res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="doc", body = {
'size' : 10000,
'query': {
    'match_all' : {}
}
})
print("%d documents found" % res['hits']['total'])
data = [doc for doc in res['hits']['hits']]
for doc in data:
    print(doc)
    return "%s %s %s" % (doc['_id'], doc['_source']['0'], doc['_source']['5'])

Elasticsearch retrieves only 10 documents by default. You can change this behaviour with the size parameter (see the documentation). The best practices for pagination are the search after query and the scroll query; which one fits depends on your needs. Please also read this answer: "Elastic search not giving data with big number for page size".
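
As a rough illustration of the scroll approach mentioned above, here is a minimal sketch. The helper name iter_all_hits and the page_size and keep_alive defaults are made up for this example. It assumes es is an elasticsearch.Elasticsearch client whose search, scroll and clear_scroll methods accept the keyword arguments shown, which can vary between client versions.

```python
def iter_all_hits(es, index, page_size=1000, keep_alive="2m"):
    """Yield every hit in `index` by paging through it with the scroll API.

    `es` is assumed to be an elasticsearch.Elasticsearch client;
    the function name and the defaults are illustrative only.
    """
    # The first request opens a scroll context and returns the first page.
    res = es.search(
        index=index,
        scroll=keep_alive,
        body={"size": page_size, "query": {"match_all": {}}},
    )
    scroll_id = res["_scroll_id"]
    try:
        while res["hits"]["hits"]:
            for hit in res["hits"]["hits"]:
                yield hit
            # Each follow-up request returns the next page of the same scroll.
            res = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = res["_scroll_id"]
    finally:
        # Release the server-side scroll context when iteration ends.
        es.clear_scroll(scroll_id=scroll_id)


# Possible usage against a real cluster (not executed here):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   for hit in iter_all_hits(es, "index_name"):
#       print(hit["_id"], hit["_source"])
```

Because this is a generator, documents are processed one page at a time instead of being loaded into memory in a single large response.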

To show all the results:

for doc in res['hits']['hits']:
    print(doc['_id'], doc['_source'])
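
One likely reason the code in the question shows only the first document is that its return statement sits inside the for loop, so the function exits after the first hit. A minimal sketch of collecting every row before returning might look like the following. The helper name format_hits is hypothetical, and the field names "0" and "5" are simply the ones used in the question.

```python
def format_hits(res, fields=("0", "5")):
    """Return one formatted line per hit instead of stopping at the first."""
    lines = []
    for doc in res["hits"]["hits"]:
        source = doc["_source"]
        # .get() avoids a KeyError when a document lacks one of the fields.
        values = [str(source.get(name, "")) for name in fields]
        lines.append(" ".join([doc["_id"]] + values))
    # Return only after the loop has visited every hit.
    return lines
```

The caller can then join the lines, for example with "\n".join(format_hits(res)), to produce a single response.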

Try "_doc" instead of "doc":

res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="_doc", body = {
'size' : 100,
'query': {
    'match_all' : {}
}
})

You can try the following query. It matches all the documents in the index, although without an explicit size only the first 10 hits come back in the response.

result = es.search(index="index_name", body={"query":{"match_all":{}}})

You can also use elasticsearch_dsl and its Search API, which lets you iterate over all your documents via the scan method.

import elasticsearch
from elasticsearch_dsl import Search

client = elasticsearch.Elasticsearch()
search = Search(using=client, index="92c603b3-8173-4d7a-9aca-f8c115ff5a18")

for hit in search.scan():
    print(hit)

I don't see it mentioned that the index must be refreshed if you have just added data. Use this:

es.indices.refresh(index="index_name")
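
To make the ordering explicit, here is a small sketch that refreshes the index right before searching it. The wrapper name search_fresh is made up for this example, and es is assumed to be an elasticsearch.Elasticsearch client. Refreshing before every search adds overhead, so this is probably best kept for tests or for right after a batch of writes.

```python
def search_fresh(es, index, body):
    """Refresh `index`, then search it, so just-indexed documents are visible.

    `es` is assumed to be an elasticsearch.Elasticsearch client;
    the function name is illustrative only.
    """
    # Make recently indexed documents searchable.
    es.indices.refresh(index=index)
    # Run the search against the refreshed index.
    return es.search(index=index, body=body)


# Possible usage against a real cluster (not executed here):
#   res = search_fresh(es, "index_name", {"query": {"match_all": {}}})
```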
