Elasticsearch: retrieve all records

Question

I am using Elasticsearch as a database that holds millions of records. I am using the code below to retrieve the data, but it does not return the complete data set.

response = requests.get("http://localhost:9200/cityindex/_search?q=*:*&size=10000")

This gives me only 10000 records.

When I increase the size to the document count of the index (which is 784234), it throws an error:

'Result window is too large, from + size must be less than or equal to: [10000] but was [100000]. See the scroll API for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.'

Context: I want to extract all the data from a particular index and then analyze it (I am looking to get the whole data set in JSON format). I am using Python for my project. Can someone please help me with this?

Answer 1

You need to scroll over the pages Elasticsearch returns and collect them in a list/array. You can use the elasticsearch Python library for this. Example Python code:

from elasticsearch import Elasticsearch

# Adjust the URL for your cluster. Newer client versions name the
# timeout option request_timeout.
es = Elasticsearch("http://localhost:9200", timeout=30)

# Open a scroll context. search_type='scan' was removed in
# Elasticsearch 5.0, so a plain search with scroll is used instead.
page = es.search(
    index='index_name',
    scroll='5m',
    size=5000)

sid = page['_scroll_id']
hits = page['hits']['hits']
records = []
while hits:
    # Collect the documents of the current page (including the first one)
    records.extend(hit['_source'] for hit in hits)
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # An empty page means every document has been returned
    hits = page['hits']['hits']

# Release the scroll context on the server
es.clear_scroll(scroll_id=sid)
print(len(records))
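
Since the question asks for the whole index as JSON, the same scroll loop can be wrapped in a generator so that the documents are written to disk as they arrive instead of all being held in memory. This is only a sketch: the names iter_all_docs and dump_index are made up for illustration, and client is assumed to be an elasticsearch-py style object exposing search(), scroll() and clear_scroll() as used above.

```python
import json


def iter_all_docs(client, index, page_size=5000, keep_alive="2m"):
    """Yield the _source of every document in `index` via the scroll API.

    `client` is assumed to behave like an elasticsearch-py client.
    """
    page = client.search(index=index, scroll=keep_alive, size=page_size)
    scroll_id = page["_scroll_id"]
    try:
        # An empty page signals that the scroll is exhausted.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_source"]
            page = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Free the server-side scroll context, even if iteration stops early.
        client.clear_scroll(scroll_id=scroll_id)


def dump_index(client, index, path):
    """Write one JSON document per line to `path`; return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for doc in iter_all_docs(client, index):
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count
```

With a real client this would be called roughly as dump_index(es, "cityindex", "cityindex.jsonl"), where the output file name is arbitrary. The elasticsearch-py package also ships a helper, elasticsearch.helpers.scan, that wraps the same scroll pattern, so it may be worth checking whether that covers your case before maintaining your own loop.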
