Bulk index / create documents with elasticsearch for python
I am generating a large number of Elasticsearch documents with random content in Python and indexing them with elasticsearch-py.
Simplified working example (a document with just one field):
from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    es_client.index(index='my_index', document=document)
Since this makes one request per document, I tried to speed it up by sending chunks of 1000 documents each using the _bulk API. However, my attempts so far have been unsuccessful.
My understanding from the docs is that you can pass an iterable to bulk(), so I tried:
from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

document_list = []
for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    document_list.append(document)
    if i % 1000 == 0:
        es_client.bulk(operations=document_list, index='my_index')
        document_list = []
but this results in:
elasticsearch.BadRequestError: BadRequestError(400, 'illegal_argument_exception', 'Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]')
Ok, it seems I had mixed up two different functions: helpers.bulk() and Elasticsearch.bulk(). Either can be used to achieve what I intended, but they have slightly different signatures.
The helpers.bulk() function takes an Elasticsearch() object and an iterable containing the documents as parameters. The operation can be specified per document as _op_type and can be one of index, create, delete, or update. Since _op_type defaults to index, we can omit it and simply pass the plain list of documents in this case:
from elasticsearch import Elasticsearch, helpers
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

document_list = []
for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    document_list.append(document)
    if i % 1000 == 0:
        helpers.bulk(es_client, document_list, index='my_index')
        document_list = []
# index any leftover documents from the last incomplete chunk
if document_list:
    helpers.bulk(es_client, document_list, index='my_index')
This works fine.
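As a side note, helpers.bulk() can also consume a generator and do the chunking itself via its chunk_size parameter, which makes the manual batching above optional and keeps memory usage flat. A minimal sketch; the bulk call itself is left as a comment here since it needs a live cluster:

```python
from random import getrandbits

def generate_documents(n):
    # lazily yield documents so the whole set never sits in memory at once
    for _ in range(n):
        yield {'my_field': getrandbits(64)}

# against a live cluster, helpers.bulk() consumes the generator directly
# and batches the requests itself:
# helpers.bulk(es_client, generate_documents(9999999),
#              index='my_index', chunk_size=1000)

sample = list(generate_documents(3))
print(len(sample))  # 3 documents generated lazily
```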
Alternatively, the Elasticsearch.bulk() function can be used, but here the actions/operations are a mandatory part of the iterable and the syntax is slightly different. This means that instead of just a dict with the document contents, the iterable needs to alternate between a dict specifying the action (in this case "index": {}) and the body of each document. See also the _bulk API documentation:
from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

actions_list = []
for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    # each document is preceded by its own action/metadata line
    actions_list.append({"index": {}})
    actions_list.append(document)
    if i % 1000 == 0:
        es_client.bulk(operations=actions_list, index='my_index')
        actions_list = []
# index any leftover documents from the last incomplete chunk
if actions_list:
    es_client.bulk(operations=actions_list, index='my_index')
This works fine as well.
I assume that both of the above internally generate the same _bulk REST API request, so they should be equivalent in the end.
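To make that assumption concrete, this is roughly the newline-delimited (NDJSON) body that both calls end up POSTing to the _bulk endpoint — sketched by hand here, not taken from the client's actual serialization code:

```python
import json

documents = [{'my_field': 1}, {'my_field': 2}]

# one action/metadata line followed by one source line per document,
# terminated by a trailing newline as the _bulk API requires
lines = []
for doc in documents:
    lines.append(json.dumps({"index": {}}))
    lines.append(json.dumps(doc))
payload = '\n'.join(lines) + '\n'
print(payload)
```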