How to bulk insert in Elasticsearch, ignoring all errors that may occur in the process?
I'm using Elasticsearch version 6.8.
I need to insert ~10000 docs (from a CSV file) into an existing, already mapped index.
I'm using the following Python (version 3.7) code:
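import csv
from elasticsearch import Elasticsearch, helpers

# With no arguments, the client connects to the default host.
es = Elasticsearch()

with open(file_path) as f:
    reader = csv.DictReader(f)
    # Each CSV row (a dict) is sent as one document.
    helpers.bulk(es, reader, index=index_name, doc_type=doc_type)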
But I'm getting an error:
raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('3 document(s) failed to index.'
The error occurs because some values in the CSV file are strings instead of floats.
The bulk operation stops after 499 documents, and the application crashes.
Is there a way to bulk index all the documents (~10000) and, if there are errors (due to mapping or wrong values), tell Python / Elasticsearch to ignore those documents and continue with the bulk operation?
You can set the raise_on_error argument to False, since it is True by default, as suggested in the Python bulk documentation. It should look like this:
helpers.bulk(es, reader, index=index_name, doc_type=doc_type, raise_on_error=False)
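With raise_on_error=False, helpers.bulk does not raise for failed documents. Assuming the default stats_only=False, it returns a tuple: the number of successful actions and a list of the failed items. Below is a minimal sketch of summarizing that list. The helper name summarize_bulk_errors and the sample_errors data are hypothetical; the sample was written by hand to resemble the item shape Elasticsearch returns for a mapping failure, so check it against your own output.

```python
from collections import Counter


def summarize_bulk_errors(errors):
    """Count failed bulk items by (HTTP status, error type)."""
    summary = Counter()
    for item in errors:
        # Each item is a one-key dict: {op_type: details}, e.g. {"index": {...}}.
        details = next(iter(item.values()))
        error = details.get("error")
        # "error" is usually a dict with a "type" key; fall back to its string form.
        error_type = error.get("type") if isinstance(error, dict) else str(error)
        summary[(details.get("status"), error_type)] += 1
    return summary


# Hand-written sample (an assumption, not real server output).
sample_errors = [
    {"index": {"_index": "my-index", "_id": "1", "status": 400,
               "error": {"type": "mapper_parsing_exception",
                         "reason": "failed to parse field [price]"}}},
    {"index": {"_index": "my-index", "_id": "2", "status": 400,
               "error": {"type": "mapper_parsing_exception",
                         "reason": "failed to parse field [price]"}}},
]

# Intended use with the call above:
#   success, errors = helpers.bulk(es, reader, index=index_name,
#                                  doc_type=doc_type, raise_on_error=False)
#   print(success, summarize_bulk_errors(errors))
print(summarize_bulk_errors(sample_errors))
```

Grouping by status and error type makes it easier to tell whether all rejected rows share one cause, such as a string in a float field, or whether several distinct problems exist.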
Keep in mind that:
When errors are being collected, the original document data is included in the error dictionary, which can lead to extra high memory usage. If you need to process a lot of data and want to ignore/collect errors, please consider using the streaming_bulk() helper, which will just return the errors and not store them in memory.
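The streaming_bulk() approach mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: the function name index_csv_skipping_failures is hypothetical, and es, file_path, index_name and doc_type are the same variables as in the question. streaming_bulk yields one (ok, item) pair per document, so failures can be counted or logged as they arrive instead of being accumulated in a list.

```python
import csv


def index_csv_skipping_failures(es, file_path, index_name, doc_type):
    """Index every CSV row; count failed rows instead of raising on them."""
    # Imported inside the function so the sketch can be loaded without the client installed.
    from elasticsearch import helpers

    indexed, failed = 0, 0
    with open(file_path, newline="") as f:
        reader = csv.DictReader(f)
        results = helpers.streaming_bulk(
            es, reader,
            index=index_name, doc_type=doc_type,
            raise_on_error=False,  # yield failed documents instead of raising BulkIndexError
        )
        # The generator is consumed while the file is still open.
        for ok, item in results:
            if ok:
                indexed += 1
            else:
                failed += 1
                print("failed to index:", item)
    return indexed, failed
```

Note that raise_on_error only covers per-document failures reported by Elasticsearch. Exceptions raised by the bulk request itself, such as connection problems, still propagate unless raise_on_exception=False is passed as well.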
You can also take a look at examples 12, 25 and 39 in these Python ES bulk examples.