
How to bulk insert in Elasticsearch, ignoring all errors that may occur in the process?

I'm using Elasticsearch version 6.8.

I need to insert ~10000 docs (from a CSV file) into an existing, already mapped index.

I'm using Python (version 3.7) code:

    import csv

    from elasticsearch import Elasticsearch, helpers

    # file_path, index_name and doc_type are defined elsewhere
    es = Elasticsearch()
    with open(file_path) as f:
        reader = csv.DictReader(f)
        helpers.bulk(es, reader, index=index_name, doc_type=doc_type)

But I'm getting an error:

raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('3 document(s) failed to index.'

The error occurs because some values in the CSV file are strings where the mapping expects floats.

The bulk operation stops after 499 documents, and the application crashes.

Is there a way to index all the documents (~10000) with bulk and, if there are errors (due to mapping or wrong values), tell Python / Elasticsearch to ignore those documents and continue with the bulk operation?

You can set the raise_on_error argument to False (it is True by default), as suggested in the Python bulk documentation. It should look like this:

helpers.bulk(es, reader, index=index_name, doc_type=doc_type, raise_on_error=False)
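With raise_on_error=False and the default stats_only=False, helpers.bulk returns a tuple of the success count and a list of per-document errors, so the rejected rows can still be inspected afterwards. The following is only a sketch: summarize_errors and index_csv are hypothetical names, and the exact fields inside each error entry may vary between Elasticsearch versions. Connection-level exceptions are not covered by raise_on_error and can still be raised.

```python
import csv


def summarize_errors(errors):
    """Return (doc_id, reason) pairs from the error list returned by helpers.bulk."""
    summary = []
    for entry in errors:
        # Each entry is keyed by the operation type, e.g. {"index": {...}}.
        for _op_type, detail in entry.items():
            error = detail.get("error")
            # "error" is usually a dict with a "reason"; fall back to the raw value.
            reason = error.get("reason") if isinstance(error, dict) else error
            summary.append((detail.get("_id"), reason))
    return summary


def index_csv(es, file_path, index_name, doc_type):
    # Imported here so summarize_errors can be used without the client installed.
    from elasticsearch import helpers

    with open(file_path, newline="") as f:
        reader = csv.DictReader(f)
        success, errors = helpers.bulk(
            es, reader, index=index_name, doc_type=doc_type, raise_on_error=False
        )
    return success, summarize_errors(errors)
```

Calling index_csv(es, file_path, index_name, doc_type) would then give the number of indexed documents plus a short list of which documents failed and why, instead of raising BulkIndexError.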

Keep in mind that:

When errors are being collected original document data is included in the error dictionary which can lead to an extra high memory usage. If you need to process a lot of data and want to ignore/collect errors please consider using the streaming_bulk() helper which will just return the errors and not store them in memory.
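As a rough illustration of the streaming_bulk() suggestion, the generator yields one (ok, item) pair per document, so failures can be counted or logged as they arrive without building up a list. This is a minimal sketch: count_results and stream_csv are hypothetical names, and the same file, index and doc type as in the question are assumed.

```python
import csv


def count_results(results):
    """Tally (ok, item) pairs as yielded by streaming_bulk; keeps no failed documents."""
    succeeded = failed = 0
    for ok, _item in results:
        if ok:
            succeeded += 1
        else:
            failed += 1
    return succeeded, failed


def stream_csv(es, file_path, index_name, doc_type):
    # Imported here so count_results can be used without the client installed.
    from elasticsearch import helpers

    with open(file_path, newline="") as f:
        results = helpers.streaming_bulk(
            es,
            csv.DictReader(f),
            index=index_name,
            doc_type=doc_type,
            raise_on_error=False,  # report failed documents instead of raising
        )
        # Consume the generator while the file is still open.
        return count_results(results)
```

The generator is consumed inside the with block on purpose: streaming_bulk reads rows lazily, so the CSV file has to stay open until the last chunk has been sent.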

You can also take a look at examples 12, 25 and 39 in this Python ES bulk Examples
