
How to bulk insert in Elasticsearch, ignoring all errors that may occur in the process?

I'm using Elasticsearch version 6.8.

I need to insert ~10000 docs (from csv file) into existing and mapped index.

I'm using python (version 3.7) code:

    import csv
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()
    with open(file_path) as f:
        reader = csv.DictReader(f)
        helpers.bulk(es, reader, index=index_name, doc_type=doc_type)

But I'm getting error:

    raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
    elasticsearch.helpers.errors.BulkIndexError: ('3 document(s) failed to index.'

The error occurs because some values in the csv file are strings instead of floats.

The bulk stops after 499 documents, and the application crashes.

Is there a way to bulk insert all the documents (~10000) and, if there are errors (due to mapping or wrong values), tell Python / Elasticsearch to skip those documents and continue with the bulk operation?

You can set the argument raise_on_error to False, since it's True by default, as suggested in the Python bulk documentation. It should look like this:

    helpers.bulk(es, reader, index=index_name, doc_type=doc_type, raise_on_error=False)
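
With raise_on_error=False, helpers.bulk returns a tuple of the number of successfully indexed documents and a list of the collected errors, so the failed rows can be logged instead of crashing the application (the stop after 499 documents matches the helper's default chunk_size of 500: the first chunk raises and aborts the run). A minimal sketch, using placeholder values for the file_path, index_name and doc_type variables from the question:

    import csv
    from elasticsearch import Elasticsearch, helpers

    # Placeholder values standing in for the variables used in the question
    file_path = "data.csv"
    index_name = "my_index"
    doc_type = "_doc"

    es = Elasticsearch()
    with open(file_path) as f:
        reader = csv.DictReader(f)
        success, errors = helpers.bulk(
            es, reader,
            index=index_name, doc_type=doc_type,
            raise_on_error=False,  # collect mapping/parsing errors instead of raising
        )
    print("indexed:", success)
    for err in errors:  # each entry describes one rejected document
        print("failed:", err)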

Keep in mind that:

When errors are being collected original document data is included in the error dictionary which can lead to an extra high memory usage. If you need to process a lot of data and want to ignore/collect errors please consider using the streaming_bulk() helper which will just return the errors and not store them in memory.
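
If memory usage matters for the ~10000 rows, a sketch of the streaming_bulk() alternative mentioned above could look like the following (same placeholder variables as before); it yields one result per document, so failures can be counted or logged on the fly without being held in memory:

    import csv
    from elasticsearch import Elasticsearch, helpers

    # Placeholder values standing in for the variables used in the question
    file_path = "data.csv"
    index_name = "my_index"
    doc_type = "_doc"

    es = Elasticsearch()
    failed = 0
    with open(file_path) as f:
        reader = csv.DictReader(f)
        # streaming_bulk yields an (ok, item) pair per document instead of
        # accumulating the errors in a list
        for ok, item in helpers.streaming_bulk(
            es, reader,
            index=index_name, doc_type=doc_type,
            raise_on_error=False,
        ):
            if not ok:
                failed += 1  # item holds the error details for this document
    print("documents that failed to index:", failed)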

You can also take a look at examples 12, 25 and 39 in these Python ES bulk examples.
