
Elasticsearch bulk insert not completely working

Good day,

I am having problems with an Elasticsearch bulk insert. My program generates a text file every 15 seconds, and the script below (run as os.popen('python my_script my_text_file')) tries to insert its data into Elasticsearch and renames the file on success.

Each text file is 1-9 kilobytes in size and has the following format:

{'_type': 'a', '_index': 'b', '_source': {'k0': 'v0'}, '_id': 'c0'}
{'_type': 'a', '_index': 'b', '_source': {'k1': 'v1'}, '_id': 'c1'}
{'_type': 'a', '_index': 'b', '_source': {'k2': 'v2'}, '_id': 'c2'}
...
{'_type': 'a', '_index': 'b', '_source': {'kN': 'vN'}, '_id': 'cN'}
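(Side note: these lines are Python dict repr output rather than strict JSON, which is why the script below swaps quote characters before calling json.loads. A minimal sketch of a safer alternative, assuming the lines really are Python literals, would be ast.literal_eval; the quote-replacement trick breaks if a value itself contains a quote character:)

```python
import ast

# one line of the generated file, in Python-literal (not JSON) syntax
line = "{'_type': 'a', '_index': 'b', '_source': {'k0': 'v0'}, '_id': 'c0'}"

# safely evaluates the literal without the fragile quote replacement
action = ast.literal_eval(line)
print(action['_id'])  # prints: c0
```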

My script code is below:

import sys
import os
import json

import elasticsearch.helpers


def RetryElastic(data, maxCount=5):
    counter = 0
    while counter < maxCount:
        try:
            res = elasticsearch.helpers.bulk(es, data, max_retries=10, stats_only=False)
            assert len(res[1]) == 0
        except Exception as e:
            print(e.__class__, 'found')  # raise e
            counter += 1
            if counter >= maxCount:
                print(res, '\n', file=open('file.txt', 'a+'))
        else:
            os.rename(sys.argv[1], sys.argv[1] + "_sent")
            break


es = elasticsearch.Elasticsearch(['host'])
fileInfo = open(sys.argv[1]).read().splitlines()
data = (json.loads(i.replace("'", '"')) for i in fileInfo)
RetryElastic(data)

The relevant part of the bulk helper (def bulk) from elasticsearch.helpers/__init__.py:

for ok, item in streaming_bulk(client, actions, **kwargs):
    # go through request-response pairs and detect failures
    if not ok:
        if not stats_only:
            errors.append(item)
        failed += 1
    else:
        success += 1

return success, failed if stats_only else errors

My res[1] is the errors list. I have no error file (so no errors were reported), and all my files were renamed to file + "_sent". But when I check the data in Elasticsearch, I find that NOT ALL of it was inserted (most of the data is there, but some IS NOT; data from some files is missing from Elasticsearch completely). And this happens even though I see no errors. In total I have about 100 files to insert, give or take a few.

Where is my mistake?

It's possible that you have colliding IDs, which would cause Elasticsearch not to keep all of your data: when two index actions share an _id, the second simply overwrites the first, and no bulk error is reported. Make sure all of your IDs are unique.
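A minimal sketch of checking a batch for duplicate _id values before sending it (the sample actions here are hypothetical, in the same shape as the question's files):

```python
from collections import Counter

# hypothetical batch in the same shape as the generated files
actions = [
    {'_type': 'a', '_index': 'b', '_source': {'k0': 'v0'}, '_id': 'c0'},
    {'_type': 'a', '_index': 'b', '_source': {'k1': 'v1'}, '_id': 'c0'},  # colliding _id
    {'_type': 'a', '_index': 'b', '_source': {'k2': 'v2'}, '_id': 'c2'},
]

# count how many times each _id occurs and flag any that repeat
id_counts = Counter(a['_id'] for a in actions)
duplicates = [i for i, n in id_counts.items() if n > 1]
print(duplicates)  # prints: ['c0']
```

Duplicates within one file, or _ids reused across the ~100 files, would both silently shrink the stored document count without producing bulk errors.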
