
Python elasticsearch bulk API is not working as expected

I'm trying to index documents with the bulk API using the elasticsearch Python package. I'm fetching the data from a MySQL database with around 10,000 records, but my script only manages to upload about 5,000 of them and breaks somewhere in the middle.

It fails with this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

def new_products(catid):
    connection = get_connection()
    es = get_elastic_connection()
    cursor = connection.cursor()
    catid = int(catid)
    sql = "SELECT  * FROM %s WHERE catid=%d AND product_id<>0 LIMIT %d" % (TABLENAME, catid, LIMIT_PER_THREAD_ON_NEW)

    cursor.execute(sql)
    product_ids_result = cursor.fetchall()
    product_ids_only = map(lambda x: x['product_id'], product_ids_result)
    product_ids_indexes = {}
    for row in product_ids_result:
        product_ids_indexes[row['product_id']] = row['id']

    products_list = []
    if product_ids_only:
        sql = "SELECT * FROM tbl_products WHERE catid=%d AND product_id IN (%s)" % (catid, ','.join(map(str, product_ids_only)))

        cursor.execute(sql)
        products_list = cursor.fetchall()

    while products_list:
        print catid, len(products_list)
        product_ids_from_db = map(lambda x: x['pid'], products_list)
        product_images = get_images(product_ids_from_db)
        product_specs = get_specs(catid, product_ids_from_db)

        bulk_data = []
        for row in products_list:
            row['p_spec'] = {'d_spec': [], 'f_spec': []}
            if row['pid'] in product_specs:
                if 'd_spec' in product_specs[row['pid']]:
                    row['p_spec']['d_spec'] = product_specs[row['pid']]['d_spec']
                if 'f_spec' in product_specs[row['pid']]:
                    row['p_spec']['f_spec'] = product_specs[row['pid']]['f_spec']

            if row['pid'] in product_images:
                if product_images[row['pid']]:
                    row['pimg'] = product_images[row['pid']]
                    row['no_img'] = '1'

            bulk_data.append({
                "index": {
                    '_index': ES_INDEX,
                    '_type': ES_TYPE,
                    '_id': row['pid']
                }
            })
            bulk_data.append(row)

            if len(bulk_data) == ES_LIMIT_PER_REQUEST:
                responses = es.bulk(index=ES_INDEX, body=bulk_data, refresh=True)
                bulk_data = []

        if len(bulk_data) > 0:
            responses = es.bulk(index=ES_INDEX, body=bulk_data, refresh=True)

        sql = "SELECT  * FROM %s WHERE catid=%d AND product_id<>0 LIMIT %d" % (TABLENAME, catid, LIMIT_PER_THREAD_ON_NEW)
        cursor.execute(sql)
        new_product_ids_result = cursor.fetchall()
        new_product_ids_only = map(lambda x: x['product_id'], new_product_ids_result)

        if set(product_ids_only) == set(new_product_ids_only):
            print catid, "new products are same"
            break
        else:
            product_ids_only = new_product_ids_only

        if new_product_ids_only:
            sql = "SELECT * FROM tbl_products WHERE catid=%d AND product_id IN (%s)" % (catid, ','.join(map(str, new_product_ids_only)))

            cursor.execute(sql)
            products_list = cursor.fetchall()
        else:
            products_list = []

    connection.close()

Any clue what's going wrong here?

Regards

I found the issue.

I was indexing the data with multiple threads, which is why the error never surfaced while the script was running.
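As a side note, the code above ignores the return value of es.bulk. The bulk response is a dict with an 'errors' flag and per-item results, so checking it after each request makes partial failures visible even when worker threads swallow exceptions. A minimal sketch reusing the names from the code above:

    responses = es.bulk(index=ES_INDEX, body=bulk_data, refresh=True)
    if responses.get('errors'):
        # collect the items that failed; each item is keyed by its action ('index' here)
        failed = [item['index'] for item in responses['items']
                  if item['index'].get('error')]
        print 'bulk request: %d items failed' % len(failed)
        for item in failed[:5]:
            print item['_id'], item['error']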

I finally fixed it by passing charset and use_unicode as parameters to MySQLdb.connect.
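For reference, this is roughly what get_connection might look like after the fix; the host, credentials, and database name are placeholders, and DictCursor matches the dict-style row access in the code above:

    import MySQLdb
    import MySQLdb.cursors

    def get_connection():
        # charset='utf8' asks the server for UTF-8 output, and
        # use_unicode=True makes MySQLdb decode those bytes to unicode,
        # so non-ASCII values never go through the default 'ascii' codec
        return MySQLdb.connect(
            host='localhost',   # placeholder
            user='dbuser',      # placeholder
            passwd='dbpass',    # placeholder
            db='mydb',          # placeholder
            charset='utf8',
            use_unicode=True,
            cursorclass=MySQLdb.cursors.DictCursor,
        )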
