
Transfer a CSV to Elasticsearch from Python using a CSV field as the document ID

I want to transfer the following CSV to Elasticsearch:

|hcode|hname|
|1|aaaa|
|2|bbbbb|
|3|ccccc|
|4|dddd|
|5|eeee|
|6|ffff|

I need to use the hcode field as the document ID, but I am getting the error below:

  File "C:\Users\Namali\Anaconda3\lib\site-packages\elasticsearch\connection\base.py", line 181, in _raise_error
    status_code, error_message, additional_info

RequestError: RequestError(400, 'mapper_parsing_exception', 'failed to parse')

I am using Elasticsearch 7.1.1 and Python 3.7.6.

Python code:

import csv
import json

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def csv_reader(file_obj, delimiter=','):
   reader_ = csv.reader(file_obj,delimiter=delimiter,quotechar='"')
   
   i = 1
   results = []
   for row in reader_:
    #try :
    #es.index(index='hb_hotel_raw', doc_type='hb_hotel_raw', id=row[0], 
                # body=json.dump([row for row in reader_], file_obj))
    es.index(index='test', doc_type='test', id=row[0],body=json.dumps(row))
    #except:
    #    print("error")
    i = i + 1
    results.append(row)
    print(row)

if __name__ == "__main__":
  with open("D:\\namali\\rez\\data_mapping\\test.csv") as f_obj:
    csv_reader(f_obj)

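First, mapping types are deprecated in Elasticsearch 7, so the doc_type argument should be omitted. Second, the body must be a valid JSON object: json.dumps(row) serializes a Python list into a JSON array, which Elasticsearch cannot parse as a document, hence the mapper_parsing_exception. Build a dict of field names and values instead. I edited your code as below: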

for row in reader_:
    _id = row[0].split("|")[1]
    text = row[0].split("|")[2]
    my_dict = {"hname" : text}
    es.index(index='test', id=_id, body=my_dict)
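A slightly more general version of the same idea, as a minimal sketch. It assumes each line of the file is wrapped in pipes exactly as shown in the question (for example |1|aaaa|) and that the first line is the header. The helper names rows_to_documents and index_rows are hypothetical, not part of the elasticsearch client. Unlike the loop above, it reads field names from the header row and does not index the header itself.

```python
import csv
import io


def rows_to_documents(file_obj, id_field="hcode", delimiter="|"):
    """Yield (doc_id, document) pairs from a pipe-wrapped file with a header row."""
    # Drop blank lines and the leading/trailing pipes so "|1|aaaa|" becomes "1|aaaa".
    cleaned = (line.strip().strip(delimiter) for line in file_obj if line.strip())
    for row in csv.DictReader(cleaned, delimiter=delimiter):
        doc = dict(row)
        doc_id = doc.pop(id_field)  # the ID travels with the request, not in the body
        yield doc_id, doc


def index_rows(es, file_obj, index="test"):
    """Index every row; `es` is assumed to be an elasticsearch.Elasticsearch client."""
    count = 0
    for doc_id, doc in rows_to_documents(file_obj):
        # No doc_type, and the body is a dict (a JSON object), not a list.
        es.index(index=index, id=doc_id, body=doc)
        count += 1
    return count


if __name__ == "__main__":
    # Dry run without a cluster: print what would be sent.
    sample = io.StringIO("|hcode|hname|\n|1|aaaa|\n|2|bbbbb|\n")
    for doc_id, doc in rows_to_documents(sample):
        print(doc_id, doc)
```

With a real client you would open the CSV file and call index_rows(es, f_obj). For large files, the bulk helper in elasticsearch.helpers is probably a better fit than one es.index call per row.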

<disclosure: I'm the developer of Eland and employed by Elastic>

If you're willing to load the CSV into a Pandas DataFrame you can use Eland to create/append the tabular data to an Elasticsearch index with all data types resolved properly.

I would recommend reading pandas.read_csv() and eland.pandas_to_eland() function documentation for ideas on how to accomplish this.
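As a rough sketch of how those two functions might fit together: the wrapper name csv_to_index is hypothetical, the file is assumed to be pipe-wrapped as in the question, and the sketch relies on pandas_to_eland using the DataFrame index for the document _id (its default behaviour as far as I know; check the documentation for your Eland version).

```python
def csv_to_index(csv_path, es_client, index="test"):
    """Load a pipe-wrapped CSV with pandas and write it to Elasticsearch via Eland."""
    # Imported lazily so this sketch can be loaded without pandas/eland installed.
    import pandas as pd
    import eland as ed

    # usecols drops the empty columns created by the leading/trailing pipes;
    # hcode is read as a string because it will become the document _id.
    df = pd.read_csv(
        csv_path,
        sep="|",
        usecols=["hcode", "hname"],
        dtype={"hcode": str},
    )
    df = df.set_index("hcode")

    return ed.pandas_to_eland(
        pd_df=df,
        es_client=es_client,      # an Elasticsearch client instance
        es_dest_index=index,
        es_if_exists="append",    # create the index if missing, else append
        es_refresh=True,
    )
```

Calling csv_to_index("test.csv", es) would then leave hname as the only field in each document, with hcode as its ID.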
