how to load 1000 lines of a csv into elasticsearch as 1000 different documents using elasticsearch API
So I have been trying to load 1000 rows of a CSV into Elasticsearch as 1000 different documents. The CSV has 8 headers: Release Year, Title, Origin/Ethnicity, Director, Cast, Wiki Page, Plot. My current code for loading the dataset uses the helpers bulk command:
import csv
from elasticsearch import helpers, Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.delete(index='movie-plots', ignore=[400, 404])
es.indices.create(index='movie-plots', body=body)

filename = 'wiki_movie_plots_deduped.csv'

def csv_reader(file_name):
    with open(file_name, 'r') as outfile:
        reader = csv.DictReader(outfile)
        helpers.bulk(es, reader, index="movie-plots", doc_type="_doc")
I think this ends up loading the 1000 rows into a single document.
You are on the right track. The code below splits the csv into 1000 separate items: it separates out the headers, converts each row into a map/dictionary keyed by the appropriate header, and appends it to a list, so that you upload a list of dictionary items.
import csv, sys
from elasticsearch import helpers, Elasticsearch, RequestsHttpConnection

es = Elasticsearch(
    hosts=[{
        'host': 'localhost',
        'port': 9200}],
    use_ssl=False,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

upload_list = []  # list of items for upload

# Load all csv data
with open('my_folder/my_csv_file.csv', newline='') as csvfile:
    data_list = []
    csv_data = csv.reader(csvfile)
    for row in csv_data:
        data_list.append(row)

# separate out the headers from the main data
headers = data_list[0]
# drop headers from data_list
data_list.pop(0)

for item in data_list:  # iterate over each row/item in the csv
    item_dict = {}
    # match a column header to the row data for an item
    i = 0
    for header in headers:
        item_dict[header] = item[i]
        i = i + 1
    # add the transformed item/row to a list of dicts
    upload_list += [item_dict]

# using helper library's Bulk API to index list of Elasticsearch docs
try:
    resp = helpers.bulk(
        es,
        upload_list,
        index="my-index-name"
    )
    msg = "helpers.bulk() RESPONSE: " + str(resp)
    print(msg)  # print the response returned by Elasticsearch
except Exception as err:
    msg = "Elasticsearch helpers.bulk() ERROR: " + str(err)
    print(msg)
    sys.exit(1)
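As a side note, the manual header-matching loop above duplicates what `csv.DictReader` already does: it consumes the header row itself and yields one `{header: value}` dict per data row, which `helpers.bulk` can take directly. A minimal sketch of that mapping using an in-memory CSV (the headers and values here are illustrative, not taken from the actual file):

```python
import csv
import io

# In-memory stand-in for the CSV file; headers and values are illustrative
raw = "Release Year,Title,Director\n1999,The Matrix,The Wachowskis\n"

# DictReader reads the first line as headers and maps each following row
# to a {header: value} dict -- no manual index bookkeeping needed
upload_list = list(csv.DictReader(io.StringIO(raw)))

print(upload_list)
```

Each dict in `upload_list` has the same shape as the `item_dict` built in the loop above, so the list can be passed to `helpers.bulk(es, upload_list, index=...)` unchanged.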