
How to load 1000 lines of a CSV into Elasticsearch as 1000 different documents using the Elasticsearch API

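I have been trying to load 1000 lines of a CSV into Elasticsearch as 1000 different documents. The CSV has 8 headers, including Release Year, Title, Origin/Ethnicity, Director, Cast, Wiki Page, and Plot. My current code for loading the dataset uses the helpers' bulk command: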

import csv
from elasticsearch import helpers, Elasticsearch
es = Elasticsearch("http://localhost:9200")

es.indices.delete(index='movie-plots', ignore=[400, 404])
es.indices.create(index='movie-plots', body=body) 

filename = 'wiki_movie_plots_deduped.csv'

def csv_reader(file_name):
    with open(file_name, 'r') as outfile:
        reader = csv.DictReader(outfile)
        helpers.bulk(es, reader, index="movie-plots", doc_type="_doc")

I think this loads the 1000 lines into a single document.

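You are on the right track. The code below splits the CSV into 1000 separate items: it separates out the header row and converts each remaining row into a map/dictionary item keyed by the appropriate headers. Each dictionary is then appended to a list, so that you upload a list of dictionary items.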

import csv, sys
from elasticsearch import helpers, Elasticsearch, RequestsHttpConnection

es = Elasticsearch(
    hosts=[{
        'host': 'localhost',
        'port': '9200'}],
    use_ssl=False,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

upload_list = [] # list of items for upload

# Load all csv data
with open('my_folder/my_csv_file.csv', newline='') as csvfile:
    
    data_list = []

    csv_data = csv.reader(csvfile)
    for row in csv_data:
        data_list.append(row)

    # separate out the headers from the main data 
    headers = data_list[0]
    # drop headers from data_list
    data_list.pop(0)

    for item in data_list: # iterate over each row/item in the csv

        item_dict = {}

        # match a column header to the row data for an item
        i = 0
        for header in headers:
            item_dict[header] = item[i]
            i = i+1

        # add the transformed item/row to a list of dicts
        upload_list += [item_dict]

# using helper library's Bulk API to index list of Elasticsearch docs
try:
    resp = helpers.bulk(
        es,
        upload_list,
        index="my-index-name"
    )
    msg = "helpers.bulk() RESPONSE: " + str(resp)
    print(msg) # print the response returned by Elasticsearch
except Exception as err:
    msg = "Elasticsearch helpers.bulk() ERROR: " + str(err)
    print(msg)
    sys.exit(1)
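
As a possible alternative, here is a minimal sketch of the same row-to-dictionary idea using csv.DictReader, which already yields one dictionary per row, so the manual header matching is not needed. The function name csv_to_actions, the limit parameter, and the in-memory sample data are illustrative assumptions and not part of the original answer. The sketch only builds the bulk actions and does not connect to a cluster.

```python
import csv
import io
from itertools import islice


def csv_to_actions(csvfile, index_name, limit=None):
    """Yield one bulk action per CSV row (hypothetical helper for illustration)."""
    reader = csv.DictReader(csvfile)  # the first row is consumed as the headers
    for row in islice(reader, limit):  # limit=None means "all rows"
        # Each action becomes its own document when passed to helpers.bulk()
        yield {"_index": index_name, "_source": row}


# Small in-memory sample standing in for the real CSV file (assumed data)
sample = io.StringIO("Title,Plot\nA,first plot\nB,second plot\n")
actions = list(csv_to_actions(sample, "movie-plots"))
print(len(actions))
```

With a running cluster, the generator could be passed directly to the helper, for example helpers.bulk(es, csv_to_actions(csvfile, "movie-plots", limit=1000)), which avoids holding the whole list in memory.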

