
Dump bulk data in Elasticsearch using the Python API

I want to index the Shakespeare data in Elasticsearch using its Python API, but I am getting the error below.

    PUT http://localhost:9200/shakes/play/3 [status:400 request:0.098s]
{'error': {'root_cause': [{'type': 'mapper_parsing_exception', 'reason': 'failed to parse'}], 'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'not_x_content_exception', 'reason': 'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes'}}, 'status': 400}

Python script:

from elasticsearch import Elasticsearch
from elasticsearch import TransportError
import json

data = []

for line in open('shakespeare.json', 'r'):
    data.append(json.loads(line))

es = Elasticsearch()

res = 0
cl = []
# keep only every second line (the document lines), skipping the action lines
for d in data:
    if res == 0:
        res = 1
        continue
    cl.append(d)
    res = 0

try:
    res = es.index(index="shakes", doc_type="play", id=3, body=cl)
    print(res)
except TransportError as e:
    print(e.info)

I also tried using json.dumps, but I still get the same error. However, when I add just a single element of the list, the code above works.

You are not sending a bulk request to ES, but only a simple create request -- please take a look here. That method works with a dict representing a single new document, not with a list of documents. Also, if you put a fixed id on the create request, you need to make that value dynamic; otherwise every document will overwrite the previous one at the id of the last document indexed. If your JSON file has one record per line, you should try this -- please read the bulk documentation here:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
op_list = []
# use a raw string so the backslashes in the Windows path are not treated as escapes
with open(r"C:\ElasticSearch\shakespeare.json") as json_file:
    for record in json_file:
        op_list.append({
            '_op_type': 'index',
            '_index': 'shakes',
            '_type': 'play',
            '_source': record,
        })
helpers.bulk(client=es, actions=op_list)
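As a variant, since the sample shakespeare.json used in the Elasticsearch tutorials alternates action lines ({"index": ...}) and document lines, you can combine the filtering the question attempts with a dynamic `_id` so documents do not overwrite each other. This is a sketch under those assumptions (the file layout and the `shakes` index name are taken from the question; the actual `helpers.bulk` call assumes a running cluster):

```python
import json


def doc_lines(path):
    """Yield only the document lines from a file that alternates
    action lines and document lines (even lines are actions,
    odd lines are the documents)."""
    with open(path) as f:
        for i, line in enumerate(f):
            if i % 2 == 1:
                yield json.loads(line)


def bulk_actions(path, index="shakes"):
    """Build bulk actions with a dynamic _id, so that documents
    are not all written to the same id (as happens with the
    hard-coded id=3 in the question)."""
    for doc_id, doc in enumerate(doc_lines(path)):
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": doc_id,
            "_source": doc,
        }


# With a running cluster you would then stream the actions:
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch()
# helpers.bulk(es, bulk_actions("shakespeare.json"))
```

Passing a generator to helpers.bulk also avoids holding the whole action list in memory for large files.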
