
Indexing a dataframe into Elasticsearch using Python

I'm trying to index a pandas dataframe into Elasticsearch. I'm having trouble parsing the JSON that I generate, and I think the problem comes from the mapping. My code is below.

import logging
from pprint import pprint
from elasticsearch import Elasticsearch
import pandas as pd

def create_index(es_object, index_name):
    created = False
    # index settings
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "danger": {
                "dynamic": "strict",
                "properties": {
                    "name": {
                       "type": "text"
                    },
                    "first_name": {
                        "type": "text"
                    },
                    "age": {
                        "type": "integer"
                    },
                    "city": {
                        "type": "text"
                    },
                    "sex": {
                        "type": "text",
                    },
                }
            }
        }
    }

    try:
        if not es_object.indices.exists(index_name):
            # ignore=400 suppresses the "index already exists" error
            es_object.indices.create(index=index_name, ignore=400, body=settings)
            print('Created Index')
        created = True
    except Exception as ex:
        print(str(ex))
    finally:
        return created


def store_record(elastic_object, index_name, record):
    is_stored = True
    try:
        outcome = elastic_object.index(index=index_name,doc_type='danger', body=record)
        print(outcome)
    except Exception as ex:
        print('Error in indexing data')


data = [['Hook', 'James','90', 'Austin','M'],['Sparrow','Jack','15', 'Paris', 'M'],['Kent','Clark','13', 'NYC', 'M'],['Montana','Hannah','28','Las Vegas', 'F'] ]
df = pd.DataFrame(data,columns=['name', 'first_name', 'age', 'city', 'sex'])
result = df.to_json(orient='records')
result = result[1:-1]
es = Elasticsearch()
if es is not None:
        if create_index(es, 'cracra'):
            out = store_record(es, 'cracra', result)
            print('Data indexed successfully')

I got the following error:

POST http://localhost:9200/cracra/danger [status:400 request:0.016s]

Error in indexing data
RequestError(400, 'mapper_parsing_exception', 'failed to parse')
Data indexed successfully

I don't know where it is coming from. If anyone can help me solve this, I would be grateful.

Thanks a lot!

Try removing the extra commas from your mappings:

"mappings": {
  "danger": {
    "dynamic": "strict",
    "properties": {
      "name": {
        "type": "text"
      },
      "first_name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "city": {
        "type": "text"
      },
      "sex": {
        "type": "text", <-- here
      }, <-- and here
    }
  }
}

UPDATE

It seems that the index is created successfully and the problem is in indexing the data. As Nishant Saini noted, you are probably trying to index several documents at once. This can be done with the Bulk API. Here is an example of a correct request that indexes two documents:

POST cracra/danger/_bulk
{"index": {"_id": 1}}
{"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"}
{"index": {"_id": 2}}
{"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"}

Every document in the request body must appear on its own line, preceded by a line of meta information. In this case the meta information contains only the id to assign to the document.
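To make the request format concrete, here is a minimal sketch of building that body by hand. It uses a plain list of dicts as a stand-in for the dataframe records, and the helper names (is_single_json_document, build_bulk_body) are made up for illustration. It also shows why the string from the question, with only the outer brackets stripped, is not one JSON document.

```python
import json

# Plain-dict stand-in for the dataframe rows from the question.
records = [
    {"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"},
]


def is_single_json_document(text):
    """Return True if text parses as exactly one JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def build_bulk_body(docs, first_id=1):
    """Build a newline-delimited body: one meta line, then one document line."""
    lines = []
    for doc_id, doc in enumerate(docs, start=first_id):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Same shape as the question's stripped string: objects separated by commas.
stripped = json.dumps(records)[1:-1]
print(is_single_json_document(stripped))  # several objects, so not one document
print(build_bulk_body(records))
```

With pandas, the same list of dicts should be obtainable with df.to_dict(orient='records') instead of slicing the JSON string.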

You can either build this request by hand or use the Elasticsearch helpers for Python, which take care of adding the correct meta information.
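As a rough sketch of the helpers route, the code below turns each record into an action dict and passes the actions to elasticsearch.helpers.bulk. The function names generate_actions and index_records are hypothetical, and the "_type" key is an assumption that only applies to older Elasticsearch versions that still use mapping types such as 'danger'.

```python
def generate_actions(records, index_name, doc_type=None):
    """Yield one bulk action per record in the dict format the helpers accept."""
    for doc_id, record in enumerate(records, start=1):
        action = {"_index": index_name, "_id": doc_id, "_source": record}
        if doc_type is not None:
            # Assumption: the cluster is an older version that still has mapping types.
            action["_type"] = doc_type
        yield action


def index_records(es, records, index_name, doc_type=None):
    """Send all records in one bulk request.

    Needs the elasticsearch package and a reachable cluster.
    """
    # Imported here so generate_actions can be tried without the package installed.
    from elasticsearch import helpers
    return helpers.bulk(es, generate_actions(records, index_name, doc_type))
```

For the question's data this could be called as index_records(es, df.to_dict(orient='records'), 'cracra', doc_type='danger'). helpers.bulk returns the number of successfully executed actions together with any errors.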
