elasticsearch bulk method fails with alpha-numeric id

I can import data from a pandas DataFrame into Elasticsearch using the following code. I simply add an id column with an auto-generated serial number. But can I use the messageid column as the id instead?

# message id looks like nucb-9a7ff0885b95efae
df["id"] = [x for x in range(len(df["messageid"])) ]

# the above statement works but the following does not
#df["id"] = df["messageid"]

import json
import elasticsearch

tmp = df.to_json(orient="records")
df_json = json.loads(tmp)
es = elasticsearch.Elasticsearch('https://some_site.com')

# each record is sent as the document body; no document id is passed explicitly
for id in df_json:
    es.index(index='fromdf', doc_type='mydf', body=id)

An id in Elasticsearch need not be numeric, but when I run this from Python I get an error:

RequestError: TransportError(400, u'MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: "nucb-a006fd8dd60ac7a6"]; ')

How do I make sure that I can use the bulk method with non-numeric ids?

In other words, the code should work with

df["id"] = df["messageid"]

The index method signature:

def index(self, index, doc_type, body, id=None, params=None):
...
    :arg index: The name of the index
    :arg doc_type: The type of the document
    :arg body: The document
    :arg id: Document ID
...

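So your data should go in body, and the identifier for that data should go in id. In the question's code the id column is only a field inside the document body, and that field was most likely mapped as a number when the integer values were indexed first, which would explain the NumberFormatException for a string value. If you want to store messages that are identified by messageid, you could do it like this: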

for row_dict in df_json:
    es.index(index='fromdf', doc_type='mydf', body=row_dict, id=row_dict['messageid'])
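
Since the question asks about the bulk method, here is a minimal sketch of how the same idea might look with the bulk helper from the Python client. The function name build_actions and the sample text values are made up for illustration; the index name, doc_type and messageid values are taken from the question. Each action carries the document id in "_id" and the row itself in "_source".

```python
def build_actions(records, index, doc_type, id_field):
    """Yield one bulk action per record, using record[id_field] as the document id."""
    for record in records:
        yield {
            "_index": index,
            # doc_type as used in the question; mapping types were removed
            # in newer Elasticsearch versions, so this key may need to be dropped
            "_type": doc_type,
            "_id": record[id_field],
            "_source": record,
        }


# Sample rows; the "text" field and its values are hypothetical.
records = [
    {"messageid": "nucb-9a7ff0885b95efae", "text": "first message"},
    {"messageid": "nucb-a006fd8dd60ac7a6", "text": "second message"},
]

actions = list(build_actions(records, index="fromdf", doc_type="mydf", id_field="messageid"))

# With a live cluster and a client object `es`, the actions could be sent with:
# from elasticsearch import helpers
# helpers.bulk(es, actions)
```

Building the actions separately from sending them makes it easy to inspect what will be indexed before anything reaches the cluster.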

You could also greatly simplify your code by using existing functions such as pandas.DataFrame.to_dict, so that you don't have to convert to JSON and load it back just to get dictionaries.
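
As a rough illustration of the to_dict suggestion, the sketch below builds the list of row dictionaries directly. The DataFrame contents, including the "text" column, are made up; only the messageid values come from the question.

```python
import pandas as pd

# Sample data; the "text" column and its values are hypothetical.
df = pd.DataFrame({
    "messageid": ["nucb-9a7ff0885b95efae", "nucb-a006fd8dd60ac7a6"],
    "text": ["first message", "second message"],
})

# One dict per row, without a JSON round trip.
records = df.to_dict(orient="records")

# Each record could then be indexed with its messageid as the document id, e.g.
# es.index(index='fromdf', doc_type='mydf', body=record, id=record['messageid'])
```

Note that to_dict keeps pandas values as they are, so columns with timestamps or missing values may need extra handling that the to_json round trip would otherwise have done implicitly.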
