Bulk Index data in Elasticsearch with sequential IDs

I am using this code to bulk index all data in Elasticsearch using Python:

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    # Yield the parsed contents of every .json file in the directory.
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # Join with the directory so the file opens from any working directory.
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

I know that if no ID is specified, ES generates a 20-character ID by itself, but I want the documents to be indexed with IDs starting from 1 and running up to the number of documents.

How can I achieve this?

In Elasticsearch, if you don't pick an ID for your document, an ID is automatically created for you. From the Elastic docs:

Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID 
strings. These GUIDs are generated from a modified FlakeID scheme which 
allows multiple nodes to be generating unique IDs in parallel with 
essentially zero chance of collision.

If you want custom IDs, you need to build them yourself, using a structure like this:

[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}

    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}
    }
]

helpers.bulk(es, load_json(sys.argv[1]))

Since you are declaring the type and index inside each action, you don't have to pass them to the helpers.bulk() method. You need to change the output of load_json so that it produces a list of dicts like the ones above to be saved in ES (see the Python Elasticsearch client docs).
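One way to make that change is sketched below, assuming the same directory of .json files as in the question. The names generate_actions and index_directory are hypothetical, and sorting the filenames is an added assumption so that each file gets the same ID on every run (os.listdir returns entries in arbitrary order). The '_type' key is kept to match the question; recent Elasticsearch versions have removed mapping types, so pass doc_type=None there.

```python
import json
import os


def generate_actions(directory, index_name="v1_resume", doc_type="candidate"):
    """Yield one bulk action per .json file, with _id counting up from 1."""
    # Sorting is an assumption made here so the ID assignment is reproducible.
    filenames = sorted(f for f in os.listdir(directory) if f.endswith(".json"))
    for doc_id, filename in enumerate(filenames, start=1):
        with open(os.path.join(directory, filename), "r") as open_file:
            action = {
                "_id": doc_id,
                "_index": index_name,
                "_source": json.load(open_file),
            }
        # Only include a type when one is requested (types no longer exist
        # in recent Elasticsearch versions).
        if doc_type is not None:
            action["_type"] = doc_type
        yield action


def index_directory(directory):
    """Send the generated actions to a default local cluster."""
    # Imported here so generate_actions can be tried without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()
    helpers.bulk(es, generate_actions(directory))
```

Calling index_directory(sys.argv[1]) would then replace the helpers.bulk(...) line from the question. Because every action carries its own '_index' and '_id', no index or doc_type argument is passed to helpers.bulk().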
