简体   繁体   中英

Bulk Index data in Elasticsearch with sequential IDs

I am using this code to bulk index all data in Elasticsearch using python:

from elasticsearch import Elasticsearch, helpers
import json
import os
import sys
import sys, json

es = Elasticsearch()   

def load_json(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            with open(filename,'r') as open_file:
                yield json.load(open_file)

helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')

I know that if ID is not mentioned ES gives a 20 character long ID by itself, but I want it to get indexed starting from ID = 1 till the number of documents.

How can I achieve this ?

In elastic search if you don't pick and ID for your document an ID is automatically created for you, check here in elastic docs :

Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID 
strings. These GUIDs are generated from a modified FlakeID scheme which 
allows multiple nodes to be generating unique IDs in parallel with 
essentially zero chance of collision.

If you like to have custom ids you need to build them yourself, using similar syntax:

[
    {'_id': 1,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}

    },
    {'_id': 2,
     '_index': 'index-name',
     '_type': 'document',
     '_source': {
          "title": "Hello World!",
          "body": "..."}
    }
]

helpers.bulk(es, load_json(sys.argv[1])

Since you are decalring the type and index inside your schema you don't have to do it inside helpers.bulk() method. You need to change the output of 'load_json' to create list with dicts (like above) to be saved in es ( python elastic client docs )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM