I am using this code to bulk index all data in Elasticsearch using Python:
    from elasticsearch import Elasticsearch, helpers
    import json
    import os
    import sys

    es = Elasticsearch()

    def load_json(directory):
        for filename in os.listdir(directory):
            if filename.endswith('.json'):
                # os.listdir() returns bare names, so join with the directory
                with open(os.path.join(directory, filename), 'r') as open_file:
                    yield json.load(open_file)

    helpers.bulk(es, load_json(sys.argv[1]), index='v1_resume', doc_type='candidate')
I know that if no ID is provided, Elasticsearch generates a 20-character ID by itself, but I want the documents indexed with IDs starting at 1 and running up to the number of documents.
How can I achieve this?
In Elasticsearch, if you don't pick an ID for your document, one is automatically created for you; see the Elasticsearch docs:
Autogenerated IDs are 20 character long, URL-safe, Base64-encoded GUID
strings. These GUIDs are generated from a modified FlakeID scheme which
allows multiple nodes to be generating unique IDs in parallel with
essentially zero chance of collision.
If you want custom IDs, you need to build the bulk actions yourself, using syntax like this:
    [
        {
            '_id': 1,
            '_index': 'index-name',
            '_type': 'document',
            '_source': {
                "title": "Hello World!",
                "body": "..."
            }
        },
        {
            '_id': 2,
            '_index': 'index-name',
            '_type': 'document',
            '_source': {
                "title": "Hello World!",
                "body": "..."
            }
        }
    ]
    helpers.bulk(es, load_json(sys.argv[1]))
Since you are declaring the type and index inside each action, you don't have to pass them to the helpers.bulk() method. You need to change load_json so that it yields dicts shaped like the ones above (see the Python Elasticsearch client docs).
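Putting it together, here is a minimal sketch of such a generator that assigns sequential IDs starting at 1 (the index name `v1_resume` and type `candidate` are taken from the question; the sorted-filename ordering and the command-line guard are my assumptions):

```python
import json
import os
import sys

def load_json(directory):
    """Yield bulk actions with sequential _id values starting at 1."""
    json_files = sorted(f for f in os.listdir(directory) if f.endswith('.json'))
    for doc_id, filename in enumerate(json_files, start=1):
        with open(os.path.join(directory, filename), 'r') as open_file:
            yield {
                '_id': doc_id,            # 1, 2, 3, ...
                '_index': 'v1_resume',    # index name from the question
                '_type': 'candidate',     # _type is removed in ES 7+; drop this key there
                '_source': json.load(open_file),
            }

if __name__ == '__main__' and len(sys.argv) > 1:
    # Requires a running Elasticsearch instance.
    from elasticsearch import Elasticsearch, helpers
    es = Elasticsearch()
    helpers.bulk(es, load_json(sys.argv[1]))
```

Note that sorting the filenames only makes the ID assignment deterministic across runs; if documents are later added or removed from the directory, re-running the script will shift the IDs, so sequential IDs are fragile compared to IDs derived from the data itself.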