
Elasticsearch JSON Bulk Indexing using Python

I have a huge amount of data in a single JSON file that I want to get into Elasticsearch so I can build some visualizations in Kibana. My JSON currently looks something like this:

[{"field1": "x", "field2": "y"},
{"field1": "w", "field2": "z"}]
...etc

After doing some research, I found that the best way to feed this data to Elasticsearch is using the Bulk API, but first I need to reformat my data to look like this:

{"index":{"_index": "myindex", "_type": "entity_type", "_id": 1}}
{"field1": "x", "field2": "y"}
{"index":{"_index": "myindex", "_type": "entity_type", "_id": 2}}
{"field1": "w", "field2": "z"}
...etc

And then I have to post this file using curl.

All of this is part of a bigger Python project, so I would like to know the best way to reformat my data and get it into Elasticsearch using Python. I've thought of using regular expressions for the reformatting (re.sub and replace), and I've also looked at the Elasticsearch bulk helper to post the data, but I couldn't figure out a solution.
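Rather than regular expressions, both the reformatting and the curl POST can be done with Python's standard library. The sketch below is a minimal illustration, assuming the documents already fit in memory as a list of dicts, that sequential integer IDs are acceptable, and that the cluster is at http://localhost:9200 without authentication; the function names `to_bulk_body` and `build_bulk_request` are hypothetical. It uses `json.dumps` to write the action/metadata and source lines, and prepares (without sending) a POST to `_bulk` with the `application/x-ndjson` content type.

```python
import json
import urllib.request


def to_bulk_body(docs, index="myindex", doc_type=None):
    """Serialize a list of dicts into a Bulk API body (newline-delimited JSON)."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        meta = {"_index": index, "_id": i}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for Elasticsearch 6.x and earlier
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))  # source line: the document itself
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline


def build_bulk_request(body, base_url="http://localhost:9200"):
    """Prepare (but do not send) the equivalent of the curl POST to _bulk."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


docs = [{"field1": "x", "field2": "y"}, {"field1": "w", "field2": "z"}]
body = to_bulk_body(docs, doc_type="entity_type")
req = build_bulk_request(body)
# Sending it requires a running cluster (assumed URL above):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

For a large number of documents the body would normally be split across several requests rather than sent in one; the elasticsearch-py bulk helper used in the answer below handles that chunking for you.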

Any help is highly appreciated, thanks.

Hi!

According to https://elasticsearch-py.readthedocs.io/en/master/helpers.html#example, the Python client library has a couple of helpers for bulk operations.

For your case, for example, you could use the following code:

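from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Assumes a local, unsecured cluster; adjust the URL for your setup.
es = Elasticsearch("http://localhost:9200")

def gendata():
    docs = [{"field1": "x", "field2": "y"}, {"field1": "w", "field2": "z"}]
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": "docs",
            # "_source" holds the document body; a "doc" key would be indexed
            # as a nested field (it is only meaningful for update actions).
            # On Elasticsearch 6.x and earlier, also add a "_type" key.
            "_source": doc,
        }

bulk(es, gendata())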

Your current format is fine as long as you can load the whole list of dicts into memory.
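If the file does fit in memory, the hard-coded list in `gendata()` can be replaced by reading the file with `json.load`. A minimal sketch, using `_source` for the document body; the function name `gen_actions_from_json`, the index name `myindex`, and the file name `data.json` are illustrative assumptions. The resulting generator is what you would pass to `bulk(es, ...)`:

```python
import io
import json


def gen_actions_from_json(fp, index="myindex"):
    """Yield bulk-helper actions from a file object holding a JSON array of objects."""
    docs = json.load(fp)  # parses the entire array into memory at once
    for doc in docs:
        yield {"_op_type": "index", "_index": index, "_source": doc}


# An in-memory file standing in for open("data.json"):
sample = io.StringIO('[{"field1": "x", "field2": "y"}, {"field1": "w", "field2": "z"}]')
actions = list(gen_actions_from_json(sample))
```

With a real file, keep the helper call inside the `with` block so the file is still open while the generator is consumed: `with open("data.json") as fp: bulk(es, gen_actions_from_json(fp))`.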

However, if you cannot load the entire file into memory, you may need to convert it to newline-delimited JSON (one document per line):

{"field1": "x", "field2": "y"}
{"field1": "w", "field2": "z"}

and then read it line by line, yielding actions from a generator as @banuj suggested.
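A sketch of that line-by-line generator, assuming the input holds one JSON object per line; `gen_actions_from_ndjson`, `myindex`, and `data.ndjson` are illustrative names, not part of any library:

```python
import io
import json


def gen_actions_from_ndjson(fp, index="myindex"):
    """Yield bulk-helper actions from newline-delimited JSON, one document at a time."""
    for line in fp:  # iterating a file object reads one line at a time
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing empty line
            continue
        yield {"_op_type": "index", "_index": index, "_source": json.loads(line)}


# An in-memory file standing in for open("data.ndjson"):
sample = io.StringIO('{"field1": "x", "field2": "y"}\n{"field1": "w", "field2": "z"}\n\n')
actions = list(gen_actions_from_ndjson(sample))
```

In real use, pass the generator straight to the helper (`with open("data.ndjson") as fp: bulk(es, gen_actions_from_ndjson(fp))`) instead of wrapping it in `list()`; `bulk` consumes actions in chunks (500 by default in elasticsearch-py), so the whole file never has to be held in memory at once.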

Another nice example can be found here: https://github.com/elastic/elasticsearch-py/blob/master/example/load.py#L76-L130
