I have a huge amount of data in a single JSON file that I want to get into Elasticsearch so I can build some visualizations in Kibana. My JSON currently looks something like this:
[{"field1": "x", "field2": "y"},
{"field1": "w", "field2": "z"}]
...etc
After doing some research, I found that the best way to feed this data to Elasticsearch is using the Bulk API, but first I need to reformat my data to look like this:
{"index":{"_index": "myindex", "_type": "entity_type", "_id": 1}}
{"field1": "x", "field2": "y"}
{"index":{"_index": "myindex", "_type": "entity_type", "_id": 2}}
{"field1": "w", "field2": "z"}
...etc
And then I have to post this file using curl.
All of this is part of a bigger Python project, so I would like to know the best way to reformat my data and get it into Elasticsearch using Python. I've thought of using regular expressions for the reformatting (re.sub and replace), and I've also looked at the elasticsearch bulk helper to post the data, but I couldn't figure out a solution.
Any help is highly appreciated, thanks.
Hi!
According to https://elasticsearch-py.readthedocs.io/en/master/helpers.html#example , the Python client library has a couple of helpers for bulk operations.
For your case, you could use something like the following code:
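from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Assumed local node; point this at your own cluster.
es = Elasticsearch("http://localhost:9200")

def gendata():
    docs = [{"field1": "x", "field2": "y"}, {"field1": "w", "field2": "z"}]
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": "docs",
            "_type": "_doc",  # omit on Elasticsearch 8+, where mapping types were removed
            # The document body goes under "_source"; "doc" is only for update actions.
            "_source": doc,
        }

bulk(es, gendata())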
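To tie this back to the format in the question, here is a minimal sketch that loads the JSON array from a file and numbers the documents from 1, like the `_id` values shown there. The names `make_actions` and `index_json_file`, the default URL, and the default index name are assumptions for illustration, not part of the library. It also assumes the whole file fits in memory.

```python
import json


def make_actions(docs, index_name="myindex"):
    """Yield one bulk 'index' action per document, numbering _id from 1."""
    for doc_id, doc in enumerate(docs, start=1):
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_id": doc_id,
            "_source": doc,
        }


def index_json_file(path, es_url="http://localhost:9200", index_name="myindex"):
    """Load a JSON array from `path` and send it through the bulk helper."""
    # Imported here so make_actions() can be tried without the client installed.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    with open(path, encoding="utf-8") as f:
        docs = json.load(f)  # the whole array is held in memory

    es = Elasticsearch(es_url)  # assumed URL; adjust to your cluster
    # bulk() returns a tuple: (number of successful actions, errors)
    return bulk(es, make_actions(docs, index_name))
```

With this approach there is no need for regular expressions or an intermediate bulk file: the helper builds the action and source lines for you.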
Your current format is fine provided that you can load the list of dicts into memory.
However, if you cannot load the entire file into memory, you may need to convert it to newline-delimited JSON, with one document per line:
{"field1": "x", "field2": "y"}
{"field1": "w", "field2": "z"}
You can then read the file line by line and feed the documents to the bulk helper through a generator, as @banuj suggested.
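A rough sketch of both steps, using only the standard library. It assumes the source file is a top-level JSON array whose elements are objects (as in the question); the function names and the default index name are made up for illustration. The conversion reads the array in chunks with `json.JSONDecoder.raw_decode`, so the full file is never held in memory.

```python
import json


def iter_json_array(fp, chunk_size=1 << 16):
    """Yield each object of a top-level JSON array, reading fp in chunks."""
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    pos = 1  # skip the opening bracket
    while True:
        # Skip whitespace and the commas between elements.
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf) and buf[pos] == "]":
            return
        try:
            obj, pos_end = decoder.raw_decode(buf, pos)
        except json.JSONDecodeError:
            # Element is incomplete: read more and retry from its start.
            chunk = fp.read(chunk_size)
            if not chunk:
                raise ValueError("truncated or invalid JSON array")
            buf = buf[pos:] + chunk
            pos = 0
            continue
        yield obj
        pos = pos_end


def array_to_ndjson(src_path, dst_path):
    """Rewrite a JSON array file as newline-delimited JSON."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for obj in iter_json_array(src):
            dst.write(json.dumps(obj) + "\n")


def actions_from_ndjson(path, index_name="myindex"):
    """Yield one bulk 'index' action per non-empty line of an NDJSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield {
                    "_op_type": "index",
                    "_index": index_name,
                    "_source": json.loads(line),
                }
```

You would then call something like `bulk(es, actions_from_ndjson("data.ndjson"))`, where `es` is your client and the file name is a placeholder. The helper consumes the generator and sends the actions in chunks (500 per request by default), so memory use stays bounded.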
Another nice example can be found here: https://github.com/elastic/elasticsearch-py/blob/master/example/load.py#L76-L130