Bulk indexing an Avro file into Elasticsearch

Question

I wrote this short, simple script:

    from elasticsearch import Elasticsearch
    from fastavro import reader

    es = Elasticsearch(['someIP:somePort'])
    with open('data.avro', 'rb') as fo:
        avro_reader = reader(fo)
        # One index request per record
        for record in avro_reader:
            es.index(index="my_index", body=record)

It works absolutely fine. Each record is a JSON object, and Elasticsearch can index JSON documents. But rather than going one by one in a for loop, is there a way to do this in bulk? This is very slow.

Answer 1

There are two ways to do this:

1. Use the Elasticsearch Bulk API directly with the Python requests library.
2. Use the Elasticsearch Python library, which internally calls the same Bulk API.
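The first option is not spelled out in this answer, so here is a minimal sketch of what it could look like. The Bulk API takes newline-delimited JSON: one action line followed by one document line per record, with a trailing newline at the end. The helper names `build_bulk_body` and `post_bulk`, the `base_url` parameter, and the use of `default=str` for Avro values that `json` cannot encode are all assumptions made for illustration.

```python
import json


def build_bulk_body(records, index_name):
    """Serialize records into the newline-delimited JSON the Bulk API expects."""
    lines = []
    for record in records:
        # Action line: index the document on the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself. default=str is an assumption to
        # cope with Avro values (datetimes, decimals) that json cannot encode.
        lines.append(json.dumps(record, default=str))
    # The request body must end with a newline.
    return "\n".join(lines) + "\n"


def post_bulk(base_url, records, index_name):
    """POST one bulk request; base_url is e.g. 'http://someIP:somePort'."""
    import requests  # imported lazily so build_bulk_body works without it

    response = requests.post(
        f"{base_url}/_bulk",
        data=build_bulk_body(records, index_name).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    response.raise_for_status()
    # The "errors" flag in the response is true if any single item failed.
    return response.json()
```

For a large Avro file you would probably call `post_bulk` on batches of a few hundred or thousand records rather than on the whole file at once.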

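    # Option 2: the bulk helper from the Elasticsearch Python library
    from elasticsearch import Elasticsearch
    from elasticsearch import helpers
    from fastavro import reader

    es = Elasticsearch(['someIP:somePort'])

    with open('data.avro', 'rb') as fo:
        avro_reader = reader(fo)
        # Build one bulk action per Avro record
        records = [
            {
                "_index": "my_index",
                # Mapping types are deprecated in Elasticsearch 7.x and removed in 8.x; omit "_type" there
                "_type": "record",
                "_id": j,  # position of the record in the file
                "_source": record
            }
            for j, record in enumerate(avro_reader)
        ]
        # Send the actions through the Bulk API in chunks instead of one request per record
        helpers.bulk(es, records)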
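The list comprehension above holds every record in memory before anything is sent. Since `helpers.bulk` accepts any iterable of actions, a generator can stream the file instead. The following is a sketch under a few assumptions: the function names `generate_actions` and `bulk_index_avro` are made up for illustration, `"_id"` is left out so Elasticsearch assigns IDs itself, and `"_type"` is dropped on the assumption of a 7.x or later cluster.

```python
def generate_actions(records, index_name):
    """Yield one bulk action per record without building a list first."""
    for record in records:
        # No "_id" key, so Elasticsearch generates an ID for each document.
        yield {"_index": index_name, "_source": record}


def bulk_index_avro(es, path, index_name, chunk_size=500):
    """Stream an Avro file into Elasticsearch; returns helpers.bulk's result."""
    # Imported lazily so generate_actions can be tried without these packages.
    from elasticsearch import helpers
    from fastavro import reader

    with open(path, "rb") as fo:
        # The file must stay open while helpers.bulk consumes the generator.
        return helpers.bulk(
            es, generate_actions(reader(fo), index_name), chunk_size=chunk_size
        )
```

Usage would be along the lines of `bulk_index_avro(es, 'data.avro', 'my_index')` with the `es` client created as above. Keeping `"_id": j` as in the answer has one advantage worth noting: re-running the import overwrites the same documents instead of creating duplicates.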

Answer 1: accepted, 2020-07-05 11:14:10
