
Indexing an Avro file into Elasticsearch in bulk

I wrote this short, simple script:

from elasticsearch import Elasticsearch
from fastavro import reader

es = Elasticsearch(['someIP:somePort'])
with open('data.avro', 'rb') as fo:
    avro_reader = reader(fo)
    for record in avro_reader:
        es.index(index="my_index", body=record)

It works absolutely fine. Each record is a JSON object, and Elasticsearch can index JSON documents. But rather than indexing the records one by one in a for loop, is there a way to do this in bulk? The current approach is very slow.

There are two ways to do this.

  1. Use the Elasticsearch Bulk API directly with the Python requests library
  2. Use the Elasticsearch Python library, which internally calls the same bulk API
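    from elasticsearch import Elasticsearch
    from elasticsearch import helpers
    from fastavro import reader

    es = Elasticsearch(['someIP:somePort'])

    with open('data.avro', 'rb') as fo:
        avro_reader = reader(fo)
        # One bulk action per Avro record; the record's position is used as its _id.
        records = [
            {
                "_index": "my_index",
                # "_type" is only accepted by older Elasticsearch versions
                # (deprecated in 7.x, removed in 8.x); drop it on newer clusters.
                "_type": "record",
                "_id": j,
                "_source": record
            }
            for j, record in enumerate(avro_reader)
        ]
        # helpers.bulk sends the actions to the bulk API in chunks.
        helpers.bulk(es, records)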
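The first option has no code above, so here is a minimal sketch of how it might look. It assumes the cluster accepts plain HTTP on the `_bulk` endpoint and that the records can be serialized to JSON. The function names `build_bulk_body` and `send_bulk` are hypothetical helpers made up for this illustration, and `default=str` is an assumed fallback for Avro values such as bytes or dates.

```python
import json


def build_bulk_body(records, index_name):
    """Serialize records into the newline-delimited JSON body the bulk API expects."""
    lines = []
    for doc_id, record in enumerate(records):
        # Action line: tells Elasticsearch to index the following line as a document.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself. default=str is an assumption for
        # Avro values (bytes, dates, decimals) that json cannot serialize natively.
        lines.append(json.dumps(record, default=str))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(base_url, body):
    """POST a prepared body to the _bulk endpoint (needs the requests package)."""
    import requests  # imported here so build_bulk_body works without it installed

    response = requests.post(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    response.raise_for_status()
    return response.json()
```

With the script from the question, this would be used roughly as `body = build_bulk_body(reader(fo), "my_index")` followed by `send_bulk("http://someIP:somePort", body)`. The response JSON carries an `errors` flag that is worth checking, since a bulk request can succeed at the HTTP level while individual documents fail. For a large file it is probably better to split the records into several smaller requests, which is what `helpers.bulk` does for you.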
