
Index CSV to ElasticSearch in Python

Looking to index a CSV file to ElasticSearch without using Logstash. I am using the elasticsearch-dsl high-level library.

Given a CSV with a header, for example:

name,address,url
adam,hills 32,http://rockit.com
jane,valleys 23,http://popit.com

What is the best way to index all the data by field? Eventually I would like each row to look like this:

{
"name": "adam",
"address": "hills 32",
"url":  "http://rockit.com"
}
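(Note: `csv.DictReader` from the standard library already yields one dict per row, keyed by the header fields, which is exactly the shape above. A minimal sketch, with an inline sample standing in for the file:)

```python
import csv
import io

# Inline copy of the sample CSV from the question.
csv_text = (
    "name,address,url\n"
    "adam,hills 32,http://rockit.com\n"
    "jane,valleys 23,http://popit.com\n"
)

# Each row comes back as a dict keyed by the header fields, e.g.
# {'name': 'adam', 'address': 'hills 32', 'url': 'http://rockit.com'}
rows = list(csv.DictReader(io.StringIO(csv_text)))
```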

This kind of task is easier with the lower-level elasticsearch-py library:

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open('/tmp/x.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='my-index', doc_type='my-type')
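If you need more control over each action (for example on newer Elasticsearch versions, where the `doc_type` concept has been removed), the rows can be wrapped into explicit bulk actions instead. A minimal sketch using only the standard library; the `csv_to_actions` helper and the index name are illustrative, not part of any library API:

```python
import csv
import io

def csv_to_actions(f, index_name):
    """Yield one bulk action per CSV row (illustrative helper)."""
    for row in csv.DictReader(f):
        # Each action names its target index and carries the row as the document.
        yield {"_index": index_name, "_source": row}

# Stand-in for open('/tmp/x.csv'); the generator can be passed
# straight to helpers.bulk(es, actions) against a running cluster.
sample = io.StringIO("name,address,url\nadam,hills 32,http://rockit.com\n")
actions = list(csv_to_actions(sample, "my-index"))
```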

If you want to create an Elasticsearch database from a .tsv/.csv file with strict types and a model for better filtering, you can do something like this:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType, Text


class ElementIndex(DocType):
    # One field per CSV column, each with an explicit type
    name = Text()
    address = Text()
    url = Text()

    class Meta:
        index = 'index_name'


def indexing(row):
    # Build a typed document from one row dict and
    # return it as a bulk action
    obj = ElementIndex(
        name=str(row['name']),
        address=str(row['address']),
        url=str(row['url'])
    )
    return obj.to_dict(include_meta=True)


def bulk_indexing(result):
    # Create the index with the mapping defined above
    ElementIndex.init()
    es = Elasticsearch()

    # here 'result' is your iterable of row dicts read from the source
    bulk(client=es, actions=(indexing(row) for row in result))
    es.indices.refresh()
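The `result` iterable that `bulk_indexing` expects can be built from the CSV with the standard library. A short sketch; the in-memory sample stands in for a real file handle:

```python
import csv
import io

# Stand-in for open('yourfile.csv'); swap in a real file in practice.
sample = io.StringIO("name,address,url\nadam,hills 32,http://rockit.com\n")

# The list of row dicts that bulk_indexing() expects as 'result'.
result = list(csv.DictReader(sample))

# bulk_indexing(result)  # uncomment with a running Elasticsearch cluster
```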
