
Elasticsearch - Creating an index and inserting data from a CSV file

I have a CSV file. [CSV data]

I want to create a separate index for each category of animal, e.g. all dogs should be in one index named 'dogs', cats should be in another index named 'cats', etc. While parsing the data, if an index for the animal already exists I add the entry to that index; otherwise I want to create a separate index and add the entry to it. At the end I should have 3 indices with the following data: Dog - Jerry, Thera; Cat - Lily, Melo; Rabbit - Bunny

I want to know how this can be done using Python. I am trying, but I am not able to parse the CSV and not able to create a new index for each category.

In my solution, you can simply parse the CSV file in Python and use an ingest pipeline while indexing the documents.

To read the CSV file, check the following script, which inserts the data into Elasticsearch with the pipeline parameter:

import csv

from elasticsearch import Elasticsearch, helpers

# Connect to the local Elasticsearch node (7.x client style).
es = Elasticsearch(host="localhost", port=9200)

with open('stockerbot-export.csv') as f:
    # Each CSV row becomes a dict keyed by the header row, so it can be bulk-indexed directly.
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='someindexname', pipeline='index-name-change-pipeline')

Before executing the script, you need to create a pipeline for this logic with the following request in Kibana:

PUT _ingest/pipeline/index-name-change-pipeline
{
  "processors": [
    {
      "set": {
        "field": "_index",
        "value": "someprefix-{{{animal}}}"
      }
    }
  ]
}
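
If it helps, here is a minimal sketch of creating the same pipeline from Python instead of Kibana, assuming the 7.x elasticsearch-py client used in the script above:

from elasticsearch import Elasticsearch

es = Elasticsearch(host="localhost", port=9200)

# Same processor definition as the Kibana request above: rewrite _index based on the "animal" field.
pipeline_body = {
    "processors": [
        {
            "set": {
                "field": "_index",
                "value": "someprefix-{{{animal}}}"
            }
        }
    ]
}

es.ingest.put_pipeline(id="index-name-change-pipeline", body=pipeline_body)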

You can also use any other REST client instead of Kibana. I could not test the Python code end to end, but I can see the result with the following request on the Kibana screen:

POST test/_doc?pipeline=index-name-change-pipeline
{
  "id": 1,
  "animal": "dog",
  "name": "asb"
}
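
The equivalent test can also be run from Python; a hedged sketch with the 7.x client, mirroring the field names of the Kibana example above:

# Index a single test document through the pipeline.
doc = {"id": 1, "animal": "dog", "name": "asb"}
resp = es.index(index="test", body=doc, pipeline="index-name-change-pipeline")
print(resp["_index"])  # expected to print "someprefix-dog" if the pipeline rerouted the document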

As you can see there, the index name is test while indexing the data. But after the indexing succeeds, the response will be this:

{
  "_index" : "someprefix-dog",
  "_id" : "5HEJyIAB4KMK-wcc6Rtb",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

So, our pipeline is working correctly, as you can see from the _index metadata.
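
To double-check that the documents really ended up in the per-animal indices, a small verification sketch (assuming the "someprefix-*" naming produced by the pipeline above):

# Refresh and search all indices created by the pipeline.
es.indices.refresh(index="someprefix-*")
resp = es.search(index="someprefix-*", body={"query": {"match_all": {}}})
for hit in resp["hits"]["hits"]:
    print(hit["_index"], hit["_source"])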
