Loading CSV into an Elasticsearch index with a mapping using the Python API

Using the Elasticsearch Python API, I want to create an index with a mapping, so that when I upload a CSV file the documents are indexed according to that mapping.

import argparse, elasticsearch, json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import csv

This is my mapping (I removed some fields so it isn't too long):

mapping = '''{
"mappings": {
  "type": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "authEndStopCode": {
        "type": "keyword"
      },
      "expandedTripNumber": {
        "type": "integer"
      },
      "operator": {
        "type": "integer"
      },
      "path": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "startStopName": {
        "type": "keyword"
      },
      "userStartStopCode": {
        "type": "keyword"
      }
    }
  }
}
}'''

I'm creating the index this way:

es.indices.create(index=INDEX_NAME, ignore=400, body=mapping)
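One caveat with ignore=400: the client then stops raising for every HTTP 400 response, not only the "index already exists" one, so a mapping that Elasticsearch rejects can fail silently, and the later bulk call will then auto-create the index with dynamic mapping (assuming automatic index creation is left at its default). A cheap local check is to parse the mapping string before sending it. This is a minimal sketch; the helper name mapping_fields is hypothetical, and it assumes the body is either typed (mappings, then a type name, then properties, as above) or typeless (mappings, then properties, the layout Elasticsearch 7 and later expect).

```python
import json


def mapping_fields(mapping_json):
    """Return {field_name: field_type} for the top-level properties of an
    index-creation body. Raises json.JSONDecodeError on malformed JSON."""
    body = json.loads(mapping_json)
    mappings = body["mappings"]
    if "properties" in mappings:
        # Typeless layout: {"mappings": {"properties": {...}}}
        properties = mappings["properties"]
    else:
        # Typed layout: {"mappings": {"<type>": {"properties": {...}}}};
        # unpacking fails loudly if there is not exactly one type.
        (type_body,) = mappings.values()
        properties = type_body["properties"]
    # Fields declared only with sub-"properties" are objects by default.
    return {name: spec.get("type", "object") for name, spec in properties.items()}


example = '''{
  "mappings": {
    "type": {
      "properties": {
        "@timestamp": {"type": "date"},
        "operator": {"type": "integer"},
        "startStopName": {"type": "keyword"}
      }
    }
  }
}'''

print(mapping_fields(example))
# → {'@timestamp': 'date', 'operator': 'integer', 'startStopName': 'keyword'}
```

If this raises, the JSON needs fixing before it is worth calling indices.create at all.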

This is what I do to upload the data:

with open(args.file, "r", encoding="latin-1") as f:
    reader = csv.DictReader(f)
    bulk(es, reader, index=INDEX_NAME, doc_type=TYPE)
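To see what DictReader does with a file that has no header row, here is a self-contained reproduction using an in-memory CSV; the two data rows are made up for illustration.

```python
import csv
import io

# Two made-up data rows and no header line, like the file in the question.
data = io.StringIO("1001,42,Central\n1002,7,Harbour\n")

rows = list(csv.DictReader(data))

# The first data row was consumed as the header, so only one row remains,
# and its keys are the values from the first line.
print(len(rows))       # → 1
print(dict(rows[0]))   # → {'1001': '1002', '42': '7', 'Central': 'Harbour'}
```

Each of these dicts is what bulk receives as a document, which is why values from the first line show up as field names in the index.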

where INDEX_NAME and TYPE are strings I have already defined.

The CSV file contains only data (one document per line) and has no header row, but Elasticsearch seems to be using the first line as the headers. I don't want this; I want to use the mapping I already added to the index.

Hope someone can help. Thank you.

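The problem isn't bulk. When you don't pass fieldnames, csv.DictReader always reads the first line of the file to get the field names for the subsequent rows. The index mapping can't help here: it tells Elasticsearch the type of each field, but nothing about which CSV column holds which field, so the names have to come from the Python side. So if you're going to use DictReader, either the file needs a header, or you have to supply the column names yourself.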
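If the file can't be given a header, passing the column names through DictReader's fieldnames argument has the same effect: no line is consumed as a header, and every row becomes a document keyed by your mapping's field names. This is a minimal sketch, not a drop-in solution. It assumes the CSV columns appear in the order listed in FIELDNAMES (that list and the helper names csv_actions and load are hypothetical; adjust them to your file), and it assumes a 7.x or later cluster where bulk actions need no type. With 6.x you still need the type, as in the question's doc_type argument.

```python
import csv

# Hypothetical column order of the header-less CSV; it must match the file.
FIELDNAMES = [
    "@timestamp", "@version", "authEndStopCode", "expandedTripNumber",
    "operator", "path", "startStopName", "userStartStopCode",
]


def csv_actions(lines, index_name, fieldnames=FIELDNAMES):
    """Yield one bulk action per CSV line, keyed by the given field names.

    Passing fieldnames stops DictReader from treating line 1 as a header.
    `lines` can be an open file or any iterable of CSV lines.
    """
    for row in csv.DictReader(lines, fieldnames=fieldnames):
        # Values stay strings; by default Elasticsearch coerces numeric
        # strings such as "42" when the field is mapped as integer.
        yield {"_index": index_name, "_source": dict(row)}


def load(path, index_name, host="http://localhost:9200"):
    """Stream the file into Elasticsearch (needs the elasticsearch package)."""
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(host)
    with open(path, newline="", encoding="latin-1") as f:
        return bulk(es, csv_actions(f, index_name))
```

Usage would be along the lines of load("trips.csv", "trips"), where the file and index names are placeholders. Keeping csv_actions separate from load means the row-to-document step can be checked without a running cluster.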

I'm the author of moshe/elasticsearch_loader.
I wrote ESL for this exact problem.
You can install it with pip:

pip install elasticsearch-loader

Then you can load CSV files into Elasticsearch while supplying your custom mapping by running:

elasticsearch_loader  --index-settings-file mappings.json \
     --index incidents --type incident csv file1.csv
