
Change field format for CSV column using bulk API (Elasticsearch/Kibana)

I want to change the type of one of the columns of my .csv file, which I import via the bulk API into Elasticsearch in Python. The column contains dates but is imported as a string (however, when I upload the file manually in Kibana, it is recognized as a date).

import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='my-type')

I already tried a mapping, but it doesn't work:

mapping = {
  "mappings": {
    "my-type": {
      "properties": {
        "('affiliation',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('banned',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('bracket',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('country',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('created',)": {
          "type": "date",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('email',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('hidden',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('id',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('name',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('oauth_id',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('password',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('promotion',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('school',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('secret',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('speciality',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('type',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('verified',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('website',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
es.indices.create(index='user', ignore=400, body=mapping)
with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='csv')

Do you have any ideas or solutions? Thanks a lot!

The doc types need to be consistent in order for the correct mapping to be applied. Compare your first and second calls:

helpers.bulk(es, reader, index='user', doc_type='my-type')

helpers.bulk(es, reader, index='user', doc_type='csv')

If your mapping configures 'my-type', reference it as such in all subsequent function calls.
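One way to guarantee this, sketched below with the index and field names from the question, is to define the type name once and build both the mapping and the bulk call from that single constant, so the two can never drift apart:

import csv
from elasticsearch import Elasticsearch, helpers

DOC_TYPE = 'my-type'  # single source of truth for the type name

# Build the mapping from the same constant used in the bulk call.
mapping = {"mappings": {DOC_TYPE: {"properties": {"created": {"type": "date"}}}}}

es = Elasticsearch()
es.indices.create(index='user', ignore=400, body=mapping)

with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type=DOC_TYPE)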

But more importantly, reading from a CSV doesn't guarantee any original column types -- most values will be read in as strings. As such, it's recommended to pre-process your docs' attributes to guarantee they'll be treated correctly -- i.e., dates, numbers, booleans, etc.

In the function generateBulkPayload below, you can parse/modify select values right before they're inserted into ES:

import csv
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()

index_name = "user"
doc_type = "my-type"

mapping = {
    "mappings": {
        "my-type": {
            "properties": {
                "created": {
                    "type": "date",
                    "format": "epoch_millis"  # assuming you're dealing with millisecond timestamps
                }
            }
        }
    }
}

es.indices.create(index=index_name, ignore=400, body=mapping)


def generateBulkPayload(csv_reader):
    for row in csv_reader:
        # handle your parsing here

        # overwriting the `created` attribute
        row.update(dict(created=int(row.get('created'))))

        yield row


with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es,
                 generateBulkPayload(reader),
                 index=index_name,
                 doc_type=doc_type)

This code runs without errors, but the date format is still not recognized by Elasticsearch. What should I do so that Elasticsearch recognizes it?

from dateutil import parser  # needed for parser.parse below


def generateBulkPayload(csv_reader):
    for row in csv_reader:
        created = row.get("('created',)")  # Base format : 2021-03-04 13:56:16.663801
        dt = parser.parse(created)  # parse the string into a datetime object
        epoch = dt.timestamp()  # seconds since the epoch, as a float

        row.update(dict(created=int(epoch)))

        yield row
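One detail worth checking, assuming the epoch_millis mapping above is still in place: timestamp() returns seconds as a float, so int(epoch) yields epoch seconds, which Elasticsearch would then misread as milliseconds. A minimal sketch of the adjustment, using a hypothetical helper toEpochMillis:

from dateutil import parser

def toEpochMillis(value):
    # parser.parse handles '2021-03-04 13:56:16.663801'; timestamp() returns
    # seconds as a float, so multiply by 1000 to get epoch_millis.
    return int(parser.parse(value).timestamp() * 1000)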
