
Change field format for a CSV column using the bulk API (Elasticsearch / Kibana)

I want to change the type of one of the columns of my .csv file, which I import into Elasticsearch via the bulk API in Python. The column contains dates, but it is imported as a string (however, when I upload the file manually in Kibana, it gets the date format).

import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='my-type')

I have already tried a mapping, but it does not work:

mapping = {
  "mappings": {
    "my-type": {
      "properties": {
        "('affiliation',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('banned',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('bracket',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('country',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('created',)": {
          "type": "date",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('email',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('hidden',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('id',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('name',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('oauth_id',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('password',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('promotion',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('school',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('secret',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('speciality',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('type',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('verified',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "('website',)": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
es.indices.create(index='user', ignore=400, body=mapping)
with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='csv')

Do you have any ideas or solutions? Thank you very much!

The document type needs to stay consistent for the correct mapping to be applied. Compare your first call with your second:

helpers.bulk(es, reader, index='user', doc_type='my-type')

helpers.bulk(es, reader, index='user', doc_type='csv')

If your mapping is configured for 'my-type', reference it in all subsequent function calls.

More importantly, though, reading from a CSV does not preserve any of the original column types: most values will be read in as strings. It is therefore advisable to preprocess your document attributes to guarantee they are handled correctly, i.e. dates, numbers, booleans, and so on.

In the function generateBulkPayload below, you can parse/modify selected values before inserting them into ES:

import csv
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()

index_name = "user"
doc_type = "my-type"

mapping = {
    "mappings": {
        "my-type": {
            "properties": {
                "created": {
                    "type": "date",
                    "format": "epoch_millis"  # assuming you're dealing with millisecond timestamps
                }
            }
        }
    }
}

es.indices.create(index=index_name, ignore=400, body=mapping)


def generateBulkPayload(csv_reader):
    for row in csv_reader:
        # handle your parsing here
 
        # overwriting the `created` attribute
        row.update(dict(created=int(row.get('created'))))

        yield row


with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es,
                 generateBulkPayload(reader),
                 index=index_name,
                 doc_type=doc_type)

This code runs fine, but Elasticsearch still does not recognize the date format. What can be done so that Elasticsearch recognizes it?

from dateutil import parser

def generateBulkPayload(csv_reader):
    for row in csv_reader:
        created = row.get("('created',)")  # Base format: 2021-03-04 13:56:16.663801
        dt = parser.parse(created)
        epoch = dt.timestamp()

        row.update(dict(created=int(epoch)))

        yield row
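One possible cause, offered as an assumption rather than a confirmed fix: the mapping above declares the `created` field with format `"epoch_millis"`, but `datetime.timestamp()` returns *seconds*, so `int(epoch)` produces second-resolution values that an `epoch_millis` field will not interpret as the intended instant. A minimal stdlib-only sketch of the conversion (the helper name `to_epoch_millis` is hypothetical):

```python
from datetime import datetime, timezone

def to_epoch_millis(value):
    """Convert a string like '2021-03-04 13:56:16.663801' to integer epoch milliseconds."""
    dt = datetime.fromisoformat(value)    # parses the base format shown above
    dt = dt.replace(tzinfo=timezone.utc)  # assumes the CSV timestamps are UTC
    # timestamp() is in seconds; epoch_millis expects milliseconds
    return int(dt.timestamp() * 1000)

# In generateBulkPayload, the update would then become:
# row.update(dict(created=to_epoch_millis(row.get("('created',)"))))
```

Alternatively, keeping `int(epoch)` in seconds and switching the mapping's format to `"epoch_second"` should be equivalent.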

