Change field format for CSV column using bulk API (Elasticsearch/Kibana)
I want to change the type of one of the columns of my .csv file, which I import via the bulk API into Elasticsearch in Python. The column contains dates, but it is imported as a string (however, when I upload the file manually in Kibana, it is recognized as a date).
import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='my-type')
I already tried a mapping, but it doesn't work:
mapping = {
    "mappings": {
        "my-type": {
            "properties": {
                "('affiliation',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('banned',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('bracket',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('country',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('created',)": {
                    "type": "date",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('email',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('hidden',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('id',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('name',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('oauth_id',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('password',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('promotion',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('school',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('secret',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('speciality',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('type',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('verified',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "('website',)": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                }
            }
        }
    }
}
es.indices.create(index='user', ignore=400, body=mapping)

with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='user', doc_type='csv')
Do you have any ideas or solutions? Thanks a lot!
The doc types need to be consistent in order for the correct mapping to be applied. Your first call vs. your second call:

helpers.bulk(es, reader, index='user', doc_type='my-type')
helpers.bulk(es, reader, index='user', doc_type='csv')
If your mapping configures 'my-type', reference it as such in all subsequent function calls.
But more importantly, reading from a CSV doesn't preserve the original column types: most values will be read in as strings. It's therefore recommended to pre-process your documents' attributes to guarantee they'll be treated correctly, i.e. as dates, numbers, booleans, etc.
In the function generateBulkPayload below, you can parse/modify selected values right before they're inserted into Elasticsearch:
import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

index_name = "user"
doc_type = "my-type"

mapping = {
    "mappings": {
        "my-type": {
            "properties": {
                "created": {
                    "type": "date",
                    "format": "epoch_millis"  # assuming you're dealing with millisecond timestamps
                }
            }
        }
    }
}

es.indices.create(index=index_name, ignore=400, body=mapping)

def generateBulkPayload(csv_reader):
    for row in csv_reader:
        # handle your parsing here,
        # overwriting the `created` attribute
        row.update(dict(created=int(row.get('created'))))
        yield row

with open('user.csv') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es,
                 generateBulkPayload(reader),
                 index=index_name,
                 doc_type=doc_type)
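As a side note, the generator itself can be sanity-checked without a running cluster by feeding it an in-memory CSV. The sketch below restates the generator and uses made-up sample data in which `created` already holds millisecond timestamps:

```python
import csv
import io

def generateBulkPayload(csv_reader):
    # overwrite the `created` attribute with an integer before indexing
    for row in csv_reader:
        row.update(dict(created=int(row.get('created'))))
        yield row

# made-up sample: two rows with millisecond epoch timestamps in the CSV
sample = io.StringIO("name,created\nalice,1614866176663\nbob,1614866200000\n")
docs = list(generateBulkPayload(csv.DictReader(sample)))
# each yielded dict now carries `created` as an int, matching the epoch_millis mapping
```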
This code runs without errors, but the date format is still not recognized by Elasticsearch. What should I do so that Elasticsearch recognizes it?
from dateutil import parser

def generateBulkPayload(csv_reader):
    for row in csv_reader:
        created = row.get("('created',)")  # base format: 2021-03-04 13:56:16.663801
        dt = parser.parse(created)
        epoch = dt.timestamp()  # note: timestamp() returns seconds, not milliseconds
        row.update(dict(created=int(epoch)))
        yield row
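The likely culprit: the mapping declares "format": "epoch_millis", but datetime.timestamp() returns seconds, so the indexed values are off by a factor of 1000 and Elasticsearch cannot parse them as millisecond dates. A sketch of a corrected parser, using only the standard library and assuming the CSV timestamps are UTC:

```python
import csv
from datetime import datetime, timezone

def generateBulkPayload(csv_reader):
    for row in csv_reader:
        created = row.get("('created',)")  # e.g. 2021-03-04 13:56:16.663801
        dt = datetime.strptime(created, "%Y-%m-%d %H:%M:%S.%f")
        # interpret the naive datetime as UTC, then convert to epoch *milliseconds*
        millis = int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
        row.update(dict(created=millis))
        yield row
```

Alternatively, skip the conversion entirely and let Elasticsearch parse the string directly by declaring a matching date pattern in the mapping, e.g. something like "format": "yyyy-MM-dd HH:mm:ss.SSSSSS".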