How to turn an array of objects into an array of strings while reindexing in Elasticsearch?
Suppose the source index contains a document like this:
{
  "name": "John Doe",
  "sport": [
    {
      "name": "surf",
      "since": "2 years"
    },
    {
      "name": "mountainbike",
      "since": "4 years"
    }
  ]
}
How can I discard the "since" information so that, once reindexed, the object contains only the sport names? Like this:
{
  "name": "John Doe",
  "sport": ["surf", "mountainbike"]
}
Note that it would be nice if the resulting field kept the same name, but this is not mandatory.
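I don't know which version of Elasticsearch you are using, but here is a solution based on pipelines, which were introduced together with ingest nodes in ES v5.0.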
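A script processor extracts the name value from each sub-object and stores it in another field (here, sports).
A remove processor then deletes the original sport field.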
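To make the two steps concrete, here is a minimal plain-Python sketch of the same transformation applied to the sample document. It only mirrors the logic locally and does not talk to Elasticsearch; the function name flatten_sport is hypothetical and chosen for illustration.

```python
def flatten_sport(source):
    """Mirror the two pipeline steps on a plain dict (illustration only)."""
    doc = dict(source)  # shallow copy so the input document is left untouched
    # Step 1 (script processor): collect the name of each sub-object into "sports"
    doc["sports"] = [item["name"] for item in doc["sport"]]
    # Step 2 (remove processor): drop the original "sport" field
    del doc["sport"]
    return doc


sample = {
    "name": "John Doe",
    "sport": [
        {"name": "surf", "since": "2 years"},
        {"name": "mountainbike", "since": "4 years"},
    ],
}

print(flatten_sport(sample))
# {'name': 'John Doe', 'sports': ['surf', 'mountainbike']}
```

Like the pipeline, this sketch assumes every document has a sport array whose items all carry a name.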
You can test it with the Simulate pipeline API:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "random description",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name) }"
        }
      },
      {
        "remove": {
          "field": "sport"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "doc",
      "_id": "id",
      "_source": {
        "name": "John Doe",
        "sport": [
          {
            "name": "surf",
            "since": "2 years"
          },
          {
            "name": "mountainbike",
            "since": "4 years"
          }
        ]
      }
    }
  ]
}
This outputs the following result:
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_type": "doc",
        "_id": "id",
        "_source": {
          "name": "John Doe",
          "sports": [
            "surf",
            "mountainbike"
          ]
        },
        "_ingest": {
          "timestamp": "2018-07-12T14:07:25.495Z"
        }
      }
    }
  ]
}
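Since the question is about reindexing, the simulated pipeline still has to be stored and referenced from a reindex request. The sketch below uses Python only to assemble and print the two request bodies: one for PUT _ingest/pipeline/<id> and one for POST _reindex, where dest.pipeline names the stored pipeline. The pipeline id sport-names and the index names source-index and dest-index are hypothetical placeholders, not names from the question.

```python
import json

PIPELINE_ID = "sport-names"  # hypothetical pipeline id

# Same two processors as in the simulated pipeline
pipeline_body = {
    "description": "keep only the sport names",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx.sports = []; for (def item : ctx.sport) { ctx.sports.add(item.name) }",
            }
        },
        {"remove": {"field": "sport"}},
    ],
}

# The reindex request routes every copied document through the pipeline
reindex_body = {
    "source": {"index": "source-index"},  # hypothetical index name
    "dest": {"index": "dest-index", "pipeline": PIPELINE_ID},  # hypothetical index name
}

print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```

The printed requests can be pasted into a console such as Kibana Dev Tools; the pipeline has to be created before the reindex request is sent.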
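There may be a better solution, since I haven't used pipelines much. Alternatively, you could do this with a Logstash filter before the documents are sent to the Elasticsearch cluster.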
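One possible variation addresses the wish in the question to keep the field name sport. The sketch below, again using Python only to assemble the request body, relies on a single script processor that builds the list in a local variable and assigns it back to ctx.sport, so no remove processor is needed. The Painless source here is an assumption that has not been run against a cluster; checking it with the Simulate pipeline API first would be prudent.

```python
import json

# Assumption: one script processor that overwrites "sport" in place
same_name_script = (
    "def names = []; "
    "for (def item : ctx.sport) { names.add(item.name); } "
    "ctx.sport = names;"
)

simulate_body = {
    "pipeline": {
        "description": "keep only the sport names, same field name",
        "processors": [
            {"script": {"lang": "painless", "source": same_name_script}}
        ],
    },
    "docs": [
        {
            "_index": "index",
            "_id": "id",
            "_source": {
                "name": "John Doe",
                "sport": [
                    {"name": "surf", "since": "2 years"},
                    {"name": "mountainbike", "since": "4 years"},
                ],
            },
        }
    ],
}

print("POST _ingest/pipeline/_simulate")
print(json.dumps(simulate_body, indent=2))
```

If the simulation behaves as intended, the resulting _source should contain "sport": ["surf", "mountainbike"], which matches the shape requested in the question.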