How do I convert an array of objects into an array of strings when reindexing in Elasticsearch?

Question

Suppose the source index contains a document like this:

{
   "name":"John Doe",
   "sport":[
       {
          "name":"surf",
          "since":"2 years"
       },
       {
          "name":"mountainbike",
          "since":"4 years"
       }
   ]
}

How can I discard the "since" information so that, once reindexed, the document contains only the sport names? Like this:

{
   "name":"John Doe",
   "sport":["surf","mountainbike"]
}

Note that it would be nice if the resulting field kept the same name, but that is not mandatory.

Answer 1

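I don't know which version of Elasticsearch you are using, but here is a solution based on ingest pipelines, which were introduced with the ingest node in ES v5.0.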

1) A script processor extracts the name from each sub-object and stores it in another field (here, sports).
2) A remove processor then deletes the original sport field.

You can test it with the Simulate pipeline API:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "random description",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name)  }"
        }
      },
      {
        "remove": {
          "field": "sport"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "doc",
      "_id": "id",
      "_source": {
        "name": "John Doe",
        "sport": [
          {
            "name": "surf",
            "since": "2 years"
          },
          {
            "name": "mountainbike",
            "since": "4 years"
          }
        ]
      }
    }
  ]
}

This outputs the following result:

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_type": "doc",
        "_id": "id",
        "_source": {
          "name": "John Doe",
          "sports": [
            "surf",
            "mountainbike"
          ]
        },
        "_ingest": {
          "timestamp": "2018-07-12T14:07:25.495Z"
        }
      }
    }
  ]
}
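
If the field should keep its original name, one possible variant is to build the list of names in a local variable and then overwrite sport in place, which also makes the remove processor unnecessary. This is only a sketch: the null check is an added assumption to skip documents that have no sport field, and the _index and _id values are placeholders.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "replace the sport objects with their names (sketch)",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "if (ctx.sport != null) { def names = []; for (def item : ctx.sport) { names.add(item.name); } ctx.sport = names; }"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "name": "John Doe",
        "sport": [
          { "name": "surf", "since": "2 years" },
          { "name": "mountainbike", "since": "4 years" }
        ]
      }
    }
  ]
}
```

With the sample document, the simulated _source should come back with sport holding just the two names, surf and mountainbike.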

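There may be a better solution, as I haven't used pipelines much. Alternatively, you could do this with Logstash filters before the documents are submitted to the Elasticsearch cluster.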

For more information about pipelines, see the reference documentation on the ingest node.
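
The Simulate API only tests a pipeline; it does not store it or change any documents. To apply it during an actual reindex, one approach is to store the pipeline under an id and reference that id in the dest section of the _reindex request. A minimal sketch, in which the pipeline id sport-names and the index names source-index and dest-index are hypothetical placeholders:

```
# Store the pipeline (the id "sport-names" is a hypothetical name)
PUT _ingest/pipeline/sport-names
{
  "description": "keep only the sport names",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.sports = []; for (def item : ctx.sport) { ctx.sports.add(item.name); }"
      }
    },
    {
      "remove": {
        "field": "sport"
      }
    }
  ]
}

# Reindex through the stored pipeline ("source-index" and "dest-index" are placeholders)
POST _reindex
{
  "source": {
    "index": "source-index"
  },
  "dest": {
    "index": "dest-index",
    "pipeline": "sport-names"
  }
}
```

Every document copied from the source index passes through the processors before it is written to the destination index. Documents without a sport field would make this script fail, so a null check may be needed depending on the data.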
