
Combine two indices into a third index in Elasticsearch using Logstash

I have two indices:

  1. employee_data: {"code": 1, "name": "xyz", "city": "Mumbai"}
  2. transaction_data: {"code": 1, "Month": "June", "payment": 78000}

I want a third index like this:

  3. join_index: {"code": 1, "name": "xyz", "city": "Mumbai", "Month": "June", "payment": 78000}

How is that possible?

I am trying this in Logstash:

input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data,transaction_data"
    query => '{ "query": { "match": { "code": 1 } } }'
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}

You can use the elasticsearch input on employees_data.

In your filters, use the elasticsearch filter on transaction_data:

input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data"
    query => '{ "query": { "match_all": {} } }'
    sort => "code:desc"
    scroll => "5m"
    docinfo => true
  }
}
filter {
  elasticsearch {
    hosts => "localhost"
    index => "transaction_data"
    query => "code:%{[code]}"
    fields => {
      "Month" => "Month",
      "payment" => "payment"
    }
  }
}
output {
  elasticsearch { 
    hosts => ["localhost"]
    index => "join1"
   }
}

And send your new document to your third index with the elasticsearch output.

You'll have three Elasticsearch connections, so the result can be a little slow, but it works.

You don't need Logstash to do this; Elasticsearch itself supports it by leveraging the enrich processor.

First, you need to create an enrich policy (use the smallest index; let's say it's employees_data):

PUT /_enrich/policy/employee-policy
{
  "match": {
    "indices": "employees_data",
    "match_field": "code",
    "enrich_fields": ["name", "city"]
  }
}

Then you can execute that policy in order to create the enrichment index:

POST /_enrich/policy/employee-policy/_execute

When the enrichment index has been created and populated, the next step is to create an ingest pipeline that uses the above enrich policy/index:

PUT /_ingest/pipeline/employee_lookup
{
  "description" : "Enriching transactions with employee data",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "employee-policy",
        "field" : "code",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    }
  ]
}

Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:

POST _reindex
{
  "source": {
    "index": "transaction_data"
  },
  "dest": {
    "index": "join1",
    "pipeline": "employee_lookup"
  }
}

After running this, the join1 index will contain exactly what you need, for instance:

  {
    "_index" : "join1",
    "_type" : "_doc",
    "_id" : "0uA8dXMBU9tMsBeoajlw",
    "_score" : 1.0,
    "_source" : {
      "code":1, 
      "name": "xyz", 
      "city": "Mumbai", 
      "Month": "June", 
      "payment": 78000 
    }
  }

As far as I know, this cannot be done in a single step using only the Elasticsearch APIs. To handle it, you need a unique ID shared by the related documents; for example, the code field you mentioned in your question would make a good document ID. You can then reindex the first index into the third one, read documents from the second index, and use the Update API to merge their fields into the third index by ID. I hope that helps.
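A minimal sketch of that approach, assuming code is unique per employee and reusing the index and field names from the question (the script-based _id assignment and the literal update body are illustrative assumptions, not part of the original answer):

```
# 1) Copy employee_data into join1, keying each document by its "code" value
POST _reindex
{
  "source": { "index": "employee_data" },
  "dest": { "index": "join1" },
  "script": { "source": "ctx._id = ctx._source.code" }
}

# 2) For each document read from transaction_data, merge its fields into the
#    matching join1 document via the Update API (here for code = 1)
POST join1/_update/1
{
  "doc": { "Month": "June", "payment": 78000 }
}
```

The partial update in step 2 merges the given fields into the existing document, so after processing all transactions each join1 document carries both the employee and the transaction fields.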

