
Use Kafka Connect to update Elasticsearch field on existing document instead of creating new

I have a Kafka setup running with the Elasticsearch connector, and I am successfully indexing new documents into an ES index based on the incoming messages on a particular topic.

However, based on incoming messages on another topic, I need to append data to a field on a specific document in the same index.

Pseudo-schema below:

{
   "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
   "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
   "title": "A title",
   "body": "A body",
   "created_at": 164584548,
   "views": []
}

^ This document is being created fine in ES based on the data in the topic mentioned above.

However, how do I then add items to the views field using messages from another topic? Like so:

article-view topic schema:

{
   "article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
   "user_id": 123456,
   "timestamp: 136389734
}

Instead of simply creating a new document in an article-view index (which I don't even want to exist), this should append the view to the views field of the article document whose _id equals the article_id from the message.

So the end result after one message would be:

{
   "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
   "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
   "title": "A title",
   "body": "A body",
   "created_at": 164584548,
   "views": [
       {
           "user_id": 123456,
           "timestamp: 136389734
       }
   ]
}

Using the ES API this is possible with a script, like so:

{
    "script": {
        "lang": "painless",
        "params": {
            "newItems": [{
                "timestamp": 136389734,
                "user_id": 123456
            }]
        },
        "source": "ctx._source.views.addAll(params.newItems)"
    }
}
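
For a single document, that body is sent to the Update API of the target document; assuming the index is called articles, the request would be:

POST /articles/_update/6993e0a6-271b-45ef-8cf5-1c0d0f683acc

with the JSON above as the request body.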

I can generate scripts like the one above dynamically and then use the helpers.bulk function in the ES Python library to bulk-update documents this way.
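
A minimal sketch of that bulk approach with the Python client (the index name articles and the inline messages list are assumptions, standing in for a real Kafka consumer loop):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def view_to_action(msg):
    # Build one bulk "update" action from an article-view message:
    # it targets the existing article by _id and runs the append script.
    return {
        "_op_type": "update",
        "_index": "articles",  # assumed index name
        "_id": msg["article_id"],
        "script": {
            "lang": "painless",
            "source": "ctx._source.views.addAll(params.newItems)",
            "params": {
                "newItems": [{
                    "user_id": msg["user_id"],
                    "timestamp": msg["timestamp"],
                }]
            },
        },
    }

# In practice these would come from a Kafka consumer.
messages = [
    {"article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
     "user_id": 123456,
     "timestamp": 136389734},
]

bulk(es, (view_to_action(m) for m in messages))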

Is this possible with Kafka Connect / Elasticsearch? I haven't found any documentation on Confluent's website explaining how to do this.

It seems like a fairly standard requirement and an obvious thing people would need to do with Kafka and a sink connector like the ES one.

Thanks!

The Elasticsearch connector doesn't support this. You can update documents in-place, but you need to send the full document, not a delta for appending, which I think is what you're after.
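
The closest the connector gets is upserting whole documents keyed by the Kafka record key, so something upstream (for example a stream processor joining the article and article-view topics) would have to produce the complete article, views included. A sketch of such a sink config, assuming a connector version (11.x+) that exposes write.method; the name, topic, and URL are placeholders:

{
    "name": "articles-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "articles",
        "connection.url": "http://localhost:9200",
        "key.ignore": "false",
        "schema.ignore": "true",
        "write.method": "upsert"
    }
}

With key.ignore=false the document _id is taken from the record key, so keying the topic by the article UUID makes each new record upsert that article's document.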
