
Bulk Update with Elasticsearch in Python

I successfully pushed my docs to my Elasticsearch and have already verified them in Kibana.

My current code looks like this:

from elasticsearch import helpers

try:
    res = helpers.bulk(es, my_function(df))
    print("Working")
except Exception as e:
    print(e)

and this is the "my_function" code:

def my_function(df):
    for line in df:
        yield {
            '_index': 'my_index',
            '_type': '_doc',  # note: _type is deprecated in Elasticsearch 7+ and can be dropped
            '_id': line.get("_id", None),
            '_source': {
                'field_A': line.get('field', "")
            }
        }
    # Do not `raise StopIteration` here: since PEP 479 (Python 3.7+) that
    # becomes a RuntimeError inside a generator. Just let the loop end.
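A quick way to sanity-check the action generator without a live cluster is to materialize it over a small sample. The rows below are made up for illustration; the real `df` comes from the script above:

```python
# Minimal sketch: build the same action dicts the generator yields
# and inspect them locally before handing them to helpers.bulk.
def my_function(df):
    for line in df:
        yield {
            '_index': 'my_index',
            '_id': line.get("_id", None),
            '_source': {'field_A': line.get('field', "")},
        }

# Hypothetical sample rows standing in for the real df
sample = [
    {"_id": "1", "field": "first"},
    {"_id": "2", "field": "second"},
]

actions = list(my_function(sample))
print(actions[0]["_id"], actions[0]["_source"]["field_A"])  # → 1 first
```

Listing the generator like this also catches per-document mistakes (e.g. a missing `_id`) before any network call is made.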

Then I wondered: what happens if I run the Python script again in the future to push just some new docs to Elasticsearch? Does anyone have an idea how to do this?

It only depends on the _id you're sending: if it is the same, there won't be any duplicate; the new version will simply override the old one.

So there is not much to worry about: new documents that don't exist yet will be indexed, and updates to existing documents will override their older versions, provided they are sent with the same _id.
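To make those _id semantics concrete, here is a toy in-memory model of an index keyed by _id (this is not the Elasticsearch API, just an illustration of the behavior described above): re-sending an existing _id replaces that document, while a fresh _id is indexed as new.

```python
# Toy illustration of _id-based overwrite semantics: an index behaves
# like a dict keyed by _id, so re-sending the same _id replaces the doc
# instead of duplicating it.
index = {}

def toy_bulk(index, actions):
    for action in actions:
        index[action["_id"]] = action["_source"]

# First run: two documents
toy_bulk(index, [
    {"_id": "1", "_source": {"field_A": "old value"}},
    {"_id": "2", "_source": {"field_A": "unchanged"}},
])

# Second run: one update with an existing _id, one brand-new doc
toy_bulk(index, [
    {"_id": "1", "_source": {"field_A": "new value"}},  # overrides _id "1"
    {"_id": "3", "_source": {"field_A": "brand new"}},  # indexed fresh
])

print(len(index))             # → 3 (no duplicates)
print(index["1"]["field_A"])  # → new value
```

If you explicitly do *not* want existing docs overwritten, the bulk action supports `'_op_type': 'create'`, which makes Elasticsearch reject documents whose _id already exists instead of replacing them.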
