Index if not exists using BulkProcessor in Elasticsearch

I am trying to index a document only if it doesn't already exist in Elasticsearch. I am using BulkProcessor when indexing my documents, with the Requests.add action. Sometimes I will have the exact same id. In that case, does it skip the add automatically, or does it update the existing document?

PS: Updating is not a requirement; the existing document can stay as is.

PS2: I am trying to integrate a user's past tweets into elasticsearch-twitter-river's user stream.

If you index a doc with the same document id as an existing one, it will do an update: the stored document is replaced by the new one. Otherwise it will add a new document.

In other words, if you PUT a doc to {index}/{type}/{id}, it will always update (overwrite) the document with that id. If you POST a doc to {index}/{type}, then in general Elasticsearch will generate a new id, and therefore a new document, for each of your POSTs. That is, unless you mapped a document field to the _id field in mappings.
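To make the difference concrete, here is a minimal sketch of those semantics in Python. It is an in-memory toy, not an Elasticsearch client: the class ToyIndex and its method names are hypothetical and exist only for illustration. The create method models the create operation type (op_type=create), which Elasticsearch rejects with a conflict when the id already exists; that is the behaviour the question is asking for.

```python
import uuid


class ToyIndex:
    """In-memory stand-in for one index (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}  # maps id -> (version, source)

    def put(self, doc_id, source):
        """Model PUT {index}/{type}/{id}: add the doc, or overwrite it if the id exists."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)
        return version

    def post(self, source):
        """Model POST {index}/{type}: always a new doc under a generated id."""
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = (1, source)
        return doc_id

    def create(self, doc_id, source):
        """Model op_type=create: add only if the id is absent, otherwise leave the doc alone."""
        if doc_id in self.docs:
            return False  # the real API answers with a version conflict here
        self.docs[doc_id] = (1, source)
        return True


idx = ToyIndex()
idx.put("42", {"text": "first"})
idx.put("42", {"text": "second"})             # same id: overwritten, version is now 2
kept = idx.create("42", {"text": "third"})    # same id: rejected, "second" stays
added = idx.create("43", {"text": "fourth"})  # new id: added
```

With this model, kept is False and added is True, so a re-sent document never clobbers the stored one.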

It seems that the Twitter River uses the PUT method and explicitly specifies the id, so tweets with the same id will probably be overwritten.
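If overwriting is not wanted, the bulk API also accepts a create action in place of index; a create item fails for an id that already exists and leaves the stored document untouched, while the other items in the same bulk request still go through. The sketch below only builds the newline-delimited request body for the _bulk endpoint with the Python standard library. The function name bulk_create_body and the index and type names are assumptions for illustration, and the _type field applies to older Elasticsearch versions that still have mapping types. With the Java BulkProcessor the equivalent would be to set the create operation type on each IndexRequest before adding it; check the exact setter name for your client version.

```python
import json


def bulk_create_body(index, doc_type, docs):
    """Build a _bulk request body that uses the create action for every doc.

    docs is an iterable of (doc_id, source_dict) pairs. Each doc becomes two
    lines: the action metadata, then the document source.
    """
    lines = []
    for doc_id, source in docs:
        action = {"create": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical past tweets, keyed by tweet id.
past_tweets = [
    ("1001", {"user": "alice", "text": "hello"}),
    ("1002", {"user": "alice", "text": "world"}),
]
body = bulk_create_body("twitter", "status", past_tweets)
```

The response to such a request reports a result per item, so ids that already existed show up as individual conflicts and can simply be ignored.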
