
Index if not exists using BulkProcessor in Elasticsearch

Asked 2015-05-03 22:13:53 · Tags: twitter, elasticsearch, twitter4j, twitter-streaming-api

I am trying to index a document only if it doesn't already exist in Elasticsearch. I am using a BulkProcessor to index my documents, via the Requests.add action. Sometimes I will have the exact same id: in that case, will it not add a new document but update the existing one?

PS: Updating is not a requirement; the existing document can stay as it is.

PS2: I am trying to integrate a user's past tweets into the user stream of elasticsearch-twitter-river.

1 answer

If you index a document with an id that already exists, Elasticsearch overwrites (replaces) the existing document with that id instead of adding a second one. If the id is new, a new document is added.

In other words, if you do a plain PUT of a document to {index}/{type}/{id}, it will always overwrite the document with that id (or create it if it doesn't exist yet). If you POST a document to {index}/{type} without an id, Elasticsearch generates a new id, so in general each POST creates a new document. The exception is when you have mapped a document field to the _id field in the mappings (the `_id` `path` setting available in Elasticsearch 1.x), in which case the id is taken from that field.
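To make the overwrite behaviour concrete, here is a toy in-memory model in plain Java. It is not Elasticsearch client code, and the class and method names are made up for illustration. `index` mimics indexing with an explicit id: it always writes and bumps a version counter, much as Elasticsearch increments `_version` on each overwrite. `create` mimics the `create` operation type, which refuses to write when the id is already taken.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (NOT Elasticsearch code) of two write semantics:
 * "index" overwrites any existing document with the same id,
 * "create" leaves an existing id untouched and reports a conflict.
 */
public class DocStoreModel {
    private final Map<String, String> docs = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    /** Like indexing with an explicit id: always writes and bumps the version. */
    public long index(String id, String source) {
        docs.put(id, source);
        long v = versions.getOrDefault(id, 0L) + 1;
        versions.put(id, v);
        return v;
    }

    /** Like op_type=create: writes only if the id is new; returns false on a conflict. */
    public boolean create(String id, String source) {
        if (docs.containsKey(id)) {
            return false; // Elasticsearch would report a conflict for this document instead
        }
        index(id, source);
        return true;
    }

    public String get(String id) {
        return docs.get(id);
    }

    public long version(String id) {
        return versions.getOrDefault(id, 0L);
    }

    public static void main(String[] args) {
        DocStoreModel store = new DocStoreModel();
        store.index("42", "{\"text\":\"first\"}");
        store.index("42", "{\"text\":\"second\"}"); // same id: overwrites, version becomes 2
        System.out.println(store.get("42") + " v" + store.version("42"));
        boolean created = store.create("42", "{\"text\":\"third\"}"); // id taken: rejected
        System.out.println(created + " " + store.get("42"));
    }
}
```

Running `main` prints `{"text":"second"} v2` and then `false {"text":"second"}`: the second `index` call replaced the first document, and the `create` call left it untouched.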

It seems that the Twitter River indexes tweets with an explicitly specified id (the equivalent of a PUT), so tweets with the same id will probably be overwritten.
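If existing tweets should be left alone rather than overwritten, as the question asks, Elasticsearch's bulk API accepts a `create` action in place of `index`: an item whose `_id` already exists fails with a conflict (HTTP 409 for that item) while the other items in the batch are still written. The sketch below only builds such a newline-delimited bulk body as a string. The index name `tweets`, the type `tweet` and the ids are hypothetical; ids are assumed not to need JSON escaping, and sources are assumed to be valid JSON already. With the Java client's BulkProcessor, the equivalent should be calling `create(true)` on each `IndexRequest` before adding it, but check this against the client version you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Builds a bulk request body (newline-delimited JSON) that uses the "create" action,
 * so documents whose _id already exists are rejected instead of overwritten.
 * Assumes ids need no JSON escaping and sources are already valid JSON.
 */
public class CreateOnlyBulkBody {

    /** Action/metadata line for one document, in the 1.x style with _index, _type and _id. */
    static String createAction(String index, String type, String id) {
        return String.format("{\"create\":{\"_index\":\"%s\",\"_type\":\"%s\",\"_id\":\"%s\"}}",
                index, type, id);
    }

    /** One action line plus one source line per document; the body must end with a newline. */
    static String build(String index, String type, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            body.append(createAction(index, type, e.getKey())).append('\n');
            body.append(e.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical tweets keyed by id; LinkedHashMap keeps insertion order.
        Map<String, String> tweets = new LinkedHashMap<>();
        tweets.put("1", "{\"text\":\"hello\"}");
        tweets.put("2", "{\"text\":\"world\"}");
        System.out.print(build("tweets", "tweet", tweets));
    }
}
```

The resulting string can be sent as the body of a POST to `/_bulk`. Duplicates then show up as failed items in the response (with BulkProcessor, in the `afterBulk` callback of its `BulkProcessor.Listener`), which can simply be ignored if updating is not required.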

Using RavenDB for bulk inserts of data

Check if a Twitter username exists using Ajax with jQuery

Indexing Twitter data into Elasticsearch: Limit of total fields [1000] in index has been exceeded

Bulk insert into Mongo - Ruby

Is there a method that gets bulk tweets in STTwitter?

Apache NiFi GetTwitter processor returning 403 Forbidden

Spark Streaming Elasticsearch dependencies

"IndexError: list index out of range" while fetching tweets using Tweepy

Undefined index error when using the Twitter API

Check if a tweet status exists?


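Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.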


Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM