
How do index updates work for Solr and Elasticsearch?

I have an application that uses Event Sourcing and the Command Query Responsibility Segregation (CQRS) pattern. Development of the command part is complete, and I now have to decide how to implement the query part.

My system deals with customer orders, so when an event arrives for an order, that order is processed using its orderId and the order payload. The problem is that, in this form, the only way to query orders is by orderId, so I can't ask a question like "give me all orders in the system with status OPEN".

That is what the query part is for. My candidate technologies for it are a classical solution such as a PostgreSQL database or, more elegant in my opinion, Solr/Elasticsearch.

I have basic knowledge of and experience with Solr/Elasticsearch, and I want to use this opportunity to learn more, but here is my dilemma. Another department in our company already works with Elasticsearch, and a colleague from that department told me that updates in Elasticsearch are not a good idea. I didn't quite understand his argument, so I'd like to describe what I am planning to do, and you can tell me whether it is a bad idea or whether Solr is better suited for it.

I am planning to send every status change of an order to Elasticsearch as an update, so it will look like the following.

| id | Status | Customer | Items |
|---|---|---|---|
| orderId1 -> | order.SUBMITTED | order.Customer | order.Items |
| orderId1 -> | order.CHANGED | order.Customer1 | order.Items |
| orderId1 -> | order.PROCESSING | order.Customer1 | order.Items |
| orderId1 -> | order.ON_DELIVERY | order.Customer1 | order.Items |
| orderId1 -> | order.COMPLETE | order.Customer1 | order.Items |

As you can see, I have to send several updates for the same orderId to Elasticsearch/Solr.
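To make the plan concrete, here is a minimal sketch of how each event could map to an index request. It only builds the requests and sends nothing. The index name `orders`, the field names, the item value, and the helper `to_index_request` are assumptions for illustration; the `PUT /<index>/_doc/<id>` form is the Elasticsearch index API with an explicit document ID.

```python
import json

# Hypothetical index name, not taken from the original post.
INDEX = "orders"


def to_index_request(order_id, status, customer, items):
    """Build (method, path, body) for indexing one order event.

    The order ID is used as the document ID, so every event for the
    same order targets the same document.
    """
    body = {"status": status, "customer": customer, "items": items}
    return "PUT", f"/{INDEX}/_doc/{order_id}", json.dumps(body)


# The five status changes from the table above (item value is made up).
events = [
    ("orderId1", "SUBMITTED", "Customer", ["item-a"]),
    ("orderId1", "CHANGED", "Customer1", ["item-a"]),
    ("orderId1", "PROCESSING", "Customer1", ["item-a"]),
    ("orderId1", "ON_DELIVERY", "Customer1", ["item-a"]),
    ("orderId1", "COMPLETE", "Customer1", ["item-a"]),
]

requests = [to_index_request(*event) for event in events]

# Every request addresses the same document path.
paths = {path for _, path, _ in requests}
```

Five events produce five requests, but `paths` contains a single entry, because all of them address the document with ID `orderId1`.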

My colleague told me that indexed documents in Elasticsearch are immutable: when I send the order.SUBMITTED event to be indexed, it will create the document, but the order.CHANGED event will not update that document and will create another one instead. I can't quite judge the consequences of this, either for my business case (if I ask for the orders of Customer1, will I get two records in the response, one with status SUBMITTED and one with status CHANGED?) or operationally (additional load and storage).

Did I understand the behaviour of Elasticsearch correctly? If so, will Solr behave any differently?

If I understood correctly and both behave the same way, can I design anything differently to help reach my goals?

Finally, I have no problem using PostgreSQL for this solution; I just thought Elasticsearch or Solr would be a more natural choice for this problem. What do you think?

Thanks for your answers.

Your colleague is partially correct: updates in Elasticsearch (ES) are comparatively costly, and indexed documents are immutable. But that doesn't mean ES is unsuitable for systems with frequent updates. In fact, because of its scalability and distributed nature, it is a common choice for high-throughput, low-latency systems (including search systems). You have a few misconceptions, and I will try to explain them.

  1. Both ES and Solr are built on Lucene, and costly, immutable updates are a property of Lucene. So it doesn't matter whether you choose ES or Solr: you will be using Lucene underneath and will get the same update mechanism.
  2. Immutability doesn't mean that the old status of your order stays visible in the index forever. For example, if the order status is initially SUBMITTED and you later update it to CHANGED, a query for that order returns only the latest status, as long as each event for the order is indexed under the same document ID and a refresh has happened on the index (by default every 1 second in ES). This works because, on an update, ES writes a new version of the document and marks the old version as deleted (a soft delete, recorded as a deletion flag), and soft-deleted documents are not returned by searches. The old versions are permanently removed later, during the merge process (see #3).
  3. ES periodically removes the old versions for good. In your case, the version of the order with status SUBMITTED is physically dropped from the index during the segment merge process, so the index does not keep growing with stale versions.
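The three points above can be illustrated with a toy model. This is only a sketch of the idea (append-only segments, soft deletes, refresh, merge), not how Lucene or ES is actually implemented; the class `ToyIndex` and all of its method names are made up for this example.

```python
class ToyIndex:
    """Toy model of immutable segments with soft deletes (illustrative only)."""

    def __init__(self):
        self.segments = []  # each segment is an immutable tuple of (doc_id, doc)
        self.deleted = []   # per segment: positions marked as deleted
        self.buffer = []    # indexed, but not yet visible to search

    def index(self, doc_id, doc):
        # New versions are buffered; existing segments are never rewritten.
        self.buffer.append((doc_id, dict(doc)))

    def refresh(self):
        """Make buffered documents searchable as a new segment."""
        if not self.buffer:
            return
        latest = {doc_id: doc for doc_id, doc in self.buffer}
        # Soft delete: flag older versions instead of modifying them.
        for seg_no, segment in enumerate(self.segments):
            for pos, (doc_id, _) in enumerate(segment):
                if doc_id in latest:
                    self.deleted[seg_no].add(pos)
        self.segments.append(tuple(latest.items()))
        self.deleted.append(set())
        self.buffer = []

    def search(self, **criteria):
        """Return live documents whose fields equal the given criteria."""
        hits = []
        for seg_no, segment in enumerate(self.segments):
            for pos, (doc_id, doc) in enumerate(segment):
                if pos in self.deleted[seg_no]:
                    continue  # soft-deleted versions are skipped
                if all(doc.get(k) == v for k, v in criteria.items()):
                    hits.append((doc_id, doc))
        return hits

    def merge(self):
        """Rewrite all segments into one, dropping deleted versions for good."""
        live = tuple(
            entry
            for seg_no, segment in enumerate(self.segments)
            for pos, entry in enumerate(segment)
            if pos not in self.deleted[seg_no]
        )
        self.segments = [live]
        self.deleted = [set()]

    def stored_docs(self):
        return sum(len(segment) for segment in self.segments)


idx = ToyIndex()
idx.index("orderId1", {"status": "SUBMITTED", "customer": "Customer1"})
idx.refresh()
idx.index("orderId1", {"status": "CHANGED", "customer": "Customer1"})
before_refresh = idx.search(customer="Customer1")  # old version still visible
idx.refresh()
after_refresh = idx.search(customer="Customer1")   # only the new version
stored_before_merge = idx.stored_docs()            # both versions on disk
idx.merge()
stored_after_merge = idx.stored_docs()             # stale version removed
```

In this model a search for Customer1 returns one record at every point in time, never two, while the stale SUBMITTED version occupies storage only until the merge.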

It is also important to understand that this immutability is a big benefit for search/read performance: because segments (which hold the documents in ES) never change once written, they can be shared safely across threads and can be cached.
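As a rough illustration of the caching benefit, the sketch below caches filter results per segment. Since a segment never changes, a cached result never has to be invalidated, and a newly written segment only adds one more scan. The names `Segment` and `CachedSearcher` are hypothetical, deletions are ignored for brevity, and Lucene's real caches are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """An immutable batch of documents, stored as (order_id, status) pairs."""
    seg_id: int
    docs: tuple


class CachedSearcher:
    def __init__(self):
        self.cache = {}  # (seg_id, status) -> positions of matching documents
        self.scans = 0   # how many times a segment was actually scanned

    def matches(self, segment, status):
        key = (segment.seg_id, status)
        if key not in self.cache:
            # Scan once; the result stays valid because the segment is frozen.
            self.scans += 1
            self.cache[key] = tuple(
                pos for pos, (_, s) in enumerate(segment.docs) if s == status
            )
        return self.cache[key]

    def search(self, segments, status):
        return [
            segment.docs[pos][0]
            for segment in segments
            for pos in self.matches(segment, status)
        ]


seg0 = Segment(0, (("orderId1", "OPEN"), ("orderId2", "COMPLETE")))
searcher = CachedSearcher()
first = searcher.search([seg0], "OPEN")

# A later refresh adds a new segment; seg0 itself is untouched.
seg1 = Segment(1, (("orderId3", "OPEN"),))
second = searcher.search([seg0, seg1], "OPEN")
```

The second search scans only `seg1`; the result for `seg0` comes from the cache, so `searcher.scans` ends at 2 rather than 3.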

