How do index updates work in Solr and Elasticsearch?
I have an application that uses the Event Sourcing and Command Query Responsibility Segregation (CQRS) patterns. Development of the Command part is complete, and I now have to decide how to implement the Query part.
My system deals with customer orders. When an event arrives for an order, the order is processed with its orderId and order payload. The problem is that, in this form, the only way to query orders is by orderId. I can't ask a question like "give me all orders in the system with status OPEN".
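In CQRS the Query side is typically a projection. It folds the event stream into a current-state read model that is keyed by orderId but can also be queried by other fields such as status or customer. Below is a minimal in-memory sketch. It assumes a hypothetical `OrderEvent` shape, because the question only says that events carry an orderId and a payload. In a real system the dictionary would be replaced by PostgreSQL, Elasticsearch, or Solr.

```python
from dataclasses import dataclass, field


@dataclass
class OrderEvent:
    # Hypothetical event shape; the question only mentions orderId + payload.
    order_id: str
    status: str
    customer: str
    items: list[str] = field(default_factory=list)


class OrderReadModel:
    """Query-side projection: one current row per orderId, searchable by other fields."""

    def __init__(self) -> None:
        self._orders: dict[str, OrderEvent] = {}

    def apply(self, event: OrderEvent) -> None:
        # Upsert: a later event for the same orderId replaces the earlier state,
        # so each order appears once, with its latest status.
        self._orders[event.order_id] = event

    def by_status(self, status: str) -> list[str]:
        return sorted(oid for oid, o in self._orders.items() if o.status == status)

    def by_customer(self, customer: str) -> list[str]:
        return sorted(oid for oid, o in self._orders.items() if o.customer == customer)
```

For example, after applying `SUBMITTED` and then `OPEN` for `orderId1`, `by_status("OPEN")` lists `orderId1` once and `by_status("SUBMITTED")` no longer lists it.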
This is what the Query part is for. My candidate technologies for it are a classical solution like PostgreSQL or, in my opinion more elegant, Solr/Elasticsearch.
I have basic knowledge of and experience with Solr/Elasticsearch, and I want to use this opportunity to learn more, but here comes my dilemma. Another department in our company already works with Elasticsearch. A colleague from that department told me that updates in Elasticsearch are not a good idea. I didn't quite understand his argument, so I'd like to describe what I'm planning here, so that you can tell me whether it is a bad idea or whether Solr is better suited for it.
I am planning to send every status change of my order as an update to Elasticsearch, so it will look like the following:
| id | | Status | Customer | Items |
|---|---|---|---|---|
| orderId1 | -> | order.SUBMITTED | order.Customer | order.Items |
| orderId1 | -> | order.CHANGED | order.Customer1 | order.Items |
| orderId1 | -> | order.PROCESSING | order.Customer1 | order.Items |
| orderId1 | -> | order.ON_DELIVERY | order.Customer1 | order.Items |
| orderId1 | -> | order.COMPLETE | order.Customer1 | order.Items |
As you can see, I have to send several updates for the same orderId to Elasticsearch/Solr.
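A common way to make each status change replace the earlier version, rather than add a separate document, is to index every event under the same document `_id` (here, the orderId). The sketch below only builds the REST requests and does not send them. It assumes a hypothetical index named `orders` and uses the path shapes of the Elasticsearch Index API (`PUT /<index>/_doc/<_id>`) and Update API (`POST /<index>/_update/<_id>`, 7.x and later).

```python
import json
from urllib.parse import quote

INDEX = "orders"  # hypothetical index name


def index_request(order_id: str, status: str, customer: str, items: list[str]) -> tuple[str, str, str]:
    """Full-document index request; reusing order_id as _id makes ES replace the earlier version."""
    path = f"/{INDEX}/_doc/{quote(order_id, safe='')}"
    body = json.dumps({"status": status, "customer": customer, "items": items})
    return "PUT", path, body


def partial_update_request(order_id: str, status: str) -> tuple[str, str, str]:
    """Update API request that changes only the status field.

    ES still writes a complete new version of the document internally.
    """
    path = f"/{INDEX}/_update/{quote(order_id, safe='')}"
    body = json.dumps({"doc": {"status": status}})
    return "POST", path, body
```

With either request, every status change for `orderId1` targets the same path, e.g. `/orders/_doc/orderId1`. Indexing without an explicit `_id` lets ES generate a new ID each time, which would produce one document per event.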
My colleague told me that indexed documents in Elasticsearch are immutable. When I send the order.SUBMITTED event to be indexed, it creates the document, but the order.CHANGED event will not update that document; it will create another one. I can't quite judge the consequences of this. The first is my business case: when I query the orders of Customer1, will I see both SUBMITTED and CHANGED, two records, in the response? The second is operational: additional load and storage.
Did I understand the behaviour of Elasticsearch correctly? If so, will Solr behave any differently?
If I understood correctly and both behave the same, can I design anything differently that would help me reach my goals?
Finally, I have no problem using PostgreSQL for this solution; I just thought Elasticsearch or Solr would be a more natural choice for this problem. What do you think?
Thanks for your answers.
Your colleague is partially correct that updates in Elasticsearch (ES) are costly and that documents are immutable. That doesn't mean ES is unsuitable for systems with frequent updates. In fact, thanks to its scalability and distributed nature, it is a preferred choice in high-throughput, low-latency systems, including search systems. You have a few misconceptions, and I will try to explain them.
Suppose you index an order with status SUBMITTED and later update it to CHANGED (under the same document ID, i.e. the orderId). Even though documents are immutable, a query for the order status returns the latest status, as long as a refresh has happened on the index (the default refresh interval in ES is 1 second). Old documents are not removed immediately; permanent deletion happens during the merge process, explained below. Instead, when a document is updated, ES marks the old version as deleted. This is a soft delete, recorded by setting a deleted flag, so soft-deleted documents are not returned in your searches.
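The toy model below illustrates the soft delete and refresh behaviour described above. It is a deliberate simplification, not how ES or Lucene are actually implemented. Documents sit in segments that are never modified, deletions are tracked separately from the documents, and new versions (plus the deletion of old ones) become visible only on `refresh()`. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # (doc_id, source) pairs; never modified once the segment is written.
    docs: tuple[tuple[str, dict], ...]
    # Soft deletes are kept outside the docs themselves.
    deleted: set[int] = field(default_factory=set)


class ToyIndex:
    """Toy model of immutable segments with soft deletes; not how ES is implemented."""

    def __init__(self) -> None:
        self.segments: list[Segment] = []
        self.pending: dict[str, dict] = {}  # indexed, but not yet visible to search

    def index(self, doc_id: str, source: dict) -> None:
        # Re-indexing the same id before a refresh just replaces the pending version.
        self.pending[doc_id] = source

    def refresh(self) -> None:
        # Make pending docs searchable: soft-delete older versions of the same ids,
        # then write the pending docs as a brand-new segment.
        if not self.pending:
            return
        for seg in self.segments:
            for pos, (doc_id, _) in enumerate(seg.docs):
                if doc_id in self.pending:
                    seg.deleted.add(pos)
        self.segments.append(Segment(tuple(self.pending.items())))
        self.pending = {}

    def search(self, **filters: object) -> list[str]:
        # Soft-deleted positions are skipped, so only the latest version can match.
        return [
            doc_id
            for seg in self.segments
            for pos, (doc_id, source) in enumerate(seg.docs)
            if pos not in seg.deleted
            and all(source.get(k) == v for k, v in filters.items())
        ]

    def stored_docs(self) -> int:
        # Physical document count, including soft-deleted versions awaiting a merge.
        return sum(len(seg.docs) for seg in self.segments)
```

After indexing `orderId1` as SUBMITTED, refreshing, and re-indexing it as CHANGED, a search still sees SUBMITTED until the next refresh. After the refresh only CHANGED matches, although both versions are still stored.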
The old document with order status SUBMITTED is removed from the index during the merge process. Old versions are purged this way, so your index size doesn't keep growing. It is also very important to understand that immutability brings a big benefit for search/read performance. Segments, which contain the documents in ES, never change, so they can be shared safely across threads and cached.
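The merge step can be sketched the same way, as a simplification. Real Lucene merge policies choose which segments to merge and run merges in the background, and none of that is modelled here. A merge writes a new segment containing only the live documents, after which the old segments, along with the soft-deleted versions they hold, can be dropped. The names `Segment`, `merge`, and `stored_docs` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # (doc_id, status) pairs; never modified once the segment is written.
    docs: tuple[tuple[str, str], ...]
    # Positions soft-deleted by later updates of the same doc_id.
    deleted: set[int] = field(default_factory=set)


def merge(segments: list[Segment]) -> Segment:
    """Copy only live documents into one new segment; the inputs can then be discarded."""
    live = tuple(
        doc
        for seg in segments
        for pos, doc in enumerate(seg.docs)
        if pos not in seg.deleted
    )
    return Segment(live)


def stored_docs(segments: list[Segment]) -> int:
    # Physical count, including soft-deleted versions still taking up space.
    return sum(len(seg.docs) for seg in segments)
```

Merging three segments that hold the SUBMITTED, CHANGED, and PROCESSING versions of `orderId1` (the first two soft-deleted) shrinks the stored count from 3 to 1.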