
ArangoDB index usage with edge collections

Task: find the fastest way to update attributes on many edges. For performance reasons, I am bypassing the graph methods and working with the collection directly for filtering.

ArangoDB 2.8b3

Query (Offer is an edge collection):

FOR O IN Offer
  FILTER O._from == @from AND O._to == @to AND O.expired > DATE_TIMESTAMP(@newoffertime)
  UPDATE O WITH { expired: @newoffertime } IN Offer
  RETURN { _key: OLD._key, prices_hash: OLD.prices_hash }

I have the system index on _from and _to, and a range index on expired.

The query explain output shows:

7   edge   Offer        false    false        49.51 %   [ `_from`, `_to` ]   O.`_to` == "Product/1023058135528"

The system index is used to filter on only part of the condition (_to), not on both attributes (_from, _to), and the index on expired is not used at all. Please explain the reasons for this behavior. Also, is it possible to give the optimizer a hint about which indexes to use, so that the query takes the shortest path, if I know for certain which ones are best when planning the data model?

For filter conditions combined with logical ANDs, as in your query, ArangoDB's query optimizer will pick a single index. This is why it hasn't picked the edge index and the skiplist index at the same time.

It will choose between the skiplist index on expired and the edge index on [ "_from", "_to" ], and will pick the one for which it determines the lower cost, which is measured by index selectivity estimates. As the explain output shows, it seems to have picked the edge index on _to.

The edge index internally consists of two separate hash indexes, one on the _from attribute and one on the _to attribute, so it allows quick access via either the _from or the _to attribute. However, it's not a combined index on [ "_from", "_to" ], so it does not support queries that ask for _from and _to at the same time. It has to pick one of the internal hash indexes, and seems to have picked the one on _to in that query. The decision is again based on average index selectivity.
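To make the "two separate hash indexes" point concrete, here is a minimal Python sketch. It is a toy model, not ArangoDB's implementation: the class name EdgeIndexSketch, the use parameter, and the sample edges are all made up for illustration. It shows that a lookup can probe only one of the two maps and must then post-filter on the other endpoint.

```python
from collections import defaultdict


class EdgeIndexSketch:
    """Toy model (hypothetical): two independent hash maps, one per endpoint."""

    def __init__(self, edges):
        self.by_from = defaultdict(list)
        self.by_to = defaultdict(list)
        for edge in edges:
            self.by_from[edge["_from"]].append(edge)
            self.by_to[edge["_to"]].append(edge)

    def lookup(self, from_id, to_id, use="_to"):
        """Probe ONE hash map, then post-filter on the other endpoint."""
        if use == "_to":
            candidates = self.by_to.get(to_id, [])
            matches = [e for e in candidates if e["_from"] == from_id]
        else:
            candidates = self.by_from.get(from_id, [])
            matches = [e for e in candidates if e["_to"] == to_id]
        return candidates, matches


# Made-up sample edges
edges = [
    {"_key": "1", "_from": "Shop/1", "_to": "Product/1"},
    {"_key": "2", "_from": "Shop/2", "_to": "Product/1"},
    {"_key": "3", "_from": "Shop/1", "_to": "Product/2"},
    {"_key": "4", "_from": "Shop/3", "_to": "Product/1"},
]
index = EdgeIndexSketch(edges)

for side in ("_to", "_from"):
    candidates, matches = index.lookup("Shop/1", "Product/1", use=side)
    print(side, "candidates:", len(candidates), "matches:", [e["_key"] for e in matches])
```

Either side returns the same final matches; what differs is how many candidate edges have to be fetched and checked, which is why the choice of side matters.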

There is no way to provide any index usage hint to the optimizer - apart from that, it wouldn't be able to use two indexes at the same time for this particular query.

Looking at the selectivity estimate in the explain output, it seems that the edge index is not very selective, meaning there will be lots of edges with the same _to value. As the optimizer should also have taken the index on _from into account, I would assume that index is even less selective, and that each of these indexes will only help to skip at most 50 % of the edges, which is not very much. If that's actually the case, then the query will still retrieve (and post-filter) a lot of documents, which would explain potential slowness.
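The effect of a poorly selective index can be sketched in a few lines of Python. This assumes the selectivity estimate is roughly the number of distinct indexed values divided by the number of documents; the function name selectivity_estimate and the generated data are hypothetical and not taken from the question.

```python
def selectivity_estimate(values):
    """Distinct values divided by total values; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical data: 1000 edges from 10 shops to only 4 products
edges = [
    {"_from": f"Shop/{i % 10}", "_to": f"Product/{i % 4}", "expired": i}
    for i in range(1000)
]

for field in ("_from", "_to", "expired"):
    print(field, selectivity_estimate(e[field] for e in edges))

# Using only the _to side of the index: everything it returns must be fetched,
# and the remaining conditions are applied afterwards as a post-filter.
fetched = [e for e in edges if e["_to"] == "Product/1"]
matched = [e for e in fetched if e["_from"] == "Shop/1" and e["expired"] > 500]
print(len(fetched), "fetched via the index,", len(matched), "left after post-filtering")
```

The gap between the number of fetched and matched edges is the work the post-filter has to do, and it grows as more edges share the same _to value.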

At the moment the attributes _from and _to are automatically indexed in an edge collection's always-present edge index, and they cannot be used in additional, user-defined indexes. This is a feature that we would like to add in a future release: with _from and _to available to user-defined indexes, one could create a combined (sorted) index on [ "_from", "_to", "expired" ], which would potentially be much more selective than any of the three single-attribute indexes in isolation.
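Why a combined sorted index would help can be sketched in Python with a sorted list and binary search. This only illustrates the idea and is not how ArangoDB's skiplist index is implemented; the class name CombinedSortedIndexSketch and the generated data are hypothetical.

```python
import bisect


class CombinedSortedIndexSketch:
    """Toy sorted index on (_from, _to, expired); hypothetical illustration."""

    def __init__(self, edges):
        self.edges = edges
        # Each entry carries the edge's position so the edge can be fetched later
        self.entries = sorted(
            (e["_from"], e["_to"], e["expired"], pos) for pos, e in enumerate(edges)
        )

    def lookup(self, from_id, to_id, expired_after):
        """Return edges with matching endpoints and expired > expired_after."""
        inf = float("inf")
        # First entry strictly after every (from_id, to_id, expired_after, *) entry
        lo = bisect.bisect_right(self.entries, (from_id, to_id, expired_after, inf))
        # First entry past the whole (from_id, to_id) group
        hi = bisect.bisect_left(self.entries, (from_id, to_id, inf))
        return [self.edges[pos] for *_, pos in self.entries[lo:hi]]


# Hypothetical data: 1000 edges from 10 shops to only 4 products
edges = [
    {"_from": f"Shop/{i % 10}", "_to": f"Product/{i % 4}", "expired": i}
    for i in range(1000)
]
index = CombinedSortedIndexSketch(edges)
hits = index.lookup("Shop/1", "Product/1", 500)
print(len(hits), "edges found without any post-filtering")
```

Because all three conditions are answered by one contiguous slice of the sorted entries, only the matching edges are touched, in contrast to probing a single-attribute index and filtering the rest afterwards.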
