ArangoDB - Reduced query performance on a cluster

Question

I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection
    LET matches = (
        FOR temp IN TempCollection
            FILTER temp._key == existing._key
            RETURN true
    )
    FILTER LENGTH(matches) == 0
    RETURN existing

When this runs in a single-server environment (DB and Foxx on the same server/container), it runs like lightning, in under 0.5 seconds.

However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000 after the initial FOR existing ... to keep it from timing out! Even then, this limited result takes almost 7 seconds to return!

Looking at the execution plan, I see several REMOTE and GATHER nodes after the LET matches ... SubqueryNode. From what I can gather, the problem stems from the separation between where the data is stored and the memory structure used to filter it.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution.

Answer 1

Your query executes one subquery for each document in the existing collection. In a cluster, each subquery requires many HTTP roundtrips for setup, the actual querying, and shutdown.
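
To get a feel for how that per-subquery overhead adds up, here is a rough back-of-the-envelope estimate in Python. It uses only the figures from the question (LIMIT 1000 taking almost 7 seconds, about 250K documents) and assumes the cost scales linearly with the number of outer documents, which is a simplification rather than a measurement.

```python
# Rough extrapolation from the numbers in the question.
# Assumption: runtime grows linearly with the number of outer documents.
limited_docs = 1000        # LIMIT used in the question
limited_seconds = 7.0      # "almost 7 seconds" reported for that run
total_docs = 250_000       # approximate size of the existing collection

# Average cost of one outer document, i.e. one subquery with its roundtrips.
seconds_per_subquery = limited_seconds / limited_docs

# Projected cost of running the subquery for every document.
estimated_total_seconds = seconds_per_subquery * total_docs

print(f"{seconds_per_subquery * 1000:.1f} ms per subquery")
print(f"{estimated_total_seconds / 60:.0f} minutes for the full collection")
```

Under that assumption the full comparison would take roughly half an hour, which is consistent with the timeouts you are seeing.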

You can avoid the subqueries with the following query. It loads all document _keys into RAM, but that should be no problem with your rather small collections.

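// Collect the keys of both collections once, instead of one subquery per document
LET existingKeys = (FOR existing IN ExistingCollection RETURN existing._key)
LET tempKeys = (FOR temp IN TempCollection RETURN temp._key)
// Keys present in ExistingCollection but missing from TempCollection
RETURN MINUS(existingKeys, tempKeys)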
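
If it helps to see the idea outside of AQL, the MINUS() query above boils down to a set difference over the two key lists. The Python sketch below only models that logic in memory; the function name and sample keys are made up for illustration, and it says nothing about how ArangoDB implements MINUS() internally.

```python
def missing_keys(existing_keys, temp_keys):
    """Return the keys in existing_keys that do not appear in temp_keys."""
    # Build the lookup set once, so each membership check is a hash lookup
    # instead of a scan (or, in the cluster case, a separate subquery).
    temp_lookup = set(temp_keys)
    return [key for key in existing_keys if key not in temp_lookup]


# Hypothetical sample data standing in for the two collections' _key values.
existing = ["a", "b", "c", "d"]
temp = ["b", "d", "e"]

print(missing_keys(existing, temp))
```

Note that the result is a list of keys, not full documents, so a follow-up step is still needed if you want to act on the obsolete documents themselves.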
