ArangoDB AQL Filter NOT IN collection, very slow
I want to find the set of users not having a profile.
ArangoDB 2.4.3
LENGTH(users) -> 130k
LENGTH(profiles) -> 110k
users.userId -> unique hash index
profiles.userId -> unique hash index
This AQL snippet I made is slower than a snail crossing the Grand Canyon in mid-summer.
LET usersWithProfiles = ( /* This part is ok */
  FOR i IN users
    FOR j IN profiles
      FILTER i.userId == j.userId
      RETURN i
)
LET usersWithoutProfiles = ( /* This is not */
  FOR i IN usersWithProfiles
    FILTER i NOT IN users
    RETURN i
)
RETURN LENGTH(usersWithoutProfiles)
I'm pretty sure there is a perfectly sane way of doing it right, but I'm missing it. Any ideas?
Edit 1 (after @dothebart's response):
This is the new query, but it is still very slow:
LET userIds_usersWithProfile = (
  FOR i IN users
    FOR j IN profiles
      FILTER i.userId == j.userId
      RETURN i.userId
)
LET usersWithoutProfiles = (
  FOR i IN users
    FILTER i.userId NOT IN userIds_usersWithProfile
    RETURN i
)
RETURN LENGTH(usersWithoutProfiles)
Note also that this part of the original query was extremely expensive:
LET usersWithoutProfiles = (
  FOR i IN usersWithProfiles
    FILTER i NOT IN users
    RETURN i
)
The reason is the FILTER using users, which at this point is an expression that builds all documents from the collection as an array. Instead of using this, I suggest this query, which will return the _key attribute of users that do not have an associated profile record:
FOR user IN users
  LET profile = (
    FOR profile IN profiles
      FILTER profile.userId == user.userId
      RETURN 1
  )
  FILTER LENGTH(profile) == 0
  RETURN user._key
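Since the question asks for a count rather than the keys, the same idea can be wrapped in a subquery. This is only a sketch along the lines of the query above: the variable name matches is my own, and the LIMIT 1 is an assumption that stopping at the first matching profile is enough to decide whether a user has one.

```aql
/* Sketch: count users without a profile, reusing the per-user subquery idea. */
LET usersWithoutProfiles = (
  FOR user IN users
    /* Equality lookup on profiles.userId for the current user. */
    LET matches = (
      FOR profile IN profiles
        FILTER profile.userId == user.userId
        LIMIT 1   /* one hit is enough to know a profile exists */
        RETURN 1
    )
    /* Keep only users for which no profile was found. */
    FILTER LENGTH(matches) == 0
    RETURN 1
)
RETURN LENGTH(usersWithoutProfiles)
```

The inner FILTER is an equality comparison on profiles.userId, so it is the kind of condition the hash index on that attribute is meant to serve; running explain on the query should confirm whether the index is actually picked up.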
The reason for the poor performance is that the query cannot use indices for the NOT IN operation, since it has to do a full comparison against each document in the collection.
You can use the explain utility (https://www.arangodb.com/2015/02/02/arangodb-2-4-2) to let ArangoDB tell you where the costs of your query are.
Your query will probably not do what you expect from it. usersWithoutProfiles will be empty, since any user with a profile will also be found in the users collection. If you want to have the other part of the users collection, it could look like this:
LET usersWithProfiles = ( /* This part is ok */
  FOR i IN users
    FOR j IN profiles
      FILTER i.userId == j.userId
      RETURN i
)
/* now we pick the IDs, we could have done that in your first query... */
LET userWithProfilesIds = (FOR i IN usersWithProfiles RETURN i.userId)
/* now filter the user list by that */
LET usersWithoutProfiles = (
  FOR i IN users
    FILTER i.userId NOT IN userWithProfilesIds
    RETURN i
)
RETURN LENGTH(usersWithoutProfiles)
This should give you a proper result.
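If only the count is needed, another way to express "users minus users with a profile" is AQL's MINUS array function applied to the two lists of userId values. This is a sketch, not something from the answers above: the variable names are my own, it assumes userId is unique in both collections (as the unique hash indexes in the question imply), and whether it is faster on your data would have to be measured.

```aql
/* Sketch: set difference on the userId values only. */
LET allUserIds = (FOR u IN users RETURN u.userId)
LET profileUserIds = (FOR p IN profiles RETURN p.userId)
/* MINUS returns the values of the first array that are not in the second. */
RETURN LENGTH(MINUS(allUserIds, profileUserIds))
```

This avoids materializing full user documents and compares plain values instead, but it still reads both collections completely.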