How to optimize graph traversal in ArangoDB?

Question

I primarily intended to ask this question: "Is ArangoDB a true graph database?"

But this question would sound quite offensive.

You people at triAGENS did a really great job creating a "multi-paradigm" database. As a user of PostgreSQL, PostGIS, MongoDB and Neo4j/Titan, I really appreciate seeing an "all-in-one" solution :)

But the question remains: creating a graph in ArangoDB basically requires creating two separate collections, one for edges and one for vertices. As far as I understand, this already means that vertices and their related edges are not "physically" neighbors.
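To make that layout concrete, here is a minimal Python sketch under simplifying assumptions: an in-memory vertex collection, a separate edge collection whose documents point to vertices through `_from`/`_to`, and a lookup table per endpoint that is similar in spirit to an edge index. The collection contents, the `$label` attribute and the `neighbors` helper are hypothetical illustrations, not ArangoDB's implementation. Each hop is one lookup into the edge data followed by one lookup per edge into the vertex data.

```python
from collections import defaultdict

# Hypothetical, simplified model of the two collections: vertices and edges
# live in separate structures and are linked only through _from / _to handles.
vertices = {
    "vertices/alice": {"name": "Alice"},
    "vertices/bob": {"name": "Bob"},
    "vertices/post1": {"title": "Post 1"},
}
edges = [
    {"_from": "vertices/alice", "_to": "vertices/post1", "$label": "likes"},
    {"_from": "vertices/bob", "_to": "vertices/post1", "$label": "likes"},
]


def build_edge_index(edge_docs, key):
    """Group edge documents by one endpoint (_from or _to)."""
    index = defaultdict(list)
    for e in edge_docs:
        index[e[key]].append(e)
    return index


out_index = build_edge_index(edges, "_from")
in_index = build_edge_index(edges, "_to")


def neighbors(vertex_id, direction, label):
    """Follow edges with the given label from vertex_id.

    Each hop needs a lookup in the edge index, then one lookup per
    matching edge in the separate vertex collection.
    """
    if direction == "outbound":
        return [vertices[e["_to"]] for e in out_index[vertex_id] if e["$label"] == label]
    return [vertices[e["_from"]] for e in in_index[vertex_id] if e["$label"] == label]
```

For example, `neighbors("vertices/post1", "inbound", "likes")` returns the Alice and Bob vertex documents, in edge insertion order.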

Moreover, even after creating the appropriate indexes, I'm facing some serious performance issues when doing this kind of thing in Gremlin:

g.v('an_id').out('likes').in('likes').count()

It returns a result after ~3 seconds (perceived time).
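For reference, the Gremlin query walks from the start vertex along outgoing `likes` edges, then back along incoming `likes` edges, and `count()` counts every resulting path rather than distinct vertices (the start vertex itself is reached again once per liked item). A minimal Python sketch of those semantics, assuming the `likes` edges are given as hypothetical `(source, target)` pairs:

```python
from collections import defaultdict


def count_out_in(likes, start):
    """Count the paths produced by out('likes').in('likes') from `start`.

    `likes` is a list of (source, target) pairs. Every path
    start -> x <- y is counted once, so duplicates and the start
    vertex itself are included, as with a plain count().
    """
    out_adj = defaultdict(list)  # source -> targets
    in_adj = defaultdict(list)   # target -> sources
    for src, dst in likes:
        out_adj[src].append(dst)
        in_adj[dst].append(src)
    # For every item liked by `start`, add the number of vertices liking it.
    return sum(len(in_adj[x]) for x in out_adj[start])
```

With `likes = [("a", "p1"), ("a", "p2"), ("b", "p1"), ("c", "p1"), ("b", "p2")]`, `count_out_in(likes, "a")` is 5. The work grows with the number of two-hop paths, so a start vertex whose liked items are popular produces a large count even on a modest graph.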

I assumed I had misunderstood how Gremlin and Blueprints/ArangoDB work, so I tried to rewrite the same query in AQL:

LET lst = (FOR e1 in NEIGHBORS(vertices, edges, "an_id", "outbound", [ { "$label": "likes" } ] )
    FOR e2 in NEIGHBORS(vertices, edges, e1.edge._to, "inbound", [ { "$label": "likes" } ] )
        RETURN 1
    )
RETURN length(lst)

This gives me a delay of the same order of magnitude.

If I run the same query on a Titan or Neo4j database (with the very same data), it returns almost immediately (perceived time: <200 ms).

So it seems to me that ArangoDB's graph features are a "smart graph layer" on top of a "traditional document database", and that ArangoDB is not a "native" graph database.

To confirm this feeling, I transformed the data, loaded it into PostgreSQL and ran a query (with a multiple-table JOIN, as you can assume), and got execution delays similar to ArangoDB's.
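The same two-hop count can be expressed as a self-join on an edge table, which is roughly the shape of such a relational query. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for PostgreSQL and a hypothetical `likes(src, dst)` table (a real schema joining a separate vertex table would add further joins):

```python
import sqlite3


def count_out_in_sql(likes, start):
    """Express out('likes').in('likes').count() as a relational self-join.

    sqlite3 stands in for PostgreSQL; table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE likes (src TEXT NOT NULL, dst TEXT NOT NULL)")
    # Indexes on both endpoints play the role of an edge index.
    conn.execute("CREATE INDEX likes_src ON likes (src)")
    conn.execute("CREATE INDEX likes_dst ON likes (dst)")
    conn.executemany("INSERT INTO likes (src, dst) VALUES (?, ?)", likes)
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM likes AS first_hop
        JOIN likes AS second_hop ON second_hop.dst = first_hop.dst
        WHERE first_hop.src = ?
        """,
        (start,),
    ).fetchone()
    conn.close()
    return count
```

Each hop becomes one index-assisted join, so the cost is again driven by the number of matching edge rows rather than by any physical adjacency of vertices and edges.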

Did I do something wrong (in the AQL query)?

Is there a way to optimize the database to get better traversal times?

In PostgreSQL, conceptually, I would mix edges and nodes and use a CLUSTER clause to physically order the data. Can something similar be done in ArangoDB? (I assume it would be hard, as it would involve "interlacing" edges and nodes; just an intuition.)
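The "interlacing" intuition corresponds to a well-known layout in which all outgoing edges of a vertex are stored contiguously, so a hop reads one slice instead of doing a lookup per edge. A minimal Python sketch of a compressed sparse row (CSR) layout, assuming integer vertex ids; it illustrates the general idea only and is not an ArangoDB feature or storage format:

```python
def build_csr(num_vertices, edges):
    """Build a compressed sparse row (CSR) adjacency layout.

    Vertices are integers 0..num_vertices-1 and edges are (src, dst) pairs.
    Edges are sorted by source, so all outgoing edges of a vertex sit in
    one contiguous slice, similar in spirit to clustering rows by a key.
    """
    targets = [dst for _, dst in sorted(edges)]
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count out-degree of each vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sums give slice boundaries
    return offsets, targets


def out_neighbors(offsets, targets, v):
    """Return the out-neighbors of v as one contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]
```

For example, `build_csr(3, [(0, 2), (1, 2), (0, 1), (2, 0)])` yields `offsets == [0, 2, 3, 4]` and `targets == [1, 2, 2, 0]`, so vertex 0's neighbors are `[1, 2]`. The trade-off is that such a layout is cheap to read but expensive to update in place.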

Answer 1

I am a core developer of ArangoDB. Could you give me a bit more information on the dimensions of the data you are using?

Number of vertices
Number of edges

Then we can create our own setup with the same dimensions and optimize it.
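Reproducing the issue mainly needs those two numbers. A minimal Python sketch of a reproducible synthetic setup with given vertex and edge counts, assuming uniformly random `likes` edges (real data is usually skewed, which can change two-hop traversal cost considerably); the vertex keys and edge document layout are hypothetical:

```python
import random


def make_likes_graph(num_vertices, num_edges, seed=0):
    """Generate a random directed 'likes' graph with the given dimensions.

    Vertex keys and the edge document format are hypothetical; a fixed
    seed makes the benchmark data reproducible.
    """
    if num_vertices < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)
    vertices = [f"v{i}" for i in range(num_vertices)]
    edges = []
    for _ in range(num_edges):
        src, dst = rng.sample(vertices, 2)  # two distinct vertices, no self-loops
        edges.append({"_from": src, "_to": dst, "$label": "likes"})
    return vertices, edges
```

Calling `make_likes_graph(100, 500)` twice with the same seed returns identical data, so timings from different databases can be compared on the same input.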

Answer 1 (score 5), 2014-01-10 10:54:28
