How to optimize graph traversals in ArangoDB?

I originally intended to ask this question: "Is ArangoDB a true graph database?"

But that question would sound rather offensive.

You people at triAGENS did a really great job creating a "multi-paradigm" database. As a user of PostgreSQL, PostGIS, MongoDB and Neo4j/Titan, I really appreciate seeing an "all-in-one" solution :)

But the question remains. Creating a graph in ArangoDB basically requires two separate collections, one for edges and one for vertices. As far as I understand, this already means that vertices and their edges are not "physical" neighbors.

Moreover, even after creating the appropriate indexes, I'm facing serious performance issues when running this kind of query in Gremlin:

g.v('an_id').out('likes').in('likes').count()

It returns a result after about 3 seconds (perceived time).
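To make explicit what this traversal computes, here is a minimal in-memory sketch in Python. The data and the `Vertex`/`out_in_count` names are hypothetical. Each vertex keeps direct references to its neighbors, the property native graph stores often call "index-free adjacency". Note that `count()` counts traversers (two-hop paths), so the start vertex itself and duplicates are counted, which is also what the AQL version below returns.

```python
from collections import defaultdict

# Hypothetical toy data as (from, label, to) triples; not the asker's dataset.
EDGES = [
    ("alice", "likes", "book1"),
    ("alice", "likes", "book2"),
    ("bob", "likes", "book1"),
    ("carol", "likes", "book1"),
    ("carol", "likes", "book2"),
    ("dave", "reviews", "book1"),
]


class Vertex:
    """A vertex holding direct references to adjacent vertices, per label."""

    def __init__(self, key):
        self.key = key
        self.out_adj = defaultdict(list)  # label -> [Vertex]
        self.in_adj = defaultdict(list)   # label -> [Vertex]


def build_graph(edges):
    vertices = {}

    def get(key):
        if key not in vertices:
            vertices[key] = Vertex(key)
        return vertices[key]

    for src, label, dst in edges:
        a, b = get(src), get(dst)
        a.out_adj[label].append(b)
        b.in_adj[label].append(a)
    return vertices


def out_in_count(vertices, start, label):
    """Counts the paths of g.v(start).out(label).in(label): each hop just
    follows the neighbor lists stored on the vertex, with no index lookup."""
    count = 0
    for mid in vertices[start].out_adj[label]:
        count += len(mid.in_adj[label])
    return count


graph = build_graph(EDGES)
print(out_in_count(graph, "alice", "likes"))
```

Alice likes two books, liked 3 and 2 times respectively, so the traversal yields 5 paths, including Alice herself once per book.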

Assuming I had misunderstood how Gremlin and Blueprints/ArangoDB work, I tried to rewrite the same query in AQL:

LET lst = (FOR e1 IN NEIGHBORS(vertices, edges, "an_id", "outbound", [ { "$label": "likes" } ])
    FOR e2 IN NEIGHBORS(vertices, edges, e1.edge._to, "inbound", [ { "$label": "likes" } ])
        RETURN 1
    )
RETURN LENGTH(lst)

This gives me a delay of the same order of magnitude.
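The AQL query has a different shape from a pointer-chasing traversal. Edges live in their own collection, and each hop is resolved by looking up edges by `_from` or `_to`. The following Python sketch is a deliberately simplified model of that shape, not of ArangoDB's actual implementation. The documents, `build_edge_index` and `two_hop_count` are hypothetical, and the two hash maps only loosely stand in for the edge index on `_from`/`_to`.

```python
from collections import defaultdict

# Hypothetical edge documents shaped like ArangoDB edges: _from/_to hold
# full vertex handles; "$label" mirrors the attribute used in the question.
EDGES = [
    {"_from": "vertices/alice", "_to": "vertices/book1", "$label": "likes"},
    {"_from": "vertices/alice", "_to": "vertices/book2", "$label": "likes"},
    {"_from": "vertices/bob", "_to": "vertices/book1", "$label": "likes"},
    {"_from": "vertices/carol", "_to": "vertices/book1", "$label": "likes"},
    {"_from": "vertices/carol", "_to": "vertices/book2", "$label": "likes"},
    {"_from": "vertices/dave", "_to": "vertices/book1", "$label": "reviews"},
]


def build_edge_index(edges):
    """Toy stand-in for an edge index: two hash maps from a vertex handle
    to the edges leaving it (_from) or entering it (_to)."""
    by_from, by_to = defaultdict(list), defaultdict(list)
    for edge in edges:
        by_from[edge["_from"]].append(edge)
        by_to[edge["_to"]].append(edge)
    return by_from, by_to


def two_hop_count(index, start, label):
    """Nested loops in the shape of the AQL query: one lookup for the
    outbound edges of start, then one lookup per reached vertex for its
    inbound edges, keeping only edges whose $label matches."""
    by_from, by_to = index
    count = 0
    for e1 in by_from.get(start, []):
        if e1["$label"] != label:
            continue
        for e2 in by_to.get(e1["_to"], []):
            if e2["$label"] == label:
                count += 1  # one "RETURN 1" per (e1, e2) pair
    return count


index = build_edge_index(EDGES)
print(two_hop_count(index, "vertices/alice", "likes"))
```

In this toy model the same edges are visited as in a pointer-based traversal. The difference lies in how each hop is located (a separate index lookup plus reading edge documents), which is the point the question raises about vertices and edges not being physical neighbors.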

If I run the same query on a Titan or Neo4j database (with the very same data), it returns almost immediately (perceived time: < 200 ms).

So it seems to me that ArangoDB's graph features are a "smart graph layer" on top of a "traditional document database", and that ArangoDB is not a "native" graph database.

To confirm this feeling, I transformed the data, loaded it into PostgreSQL and ran a query (with a multi-table JOIN, as you can guess), and got execution delays similar to ArangoDB's.
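In relational form, `out('likes').in('likes').count()` becomes a self-join on an edge table. The asker's actual PostgreSQL schema is not shown, so this is a minimal sketch with a single hypothetical `edges` table. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; the table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (from_id TEXT, label TEXT, to_id TEXT);
    CREATE INDEX edges_from ON edges (from_id, label);
    CREATE INDEX edges_to ON edges (to_id, label);
""")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("alice", "likes", "book1"),
        ("alice", "likes", "book2"),
        ("bob", "likes", "book1"),
        ("carol", "likes", "book1"),
        ("carol", "likes", "book2"),
        ("dave", "reviews", "book1"),
    ],
)

# e1 = outbound edges of the start vertex with the given label,
# e2 = inbound edges (same label) of the vertex e1 points to.
TWO_HOP_COUNT = """
    SELECT COUNT(*)
    FROM edges AS e1
    JOIN edges AS e2 ON e2.to_id = e1.to_id AND e2.label = e1.label
    WHERE e1.from_id = ? AND e1.label = ?
"""


def two_hop_count(conn, start, label):
    (count,) = conn.execute(TWO_HOP_COUNT, (start, label)).fetchone()
    return count


print(two_hop_count(conn, "alice", "likes"))
```

Each hop is again an index-driven lookup on a table of edges, structurally close to the AQL version, which is consistent with (though not proof of) the similar timings reported.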

Did I do something wrong (in the AQL query)?

Is there a way to optimize the database to get better traversal times?

In PostgreSQL, conceptually, I would mix edges and nodes and use the CLUSTER command to physically order the data. Can something similar be done in ArangoDB? (I assume it would be hard, as it would involve "interlacing" edges and nodes; just an intuition.)
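The goal of physical ordering can be illustrated with a compressed sparse row (CSR) layout. Edges are sorted by source so that each vertex's outbound edges form one contiguous slice, and reading them is a single sequential range rather than scattered lookups. PostgreSQL's CLUSTER aims for a similar effect on disk by reordering a table once according to an index. This is only a sketch of the idea with hypothetical names (`build_csr`, `out_neighbors`); it is not an ArangoDB feature.

```python
def build_csr(edges):
    """Sort (source, target) pairs by source, then record where each
    source's run of targets starts in one flat array."""
    ordered = sorted(edges)  # sorts by source, then target
    vertices = sorted({v for edge in edges for v in edge})
    position = {v: i for i, v in enumerate(vertices)}
    # offsets[i] .. offsets[i + 1] delimits the outbound run of vertex i.
    offsets = [0] * (len(vertices) + 1)
    for src, _ in ordered:
        offsets[position[src] + 1] += 1
    for i in range(len(vertices)):
        offsets[i + 1] += offsets[i]  # prefix sums turn counts into starts
    targets = [dst for _, dst in ordered]
    return position, offsets, targets


def out_neighbors(csr, vertex):
    """Outbound neighbors are one contiguous slice of the targets array."""
    position, offsets, targets = csr
    i = position[vertex]
    return targets[offsets[i]:offsets[i + 1]]


# Hypothetical 'likes' edges as (source, target) pairs.
csr = build_csr([
    ("bob", "book1"),
    ("alice", "book2"),
    ("carol", "book1"),
    ("alice", "book1"),
    ("carol", "book2"),
])
print(out_neighbors(csr, "alice"))
```

A second array sorted by target would serve inbound hops the same way; keeping both in sync under updates is what makes such layouts harder for a general-purpose store, in line with the intuition above.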

I am a core developer of ArangoDB. Could you give me a bit more information on the dimensions of the data you are using?

  • Number of vertices
  • Number of edges

Then we can create our own setup with the same dimensions and optimize for it.
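If the data is available as an edge list, the requested figures can be gathered with a few lines of Python. The helper `graph_dimensions` and the `(from, label, to)` input format are assumptions for illustration. The maximum in-degree is added because the two-hop query's work grows with the in-degree of each vertex it reaches.

```python
from collections import Counter


def graph_dimensions(edges):
    """Summarize an edge list of (from, label, to) triples. Vertices that
    appear in no edge are not counted here."""
    vertices = {v for src, _, dst in edges for v in (src, dst)}
    in_degree = Counter(dst for _, _, dst in edges)
    return {
        "vertices": len(vertices),
        "edges": len(edges),
        "max_in_degree": max(in_degree.values(), default=0),
    }


# Hypothetical sample edges.
sample = [
    ("alice", "likes", "book1"),
    ("alice", "likes", "book2"),
    ("bob", "likes", "book1"),
    ("carol", "likes", "book1"),
    ("carol", "likes", "book2"),
]
print(graph_dimensions(sample))
```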

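Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.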

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM