Optimize NeptuneDB Gremlin query
vehicles --> accounts --> organizations <-- users
We have the above graph structure, where vehicles, accounts, organizations and users are vertex labels and the arrows indicate the edge direction.
Consider the following number of vertices:
organizations = 1
accounts per organization = 2
vehicles per account = 5000
users per organization = 100
Our requirement is, given two vertex IDs, to find the set of all users and vehicles that satisfy the above graph.
For example, if I have vertex1 = accounts:1 and vertex2 = organizations:1, find the set of users and vehicles that belong to these two vertices.
We have the following query:
g.V('accounts:1').outE().otherV().hasId('organizations:1')
.V('accounts:1').inE().otherV().as('B')
.V('organizations:1').inE().otherV().as('A')
.select('A', 'B')
While this works, the query takes ~3.5 seconds to complete. We know that there are going to be 500000 traversers for this query.
Is there a better way to do this?
Thanks for the help.
Edit #1: Attaching the query's profile API response
Optimized Traversal
===================
Neptune steps:
[
NeptuneGraphQueryStep(VertexId)@[A, B] {
JoinGroupNode {
JoinGroupNode {
PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=1}
PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
}, finishers=[dedup(?3)]
PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=102}
PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102, indexTime=0, joinTime=128, numSearches=102, actualTotalOutput=102}
PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100, indexTime=1, joinTime=6, numSearches=102, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=0, joinTime=128, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=83, numSearches=100, actualTotalOutput=100}
PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=1, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=100}
PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=100}
PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000, indexTime=0, joinTime=119, numSearches=1, actualTotalOutput=500000}
PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000, indexTime=194, joinTime=142, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000, indexTime=183, joinTime=499, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000, indexTime=193, joinTime=858, numSearches=5000, actualTotalOutput=500000}
PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260, indexTime=360, joinTime=1372, numSearches=500}
}, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep, Vertex(?3):EdgeOtherVertexStep, Vertex(?8):GraphStep, Edge(?13):VertexStep, Vertex(?10):EdgeOtherVertexStep, VertexId(?10):IdStep@[A], Vertex(?16):GraphStep, Edge(?21):VertexStep, Vertex(?18):EdgeOtherVertexStep, VertexId(?18):IdStep@[B]], joinStats=true, optimizationTime=329, maxVarId=24, executionTime=6279}
},
NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [SelectStep(last,[A, B])]
WARNING: >> SelectStep(last,[A, B]) << (or one of its children) is not supported natively yet
Physical Pipeline
=================
NeptuneGraphQueryStep@[A, B]
|-- StartOp
|-- JoinGroupOp
|-- JoinGroupOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- FilterOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260})
Runtime (ms)
============
Query Execution: 6283.262
Serialization: 2120.104
Traversal Metrics
=================
Step Count Traversers Time (ms) % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(VertexId)@[A, obje... 500000 500000 2502.636 41.43
NeptuneTraverserConverterStep 500000 500000 2580.098 42.71
SelectStep(last,[A, B]) 500000 500000 958.328 15.86
>TOTAL - - 6041.062 -
Predicates
==========
# of predicates: 37
WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance
Results
=======
Count: 500000
Output: <Removed for space>
Response serializer: application/vnd.gremlin-v3.0+gryo
Response size (bytes): 64,000,045
Index Operations
================
Query execution:
# of statement index ops: 15915
# of unique statement index ops: 15915
Duplication ratio: 1.0
# of terms materialized: 0
Serialization:
# of statement index ops: 0
# of terms materialized: 0
If possible, always provide labels on traversal steps like in() and out(). Also, you do not need to specify inE().otherV() unless you need data from the edge; in() will suffice. As a first step I would try:
g.V('accounts:1').out(<labels>).hasId('organizations:1')
.V('accounts:1').in(<labels>).as('B')
.V('organizations:1').in(<labels>).as('A')
.select('A', 'B')
Where <labels> will be of the form in('works-with','knows').
Using edge labels, especially on the in steps, can help a lot in some cases. I would start there. There are other rewrites that can be tried, but this is a good first step.
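As a sketch of one such further rewrite: the profile reports 500000 results because select('A', 'B') pairs each of the 100 users with each of the 5000 vehicles, and those pairs make up the roughly 64 MB response. If a set of users and a set of vehicles is all that is needed, the two neighbour sets can instead be folded into lists so that a single result comes back. The edge labels 'account_org', 'user_org' and 'vehicle_account' below are hypothetical placeholders for the real labels, and this has not been profiled on Neptune, so treat it as a starting point rather than a verified fix.

```groovy
// Sketch only: 'account_org', 'user_org' and 'vehicle_account' are
// hypothetical edge labels; substitute the labels used in your graph.
//
// where(...)  keeps the account only if it is linked to the organization.
// by #1 (A)   ids of the users attached to the organization, folded into one list.
// by #2 (B)   ids of the vehicles attached to the account, folded into one list.
g.V('accounts:1').
  where(__.out('account_org').hasId('organizations:1')).
  project('A', 'B').
    by(__.out('account_org').hasId('organizations:1').
          in('user_org').hasLabel('users').id().fold()).
    by(__.in('vehicle_account').hasLabel('vehicles').id().fold())
```

With the vertex counts from the question, this should return one map holding about 100 user ids under A and 5000 vehicle ids under B, rather than 500000 pairs. Running it through the profile API again would show whether each step is converted into Neptune steps and how much serialization time remains.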