
Optimize NeptuneDB Gremlin query

vehicles --> accounts --> organizations <-- users

We have the graph structure above, where vehicles, accounts, organizations and users are vertex labels and the arrows indicate edge direction.

Consider the following vertex counts:

organizations = 1
accounts per organization = 2
vehicles per account = 5000
users per organization = 100

Our requirement is: given two vertex IDs, find the set of all users and vehicles that satisfy the graph structure above.

For example, given vertex1 = accounts:1 and vertex2 = organizations:1, find the users and vehicles that are connected to these two vertices.

We have the following query:

g.V('accounts:1').outE().otherV().hasId('organizations:1')
.V('accounts:1').inE().otherV().as('B')
.V('organizations:1').inE().otherV().as('A')
.select('A', 'B')

While this works, the query takes ~3.5 seconds to complete. We know that this query produces 500000 traversers (100 users × 5000 vehicles).
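To make the 500000 figure concrete, here is a minimal Python sketch of what select('A', 'B') does with the two labelled sets. It does not talk to Neptune; the vertex ID strings are invented for illustration, and only the counts (100 users, 5000 vehicles) come from the question.

```python
from itertools import product

# Sizes taken from the question; the ID strings are made up for illustration.
users = [f"users:{i}" for i in range(100)]          # vertices labelled 'A'
vehicles = [f"vehicles:{i}" for i in range(5000)]   # vertices labelled 'B'

# select('A', 'B') emits one row per (A, B) combination, i.e. a cross product.
pair_count = sum(1 for _ in product(users, vehicles))

# Returning the two sets side by side would need only one entry per vertex.
separate_count = len(users) + len(vehicles)

print(pair_count)
print(separate_count)
```

The gap between the two counts suggests that most of the result is repeated information: every user appears once per vehicle. Whether the query can be reshaped to return the two sets separately depends on what the caller needs.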

Is there a better way to do this?

Thanks for the help.

Edit #1: Attaching the query's profile API response

  Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(VertexId)@[A, B] {
        JoinGroupNode {
            JoinGroupNode {
                PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=1}
                PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
                PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
                PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=2, numSearches=1, actualTotalOutput=1}
            }, finishers=[dedup(?3)]
            PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
            PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=1}
            PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=102}
            PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102, indexTime=0, joinTime=128, numSearches=102, actualTotalOutput=102}
            PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100, indexTime=1, joinTime=6, numSearches=102, actualTotalOutput=100}
            PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=0, joinTime=128, numSearches=100, actualTotalOutput=100}
            PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=83, numSearches=100, actualTotalOutput=100}
            PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100, indexTime=1, joinTime=1, numSearches=1, actualTotalOutput=100}
            PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=100}
            PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=100}
            PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000, indexTime=0, joinTime=119, numSearches=1, actualTotalOutput=500000}
            PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000, indexTime=194, joinTime=142, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000, indexTime=183, joinTime=499, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000, indexTime=193, joinTime=858, numSearches=5000, actualTotalOutput=500000}
            PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260, indexTime=360, joinTime=1372, numSearches=500}
        }, annotations={path=[Vertex(?1):GraphStep, Edge(?6):VertexStep, Vertex(?3):EdgeOtherVertexStep, Vertex(?8):GraphStep, Edge(?13):VertexStep, Vertex(?10):EdgeOtherVertexStep, VertexId(?10):IdStep@[A], Vertex(?16):GraphStep, Edge(?21):VertexStep, Vertex(?18):EdgeOtherVertexStep, VertexId(?18):IdStep@[B]], joinStats=true, optimizationTime=329, maxVarId=24, executionTime=6279}
    },
    NeptuneTraverserConverterStep
]
+ not converted into Neptune steps: [SelectStep(last,[A, B])]

WARNING: >> SelectStep(last,[A, B]) << (or one of its children) is not supported natively yet

Physical Pipeline
=================
NeptuneGraphQueryStep@[A, B]
    |-- StartOp
    |-- JoinGroupOp
        |-- JoinGroupOp
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?1=<accounts:1>, <lifestate>, "ACTIVE", ?) . project ?1 .], {estimatedCardinality=1799504, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?1, ?5, ?3=<organizations:1>, ?6) . project ?1,?6,?3 . IsEdgeIdFilter(?6) .], {estimatedCardinality=102, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?6, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1})
            |-- SpoolerOp(1000)
            |-- DynamicJoinOp(PatternNode[(?3, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
            |-- FilterOp
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?8=<organizations:1>, <~label>, ?9, <~>) . project distinct ?8 .], {estimatedCardinality=INFINITY, expectedTotalOutput=1})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?8, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, ?12, ?8, ?13) . project ?8,?13,?10 . IsEdgeIdFilter(?13) .], {estimatedCardinality=INFINITY, expectedTotalOutput=102})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?13, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=102})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?13, <role>, "admin", ?) . project ask .], {estimatedCardinality=113376, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <~label>, ?14=<users>, <~>) . project ask .], {estimatedCardinality=2326404, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?10, <~label>, ?15=<users>, <~>) . project ?10 .], {estimatedCardinality=2326404, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?16=<accounts:1>, <~label>, ?17, <~>) . project distinct ?16 .], {estimatedCardinality=INFINITY, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?16, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=100})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, ?20, ?16, ?21) . project ?16,?21,?18 . IsEdgeIdFilter(?21) .], {estimatedCardinality=INFINITY, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?21, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=3341886, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <~label>, ?22=<vehicles>, <~>) . project ask .], {estimatedCardinality=238260, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <lifestate>, "ACTIVE", ?) . project ask .], {estimatedCardinality=1799504, expectedTotalOutput=1000})
        |-- SpoolerOp(1000)
        |-- DynamicJoinOp(PatternNode[(?18, <~label>, ?23=<vehicles>, <~>) . project ?18 .], {estimatedCardinality=238260})

Runtime (ms)
============
Query Execution: 6283.262
Serialization:   2120.104

Traversal Metrics
=================
Step                                                               Count  Traversers       Time (ms)    % Dur
-------------------------------------------------------------------------------------------------------------
NeptuneGraphQueryStep(VertexId)@[A, obje...                500000      500000        2502.636    41.43
NeptuneTraverserConverterStep                                     500000      500000        2580.098    42.71
SelectStep(last,[A, B])                              500000      500000         958.328    15.86
                                            >TOTAL                     -           -        6041.062        -

Predicates
==========
# of predicates: 37

WARNING: reverse traversal with no edge label(s) - .in() / .both() may impact query performance

Results
=======
Count: 500000
Output: <Removed for space>
Response serializer: application/vnd.gremlin-v3.0+gryo
Response size (bytes): 64,000,045


Index Operations
================
Query execution:
    # of statement index ops: 15915
    # of unique statement index ops: 15915
    Duplication ratio: 1.0
    # of terms materialized: 0
Serialization:
    # of statement index ops: 0
    # of terms materialized: 0
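As a rough way to read the profile above, the following Python sketch recomputes the per-step shares and the average response size per result. All numbers are copied from the "Traversal Metrics" and "Results" sections; the variable names are only illustrative.

```python
# Figures copied from the "Traversal Metrics" and "Results" sections of the profile.
step_times_ms = {
    "NeptuneGraphQueryStep": 2502.636,
    "NeptuneTraverserConverterStep": 2580.098,
    "SelectStep": 958.328,
}
total_ms = 6041.062
result_count = 500_000
response_bytes = 64_000_045

# Share of the total traversal time spent in each step, as a percentage.
shares = {step: round(100 * ms / total_ms, 2) for step, ms in step_times_ms.items()}

# Average payload size per returned (A, B) row.
bytes_per_result = response_bytes / result_count

for step, pct in shares.items():
    print(f"{step}: {pct}%")
print(f"bytes per result: {bytes_per_result:.1f}")
```

Read this way, more than half of the measured time is spent in NeptuneTraverserConverterStep and SelectStep, which handle the 500000 result rows, rather than in the graph lookups themselves.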

If possible, always provide edge labels on traversal steps like in() and out(). Also, you do not need to specify inE().otherV() unless you need data from the edge; in() will suffice. As a first step I would try:

g.V('accounts:1').out(<labels>).hasId('organizations:1')
 .V('accounts:1').in(<labels>).as('B')
 .V('organizations:1').in(<labels>).as('A')
 .select('A', 'B')

Where <labels> is a list of one or more edge labels, as in in('works-with','knows').

Using edge labels, especially on the in() steps, can help a lot in some cases. There are other rewrites that can be tried, but this is a good first step.
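To show what the suggested query looks like once the <labels> placeholders are filled in, here is a small Python sketch that renders the query text. The helper names (label_args, build_query) and the edge labels ('belongs_to', 'owned_by', 'member_of') are hypothetical, since the question does not say what the edges are called; substitute the real labels from your graph.

```python
def label_args(labels):
    """Render edge labels as a Gremlin argument list, e.g. 'a','b'."""
    return ",".join(f"'{label}'" for label in labels)


def build_query(account_id, org_id, account_to_org, into_account, into_org):
    """Return the suggested traversal as text, with edge labels substituted."""
    return (
        f"g.V('{account_id}').out({label_args(account_to_org)}).hasId('{org_id}')\n"
        f" .V('{account_id}').in({label_args(into_account)}).as('B')\n"
        f" .V('{org_id}').in({label_args(into_org)}).as('A')\n"
        f" .select('A', 'B')"
    )


query = build_query(
    "accounts:1",
    "organizations:1",
    account_to_org=["belongs_to"],  # hypothetical label: accounts --> organizations
    into_account=["owned_by"],      # hypothetical label: vehicles --> accounts
    into_org=["member_of"],         # hypothetical label: users --> organizations
)
print(query)
```

This only builds a string for inspection. Plain string interpolation is fine for fixed IDs like these, but it should not be used with untrusted input.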
