海王星上的 Gremlin 慢连接时间

Question

I have an issue with a request performance in Neptune.我对 Neptune 中的请求性能有疑问。 I have a graph like this:我有这样的图表：

hasId('A_id') (count: 1) -> out('has_group') (count: 12) -> out('has_class').hasLabel('C') (count: 9751) -> out('has_type').hasLabel('D') (count: 9749) -> out('has_element') (count: 472370) -> hasLabel(Within(11 elements label)) (count: 107233) hasId('A_id') (count: 1) -> out('has_group') (count: 12) -> out('has_class').hasLabel('C') (count: 9751) -> out('has_type ').hasLabel('D') (count: 9749) -> out('has_element') (count: 472370) -> hasLabel(Within(11 elements label)) (count: 107233)

I replace all the label to simplify, but the graph is exactly like this.我把所有的label都换掉简化了，但是图就是这样的。 The within decrease a lot the performance of the query. within 大大降低了查询的性能。 For more details:更多细节：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').count()

This request take less than 1 secondes and return 472370.此请求用时不到 1 秒并返回 472370。

If I add the last hasLabel(whithin()) like this:如果我像这样添加最后一个 hasLabel(whithin()) ：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').hasLabel(P.within('element_1','element_2','element_3','element_4','element_5','element_6','element_7','element_8','element_9','element_10','element_11')).count()

The time decrease to 18 seconds.时间减少到 18 秒。 And when I profile the query we can see a joinTime which take at least 90% of the query execution time on the within:当我分析查询时，我们可以看到一个 joinTime，它在内部至少占用了 90% 的查询执行时间：

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ask . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370
            }
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=20799
        }
    }
]

The request seems quite simple and the volume not that much.这个请求看起来很简单，而且数量也不多。 The within is not the good approach here?内部不是这里的好方法吗？ Do you have some clue to improve the query?您有改进查询的线索吗？

EDIT 1:编辑 1：

I tried with the request provided by @saikiranboga and the number of index operation is large better (divided by 10) but the join time is still high.我尝试了@saikiranboga 提供的请求，索引操作的数量更好（除以 10），但连接时间仍然很长。 I'm quite confuse.我很困惑。

The index operation number before:之前的索引操作数：

Index Operations
================
Query execution:
    # of statement index ops: 525563
    # of unique statement index ops: 525563
    Duplication ratio: 1.0
    # of terms materialized: 0

and after之后

Index Operations
================
Query execution:
    # of statement index ops: 53666
    # of unique statement index ops: 53666
    Duplication ratio: 1.0
    # of terms materialized: 0

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208000, expectedTotalOutput=2000, indexTime=0, joinTime=10, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=4, joinTime=102, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9456689, expectedTotalOutput=11695, indexTime=13, joinTime=94, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=17, joinTime=341, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=7919022, expectedTotalOutput=556904, indexTime=17, joinTime=411, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ?16 . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1145096, indexTime=848, joinTime=15268, numSearches=473
            }
        }, finishers=[dedup(?15)
        ], annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep@[element
                ], VertexLabel(?16):LabelStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=17283
        }
    }
]

Answer 1

Your observation is correct, the last pattern is indeed what is taking more time in the query.您的观察是正确的，最后一个模式确实在查询中花费了更多时间。 "ask" projections usually involve multiple index look ups and could cause slowness, these are usually optimized by the database, but it is not in this case. “询问”预测通常涉及多个索引查找并可能导致缓慢，这些通常由数据库优化，但在这种情况下并非如此。

Could you try a rewritten version of the query that fetches the labels and filters them instead of the hasLabel(...) filter, like below:您能否尝试重写查询的版本来获取标签并过滤它们而不是 hasLabel(...) 过滤器，如下所示：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5')
.out('has_group').out('has_class').hasLabel('C')
.out('has_type').hasLabel('D')
.out('has_element').as("element")
.label().is(
  within(
    'element_1','element_2','element_3','element_4','element_5',
    'element_6','element_7','element_8','element_9','element_10','element_11'
  )
)
.dedup("element")
.count()

海王星上的 Gremlin 慢连接时间

问题描述

1 个解决方案

解决方案1
2 2022-02-25 06:52:51

海王星上的 Gremlin 慢连接时间

问题描述

1 个解决方案

解决方案1 2 2022-02-25 06:52:51

解决方案1
2 2022-02-25 06:52:51