
Gremlin slow joinTime on Neptune

I have a performance issue with a query in Neptune. I have a graph like this:

hasId('A_id') (count: 1) -> out('has_group') (count: 12) -> out('has_class').hasLabel('C') (count: 9751) -> out('has_type').hasLabel('D') (count: 9749) -> out('has_element') (count: 472370) -> hasLabel(within(11 element labels)) (count: 107233)

I replaced all the labels to simplify, but the graph is exactly like this. The within filter degrades the query's performance a lot. In more detail:

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').count()

This query takes less than 1 second and returns 472370.

If I add the final hasLabel(within()) like this:

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').hasLabel(P.within('element_1','element_2','element_3','element_4','element_5','element_6','element_7','element_8','element_9','element_10','element_11')).count()

The time increases to 18 seconds. When I profile the query, the joinTime of the within pattern accounts for at least 90% of the query execution time:

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ask . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370
            }
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=20799
        }
    }
]
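The "at least 90%" figure can be checked directly from the numbers in the profile above. The snippet below is a minimal sketch, not a Neptune tool: it copies the per-pattern statistics into a string, extracts each joinTime with a regular expression, and divides by executionTime. It assumes the times are in milliseconds, which is consistent with the roughly 18 to 20 seconds observed. The names PROFILE_STATS and join_time_shares are made up for this illustration.

```python
import re

# Per-pattern statistics copied from the profile output above.
# Assumption: all times are in milliseconds.
PROFILE_STATS = """
{estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12}
{estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945}
{estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751}
{estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226}
{estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749}
{estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370}
{estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370}
"""

EXECUTION_TIME_MS = 20799  # executionTime from the annotations block


def join_time_shares(stats_text, execution_time_ms):
    """Return (joinTime, share of total execution time) for each pattern, in order."""
    join_times = [int(value) for value in re.findall(r"joinTime=(\d+)", stats_text)]
    return [(t, t / execution_time_ms) for t in join_times]


shares = join_time_shares(PROFILE_STATS, EXECUTION_TIME_MS)
for position, (join_time, share) in enumerate(shares, start=1):
    print(f"pattern {position}: joinTime={join_time:>6} ms  ({share:.1%} of executionTime)")
```

The last pattern, the one carrying the ContainsFilter, accounts for a little over 91% of executionTime, while the six preceding patterns together stay close to one second.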

The query seems quite simple and the volume is not that large. Is within not the right approach here? Do you have any suggestions for improving the query?

EDIT 1:

I tried the query provided by @saikiranboga. The number of index operations is much better (divided by 10), but the join time is still high. I'm quite confused.

The number of index operations before:

Index Operations
================
Query execution:
    # of statement index ops: 525563
    # of unique statement index ops: 525563
    Duplication ratio: 1.0
    # of terms materialized: 0

and after:

Index Operations
================
Query execution:
    # of statement index ops: 53666
    # of unique statement index ops: 53666
    Duplication ratio: 1.0
    # of terms materialized: 0
Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208000, expectedTotalOutput=2000, indexTime=0, joinTime=10, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=4, joinTime=102, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9456689, expectedTotalOutput=11695, indexTime=13, joinTime=94, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=17, joinTime=341, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=7919022, expectedTotalOutput=556904, indexTime=17, joinTime=411, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ?16 . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1145096, indexTime=848, joinTime=15268, numSearches=473
            }
        }, finishers=[dedup(?15)
        ], annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep@[element
                ], VertexLabel(?16):LabelStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=17283
        }
    }
]
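To put the two profiles side by side, the following sketch computes how much each metric improved between the original query and the rewritten one. All figures are copied from the outputs above: the statement index ops, and the joinTime and numSearches of the last pattern, plus the overall executionTime. The dictionary keys and the reduction_factors helper are names chosen for this illustration only, and times are again assumed to be in milliseconds.

```python
# Figures copied from the two profile outputs above.
before = {
    "index_ops": 525563,         # "# of statement index ops", hasLabel(within(...)) query
    "last_join_time_ms": 18991,  # joinTime of the ContainsFilter pattern
    "last_num_searches": 472370, # numSearches of the ContainsFilter pattern
    "execution_time_ms": 20799,
}
after = {
    "index_ops": 53666,          # rewritten label().is(within(...)) query
    "last_join_time_ms": 15268,
    "last_num_searches": 473,
    "execution_time_ms": 17283,
}


def reduction_factors(before_stats, after_stats):
    """Return before/after ratio per metric; values above 1 mean the metric dropped."""
    return {name: before_stats[name] / after_stats[name] for name in before_stats}


factors = reduction_factors(before, after)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x lower")
```

Index operations drop by a factor of almost 10 and the searches on the last pattern by a factor of almost 1000, yet the joinTime of that pattern and the total execution time each improve by only about 1.2x, which is the discrepancy described in this edit.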

Your observation is correct: the last pattern is indeed what takes most of the time in the query. "ask" projections usually involve multiple index lookups and can cause slowness. The database usually optimizes these, but not in this case.

Could you try a rewritten version of the query that fetches the labels and filters on them, instead of using the hasLabel(...) filter, like below:

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5')
.out('has_group').out('has_class').hasLabel('C')
.out('has_type').hasLabel('D')
.out('has_element').as("element")
.label().is(
  within(
    'element_1','element_2','element_3','element_4','element_5',
    'element_6','element_7','element_8','element_9','element_10','element_11'
  )
)
.dedup("element")
.count()


