DSE: Query Timeout/Slow

I am currently running a cluster of 3 nodes holding 200 million records, and the vertex label I'm querying has a total of 25 million vertices and 30 million edges. I am running the following query:

g.V().hasLabel('people_node').has("age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()

I have tried this query on a smaller set of ~100 vertices and edges, and the profiler showed that indexes were used for all parts of the query. However, I think the problem might be in my schema, which is shown below.

Schema

schema.propertyKey('id').Text().ifNotExists().create()
schema.propertyKey('name').Text().ifNotExists().create()
schema.propertyKey('age').Int().ifNotExists().create()
schema.propertyKey('location').Point().withGeoBounds().ifNotExists().create()
schema.propertyKey('gender').Text().ifNotExists().create()
schema.propertyKey('dob').Timestamp().ifNotExists().create()

schema.propertyKey('tags').Text().ifNotExists().create()
schema.propertyKey('date_posted').Timestamp().ifNotExists().create()

schema.vertexLabel('people_node').properties('id','name','location','gender','dob').create()
schema.vertexLabel('questions_node').properties('id','tags','date_posted').create()
schema.edgeLabel('posted_question').single().connection('people_node','questions_node').create()

Indexes Used

schema.vertexLabel("people_node").index("search").search().by("name").by("age").by("gender").by("location").by("dob").ifNotExists().add()
schema.vertexLabel("people_node").index("people_node_index").materialized().by("id").ifNotExists().add()

schema.vertexLabel("questions_node").index("search").search().by("date_posted").by("tags").ifNotExists().add()
schema.vertexLabel("questions_node").index("questions_node_index").materialized().by("id").ifNotExists().add()

I have also read about "OLAP" queries. I believe I have activated it, but the query is still far too slow. Any advice or insight on what is slowing it down would be greatly appreciated.

Profile Statement (OLTP)

gremlin> g1.V().has("people_node","age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
DsegGraphStep(vertex,[],(age < 25 & age > 0 & l...                     1           1          38.310    25.54
  query-optimizer                                                                              0.219
    \_condition=((age < 25 & age > 0 & label = people_node) & (true))
  query-setup                                                                                  0.001
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
  index-query                                                                                 26.581
    \_indexType=Search
    \_usesCache=false
    \_statement=SELECT "community_id", "member_id" FROM "MiniGraph"."people_node_p" WHERE "solr_query" = '{"q":"*:*", "fq":["age:{0 TO 25}"]}' LIMIT ?; with params (java.lang.Integer) 50000
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional[cassandra], waitForSchemaAgreement=true, async=true}
TraversalFilterStep([DsegVertexStep(OUT,[posted...                                           111.471    74.32
  DsegVertexStep(OUT,[posted_question],edge,(di...                     1           1          42.814
    query-optimizer                                                                            0.227
    \_condition=((direction = OUT & label = posted_question) & (true))
    query-setup                                                                                0.036
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
    vertex-query                                                                              29.908
    \_usesCache=false
    \_statement=SELECT * FROM "MiniGraph"."people_node_e" WHERE "community_id" = ? AND "member_id" = ? AND "~~edge_label_id" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1300987392, (java.lang.Long) 1026, (java.lang.Integer) 65584, (java.lang.Integer) 2
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional[cassandra], waitForSchemaAgreement=true, async=true}
    \_usesIndex=false
  RangeGlobalStep(0,2)                                                 1           1           0.097
  CountGlobalStep                                                      1           1           0.050
  IsStep(gt(1))                                                                               68.209
DsegPropertyLoadStep                                                                           0.205     0.14
                                            >TOTAL                     -           -         149.986        -
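
One way to read the profile is to recompute each top-level step's share of the total time. The following is a minimal Python sketch, not part of the original post; the three timings are copied from the output above, and the names `step_ms` and `duration_shares` are made up for illustration.

```python
# Top-level step timings in milliseconds, copied from the profile output.
step_ms = {
    "DsegGraphStep": 38.310,
    "TraversalFilterStep": 111.471,
    "DsegPropertyLoadStep": 0.205,
}


def duration_shares(timings):
    """Return each step's share of the total time as a percentage (2 decimals)."""
    total = sum(timings.values())
    return {name: round(100 * ms / total, 2) for name, ms in timings.items()}


total_ms = round(sum(step_ms.values()), 3)   # should match the >TOTAL row
shares = duration_shares(step_ms)            # should match the % Dur column
slowest = max(step_ms, key=step_ms.get)      # step with the largest time

print(total_ms, shares, slowest)
```

The shares line up with the % Dur column, and the filter step (the per-vertex edge lookup) accounts for roughly three quarters of the time even on this small graph.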

Next, since the partial query is much faster, I assume the long run time is due to the necessary graph traversals. Hence, is it possible to cache or activate the indexes (_usesIndex=false) so that OLAP queries run much faster?
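
To put a rough number on that assumption, here is a back-of-envelope Python sketch (not from the original post). It takes two figures from the profile, the 29.908 ms vertex-query time and the LIMIT of 50000 on the search index query, and assumes, purely for illustration, that every candidate vertex costs one edge lookup and that lookups run one after another. The `async=true` option in the profile suggests lookups may overlap in practice, and timings on the full graph may differ from this small test graph, so treat the result as a crude upper bound. The function name is hypothetical.

```python
# Figures taken from the profile output; everything else is an assumption.
EDGE_LOOKUP_MS = 29.908    # vertex-query time for one edge lookup
INDEX_PAGE_SIZE = 50_000   # LIMIT parameter of the search index query


def sequential_filter_seconds(candidates, per_lookup_ms=EDGE_LOOKUP_MS):
    """Estimate total filter time if each candidate needs one sequential edge lookup."""
    return candidates * per_lookup_ms / 1000.0


one_page_s = sequential_filter_seconds(INDEX_PAGE_SIZE)
print(f"{one_page_s:.1f} s for {INDEX_PAGE_SIZE} candidate vertices")
```

Under these assumptions a single page of candidates already takes on the order of tens of minutes, which would be consistent with the timeouts described in the title.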

Will you please post the output of the .profile statement?

Semantically, it looks like you're trying to find all "people" under the age of 25 who have more than 1 posted question. Is that accurate?
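
That reading can be spelled out as a small in-memory model. The Python sketch below is not Gremlin and not from the original thread; the toy data and function names are hypothetical. It mirrors two details visible in the profile: `inside(0,25)` is exclusive on both ends (`age < 25 & age > 0`), and the edge count stops after two edges (`RangeGlobalStep(0,2)`) because `gt(1)` needs no more than that.

```python
def inside(value, low, high):
    """Mirror Gremlin's inside(low, high): both bounds are exclusive."""
    return low < value < high


def has_more_than(edges, threshold):
    """Return True once threshold + 1 edges are seen, without counting the rest."""
    seen = 0
    for _ in edges:
        seen += 1
        if seen > threshold:
            return True
    return False


def young_active_posters(people, posted):
    """people: {person_id: age}; posted: {person_id: [question_id, ...]} (toy data)."""
    return [
        pid
        for pid, age in people.items()
        if inside(age, 0, 25) and has_more_than(posted.get(pid, []), 1)
    ]


# Hypothetical sample data, for illustration only.
people = {"a": 19, "b": 24, "c": 25, "d": 30}
posted = {"a": ["q1", "q2"], "b": ["q3"], "c": ["q4", "q5"], "d": ["q6", "q7", "q8"]}
print(young_active_posters(people, posted))
```

Note that a person aged exactly 25 is excluded by the exclusive upper bound, and a person with exactly one posted question fails the `gt(1)` check.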
