
DSE Graph Index on Integer Interval

I have a graph of 100K nodes stored in the DSE Graph Engine. These nodes have the label "customer" and an integer property called "age". I have indexed this property with the following command:

schema.vertexLabel("customer").index("custByAge").secondary().by("age").add()

I would like to use this index to answer queries that look for customers within a certain age range (e.g. "age" between 10 and 20). However, the index I created does not seem to be used when I query customers by an age interval.

When I submit the following query, a list of vertices is returned in about 40ms, which leads me to believe that the index is being used:

g.V().has('customer','age',15)

But when I submit the following query, the query times out after 30 sec (as I have specified in my configuration):

g.V().has('customer','age',inside(10,20))
Interruption of result iteration
Display stack trace? [yN]

This leads me to believe that the index is not being used for this query. Does that seem right? And if the index is not being used, does anyone have some advice for how I can speed up this query?
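A side note on the predicate itself, separate from the index question: in TinkerPop, inside(10,20) excludes both endpoints, while between(10,20) includes the lower bound and excludes the upper one. The sketch below models the two predicates in plain Python (it is not the Gremlin API, and the sample ages are made up for illustration) to show which integer ages each one matches.

```python
# Plain-Python model of two TinkerPop range predicates.
# These helpers only imitate the semantics; they are not Gremlin calls.

def inside(low, high):
    """Model of P.inside: low < x < high (both bounds exclusive)."""
    return lambda x: low < x < high


def between(low, high):
    """Model of P.between: low <= x < high (lower bound inclusive)."""
    return lambda x: low <= x < high


# Hypothetical sample of ages around the 10..20 interval.
ages = range(8, 23)

inside_matches = [a for a in ages if inside(10, 20)(a)]
between_matches = [a for a in ages if between(10, 20)(a)]

print("inside(10, 20): ", inside_matches)
print("between(10, 20):", between_matches)
```

If ages 10 and 20 should both be returned, neither predicate covers that on its own; combining gte(10) with lte(20) would.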

EDIT: As suggested by an answer below, I ran .profile() on each of the above queries, with the following results (only the relevant info is shown):

gremlin> g.V().has('customer','age',21).profile()
==>Traversal Metrics
...
  index-query                    14.333ms

gremlin> g.V().has('customer','age',inside(21,23)).profile()
==>Traversal Metrics
...
   index-query                    115.055ms
   index-query                    132.144ms
   index-query                    132.842ms
   >TOTAL                       53042.171ms

This leaves me with a few questions:

  1. Does the fact that .profile() returns index-query mean that indexes are being used for my second query?
  2. Why does the second query have 3 index queries, as opposed to 1 for the first query?
  3. The index queries for the second query add up to only about 400ms. Why does the whole query take ~50000ms? The .profile() output shows nothing else that takes time besides these index queries, so where is the extra ~50000ms coming from?

Are you using DataStax Studio? If so, you can use the .profile() feature to understand how the index is being engaged.

Example .profile() use: g.V().in().has('name','Julia Child').count().profile()

You want to use a search index for this case; it will be much, much faster.

For example, in KillrVideo:

schema.vertexLabel("movie").index("search").search().by("year").add()

g.V().hasLabel('movie').has('year', gt(2000)).has('year', lte(2017)).profile()

The Studio .profile() output then shows the query sent to the search index:

SELECT "community_id", "member_id" FROM "killrvideo"."movie_p"
WHERE "solr_query" = '{"q":"*:*", "fq":["year:{2000 TO *}","year:{* TO 2017]"]}'
LIMIT ?; with params (java.lang.Integer) 50000
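To help read the "fq" entries in that trace: in Solr range syntax a curly brace marks an exclusive bound, a square bracket an inclusive one, and * an open end. So gt(2000) shows up as year:{2000 TO *} and lte(2017) as year:{* TO 2017]. The Python sketch below only illustrates that notation; solr_range is a hypothetical helper, not part of DSE or Solr, and the single combined "age" range is an assumption about how inside(10,20) could be written, not something taken from a DSE trace.

```python
def solr_range(field, low=None, high=None, low_inclusive=False, high_inclusive=False):
    """Build a Solr range filter string such as 'year:{2000 TO *}'.

    '[' and ']' mark inclusive bounds, '{' and '}' exclusive bounds,
    and '*' stands for an open (unbounded) end.
    Hypothetical helper for illustration only.
    """
    open_bracket = "[" if low_inclusive else "{"
    close_bracket = "]" if high_inclusive else "}"
    low_text = "*" if low is None else str(low)
    high_text = "*" if high is None else str(high)
    return f"{field}:{open_bracket}{low_text} TO {high_text}{close_bracket}"


# gt(2000) and lte(2017), as in the KillrVideo query above.
year_filters = [
    solr_range("year", low=2000),                        # exclusive lower bound
    solr_range("year", high=2017, high_inclusive=True),  # inclusive upper bound
]

# Assumed equivalent of inside(10, 20) on a search-indexed "age" property.
age_filter = solr_range("age", low=10, high=20)

print(year_filters)
print(age_filter)
```

The two year filters reproduce the strings in the trace, one per has() step in the traversal.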

By default, the profiler doesn't show the trace of all operations, so the index-query list you see may be truncated. Modify "max_profile_events" according to this documentation: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/reference/schema/refSchemaConfig.html
