Titan cassandra does not use defined indexes for custom gremlin steps

We have defined 5 indexes using Titan with Cassandra in the following block of code:

 def mgmt = g.managementSystem;
 try {
     if (!mgmt.containsGraphIndex("byId")) {
         def key = mgmt.makePropertyKey('__id').dataType(String.class).make()
         mgmt.buildIndex("byId",Vertex.class).addKey(key).buildCompositeIndex()
     }
     if (!mgmt.containsGraphIndex("byType")) {
         def key = mgmt.makePropertyKey('__type').dataType(String.class).make()
         mgmt.buildIndex("byType",Vertex.class).addKey(key).buildCompositeIndex()
     }
     if (!mgmt.containsGraphIndex("lastName")) {
         def key = mgmt.makePropertyKey('lastName').dataType(String.class).make()
         mgmt.buildIndex('lastName',Vertex.class).addKey(key).buildMixedIndex(INDEX_NAME)
     }
     if (!mgmt.containsGraphIndex("firstName")) {
         def key = mgmt.makePropertyKey('firstName').dataType(String.class).make()
         mgmt.buildIndex('firstName',Vertex.class).addKey(key).buildMixedIndex(INDEX_NAME)
     }
     if (!mgmt.containsGraphIndex("vin")) {
         def key = mgmt.makePropertyKey('vin').dataType(String.class).make()
         mgmt.buildIndex('vin',Vertex.class).addKey(key).buildMixedIndex(INDEX_NAME)
     }
     mgmt.commit()
 } catch (Exception e) {
     System.err.println("An error occurred initializing indices")
     e.printStackTrace()
 }

We then execute the following query:

g.V.has('__id','49fb8bae5f994cf5825b849a5dd9b49a')

This produces the following warning:

"Query requires iterating over all vertices [{}]. For better performance, use indexes"

I'm confused because, according to the documentation, these indexes are set up correctly, but for some reason Titan is not using them.

The indexes are created before any data is in the graph, so reindexing is not necessary. Any help is greatly appreciated.

Update: I've managed to break this down into a very simple test. In our code we have developed a custom gremlin step to use for the stated query:

Gremlin.defineStep('hasId', [Vertex,Pipe], { String id ->
    _().has('__id', id)
})

Then from our code we call:

g.V.hasId(id)

It appears that when we use the custom gremlin step the query does not use the index, but when we use the vanilla gremlin call the index is used.

It looks like a similar oddity was noted in this post: https://groups.google.com/forum/#!topic/aureliusgraphs/6DqMG13_4EQ

I would prefer to check for the existence of the property key, which would mean adjusting your checks to:

if (!mgmt.containsRelationType("__id")) {
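To make that check concrete, here is a minimal sketch of a helper that tests for the property key and the index separately, so that an existing key is reused rather than created a second time. The name `ensureCompositeIndex` and its parameters are hypothetical (they are not part of Titan), and the sketch assumes the same Titan 0.5.x management calls that appear in the question plus `getPropertyKey`.

```groovy
// Sketch only: ensureCompositeIndex is a hypothetical helper, not a Titan API.
// It assumes the Titan 0.5.x management calls used elsewhere in this thread.
def ensureCompositeIndex(mgmt, String indexName, String keyName, Class elementClass) {
    def key
    if (mgmt.containsRelationType(keyName)) {
        // The property key already exists, so reuse it instead of making it again.
        key = mgmt.getPropertyKey(keyName)
    } else {
        key = mgmt.makePropertyKey(keyName).dataType(String.class).make()
    }
    // Build the index only if it is not defined yet.
    if (!mgmt.containsGraphIndex(indexName)) {
        mgmt.buildIndex(indexName, elementClass).addKey(key).buildCompositeIndex()
    }
    key
}

// Usage in the Gremlin console, where Vertex is already imported:
// mgmt = g.managementSystem
// ensureCompositeIndex(mgmt, 'byId', '__id', Vertex.class)
// mgmt.commit()
```

The element class is passed in as a parameter only to keep the sketch free of imports; in your own code you could hard-code `Vertex.class` as in the question.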

I tried out your code in the Titan Gremlin Console and I'm not seeing an issue:

gremlin> g  = TitanFactory.open("conf/titan-cassandra.properties")
==>titangraph[cassandrathrift:[127.0.0.1]]
gremlin> mgmt = g.managementSystem
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@2227a6c1
gremlin> key = mgmt.makePropertyKey('__id').dataType(String.class).make()
==>__id
gremlin> mgmt.buildIndex("byId",Vertex.class).addKey(key).buildCompositeIndex()
==>com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@6d4c273c
gremlin> mgmt.commit()
==>null
gremlin> mgmt = g.managementSystem
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@79d743e6
gremlin> mgmt.containsGraphIndex("byId")
==>true
gremlin> mgmt.rollback()
==>null
gremlin> v = g.addVertex()
==>v[256]
gremlin> v.setProperty("__id","123")
==>null
gremlin> g.commit()
==>null
gremlin> g.V
12:56:45 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>v[256]
gremlin> g.V("__id","123")
==>v[256]
gremlin> g.V.has("__id","123")
==>v[256]

Note that I'm not getting any ugly message about "...use indexes" for the two lookups on "__id"; only the bare g.V produces it. Perhaps you can try my example here and see if it behaves as expected before going back to your code.

UPDATE: This is in answer to the updated question above regarding the custom step. As the post you found noted, Titan's query optimizer doesn't seem to be able to sort this one out. I think it's easy to see why in this example:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> Gremlin.defineStep('hasName', [Vertex,Pipe], { n -> _().has('name',n) })
==>null
gremlin> g.V.hasName('marko')
==>v[1]
gremlin> g.V.hasName('marko').toString()
==>[GremlinStartPipe, GraphQueryPipe(vertex), [GremlinStartPipe, PropertyFilterPipe(name,EQUAL,marko)]]

The "compiled" Gremlin looks like that last line above. Note that the custom step compiles to an "inner" pipe with a new GremlinStartPipe. Compare that to the same traversal without the custom step:

gremlin> g.V.has('name','marko').toString()
==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

Titan can optimize the "GraphQueryPipe" with the embedded has, but it seems that isn't the case with the custom step's signature. I think the workaround (at least for this particular scenario) is to write a function that returns the pipe:

gremlin> def hasName(g,n){g.V.has('name',n)}  
==>true
gremlin> hasName(g,'marko')
==>v[1]
gremlin> hasName(g,'marko').toString()
==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

Passing 'g' around kinda stinks. Perhaps write your DSL so that 'g' gets wrapped in a class that then lets you do:

with(g).hasName('marko')
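A rough sketch of what such a wrapper might look like follows. The class name `GraphDsl` and the factory method `on` are hypothetical; I used a name other than `with` here to avoid confusion with Groovy's built-in `with(Closure)` method. The only thing it relies on is that the wrapped graph supports `g.V.has(...)` as in the examples above.

```groovy
// Hypothetical wrapper: GraphDsl and its method names are illustrative only.
class GraphDsl {
    private final Object graph

    GraphDsl(Object graph) { this.graph = graph }

    // Start from graph.V so that has() is applied directly to the graph query,
    // the same shape as the plain g.V.has('name', n) call shown earlier.
    def hasName(String name) { graph.V.has('name', name) }

    // Factory method to keep call sites short.
    static GraphDsl on(Object graph) { new GraphDsl(graph) }
}

// Usage: GraphDsl.on(g).hasName('marko')
```

Because the traversal is built inside the method from `graph.V`, it should compile the same way as the `hasName(g, n)` function above; calling `toString()` on the result is a quick way to confirm that.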

A final thought would be to use Groovy's meta-programming facilities:

gremlin> Graph.metaClass.hasName = { n -> delegate.V.has('name',n) }
==>groovysh_evaluate$_run_closure1@600b9d27
gremlin> g.hasName("marko").toString()                              
==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]
gremlin> g.hasName("marko")                                         
==>v[1]
