
Gremlin - finding connected nodes with several boolean conditions on both node and edge properties

I want to find the nodes that should be linked to a given node, where the link is defined by logic over the nodes' and existing edges' attributes, as follows:

A) The pair has the same zip (node attribute) and name_similarity (edge attribute) > 0.3, OR

B) The pair has a different zip and name_similarity > 0.5, OR

C) The pair has an edge of type "external_info" with value = "connect"

D) The pair has an edge of type "external_info" with value = "disconnect"; such a pair must not be linked, even if A, B or C holds.

In short: (A | B | C) & (~D)
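To make the rule concrete, here is a minimal pure-Python sketch (not Gremlin) of (A | B | C) & (~D) for a single pair of vertices. The function name should_link and its parameters are hypothetical; they stand for facts you would read from the two vertices and from the edges between them.

```python
from __future__ import annotations


def should_link(same_zip: bool, score: float, decisions: set[str]) -> bool:
    """Hypothetical helper: should this pair of vertices be linked?

    same_zip  -- True if both vertices have the same 'zip' property
    score     -- 'score' of the name_similarity edge between them (0.0 if none)
    decisions -- 'decision' values of the external_info edges between them
    """
    cond_a = same_zip and score > 0.3        # A: same zip, score > 0.3
    cond_b = (not same_zip) and score > 0.5  # B: different zip, score > 0.5
    cond_c = "connect" in decisions          # C: explicit "connect"
    cond_d = "disconnect" in decisions       # D: explicit "disconnect" (veto)
    return (cond_a or cond_b or cond_c) and not cond_d


# One example per condition:
print(should_link(True, 1.0, set()))           # True  (condition A)
print(should_link(False, 0.6, set()))          # True  (condition B)
print(should_link(False, 0.0, {"connect"}))    # True  (condition C)
print(should_link(True, 1.0, {"disconnect"}))  # False (D overrides A)
```

The veto is applied last, so a "disconnect" edge wins over any of the positive conditions.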

I'm still a newbie to Gremlin, so I'm not sure how to combine several conditions on edges and nodes.

Below is the code for creating the graph, as well as the expected results for that graph:

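from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# assumption: a Gremlin Server is listening at ws://localhost:8182/gremlin
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

# creating nodes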

(g.addV('person').property('name', 'A').property('zip', '123').
addV('person').property('name', 'B').property('zip', '123').
addV('person').property('name', 'C').property('zip', '456').
addV('person').property('name', 'D').property('zip', '456').
addV('person').property('name', 'E').property('zip', '123').
addV('person').property('name', 'F').property('zip', '999').iterate())

node1 = g.V().has('name', 'A').next()
node2 = g.V().has('name', 'B').next()
node3 = g.V().has('name', 'C').next()
node4 = g.V().has('name', 'D').next()
node5 = g.V().has('name', 'E').next()
node6 = g.V().has('name', 'F').next()

# creating name similarity edges

g.V(node1).addE('name_similarity').from_(node1).to(node2).property('score', 1).next() # over threshold
g.V(node1).addE('name_similarity').from_(node1).to(node3).property('score', 0.2).next() # under threshold
g.V(node1).addE('name_similarity').from_(node1).to(node4).property('score', 0.4).next() # over threshold
g.V(node1).addE('name_similarity').from_(node1).to(node5).property('score', 1).next() # over threshold
g.V(node1).addE('name_similarity').from_(node1).to(node6).property('score', 0).next() # under threshold

# creating external output edges

g.V(node1).addE('external_info').from_(node1).to(node5).property('decision', 'connect').next() 
g.V(node1).addE('external_info').from_(node1).to(node6).property('decision', 'disconnect').next() 

The expected output for input node A is nodes B (due to condition A), D (due to condition B), and F (due to condition C). Node E should not be linked, due to condition D.

I'm looking for a Gremlin query that will retrieve these results.我正在寻找将检索这些结果的 Gremlin 查询。

Something seemed wrong in your data given the output you expected, so I had to make some corrections:

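  • Vertex D wouldn't appear in the results because its "score" (0.4) was not greater than 0.5, the threshold that applies because D's zip differs from A's (condition B)
  • The "external_info" edges seemed reversed: the expected output links F and excludes E, so F needs the "connect" edge and E the "disconnect" edge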
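As a quick check of both points, this minimal pure-Python sketch applies (A | B | C) & (~D) to the question's original data. It is not Gremlin: the dicts ZIP, SCORE and DECISION and the function linked_to_a are a hypothetical in-memory stand-in for the graph, keyed by the far end of each edge since every edge starts at A.

```python
# Question's original data as plain dicts (hypothetical stand-in for the graph).
# Every edge starts at vertex A, so edge properties are keyed by the other end.
ZIP = {"A": "123", "B": "123", "C": "456", "D": "456", "E": "123", "F": "999"}
SCORE = {"B": 1.0, "C": 0.2, "D": 0.4, "E": 1.0, "F": 0.0}  # name_similarity
DECISION = {"E": "connect", "F": "disconnect"}              # external_info


def linked_to_a(other: str) -> bool:
    """Apply (A | B | C) & ~D to the pair (A, other)."""
    same_zip = ZIP["A"] == ZIP[other]
    score = SCORE.get(other, 0.0)
    decision = DECISION.get(other)
    cond_a = same_zip and score > 0.3
    cond_b = not same_zip and score > 0.5
    cond_c = decision == "connect"
    cond_d = decision == "disconnect"
    return (cond_a or cond_b or cond_c) and not cond_d


result = [v for v in sorted(ZIP) if v != "A" and linked_to_a(v)]
print(result)  # ['B', 'E'] rather than the expected ['B', 'D', 'F']
```

With the original values the rule links B and E, which is why the data had to change to produce B, D and F.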

Here's the data I used:

g.addV('person').property('name', 'A').property('zip', '123').
addV('person').property('name', 'B').property('zip', '123').
addV('person').property('name', 'C').property('zip', '456').
addV('person').property('name', 'D').property('zip', '456').
addV('person').property('name', 'E').property('zip', '123').
addV('person').property('name', 'F').property('zip', '999').iterate()
node1 = g.V().has('name', 'A').next()
node2 = g.V().has('name', 'B').next()
node3 = g.V().has('name', 'C').next()
node4 = g.V().has('name', 'D').next()
node5 = g.V().has('name', 'E').next()
node6 = g.V().has('name', 'F').next()
g.V(node1).addE('name_similarity').from(node1).to(node2).property('score', 1).next() 
g.V(node1).addE('name_similarity').from(node1).to(node3).property('score', 0.2).next() 
g.V(node1).addE('name_similarity').from(node1).to(node4).property('score', 0.6).next() 
g.V(node1).addE('name_similarity').from(node1).to(node5).property('score', 1).next() 
g.V(node1).addE('name_similarity').from(node1).to(node6).property('score', 0).next() 
g.V(node1).addE('external_info').from(node1).to(node6).property('decision', 'connect').next() 
g.V(node1).addE('external_info').from(node1).to(node5).property('decision', 'disconnect').next() 

I went with the following approach:

g.V().has('person','name','A').as('a').
  V().as('b').
  where('a',neq('b')).
  or(where('a',eq('b')).                                                    // A
       by('zip').
     bothE('name_similarity').has('score',gt(0.3)).otherV().where(eq('a')),
     bothE('name_similarity').has('score',gt(0.5)).otherV().where(eq('a')), // B
     bothE('external_info').                                                // C
       has('decision','connect').otherV().where(eq('a'))).
  filter(__.not(bothE('external_info').                                     // D
                has('decision','disconnect').otherV().where(eq('a')))).
  select('a','b').
    by('name')
==>[a:A,b:B]
==>[a:A,b:D]
==>[a:A,b:F]
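Branch B in the query drops the zip check. That is safe: a same-zip pair with a score above 0.5 already satisfies branch A, so the or() is equivalent to the stated rule. The following minimal pure-Python sketch (not Gremlin; spec and query_shape are hypothetical names) checks this over a small grid of inputs and reproduces the result above from the corrected data:

```python
from itertools import product

# Corrected data from this answer (hypothetical in-memory stand-in, keyed by
# the vertex at the other end of each edge from A).
ZIP = {"A": "123", "B": "123", "C": "456", "D": "456", "E": "123", "F": "999"}
SCORE = {"B": 1.0, "C": 0.2, "D": 0.6, "E": 1.0, "F": 0.0}
DECISION = {"F": "connect", "E": "disconnect"}


def spec(same_zip, score, decision):
    """The rule as stated in the question: (A | B | C) & ~D."""
    cond_a = same_zip and score > 0.3
    cond_b = not same_zip and score > 0.5
    return (cond_a or cond_b or decision == "connect") and decision != "disconnect"


def query_shape(same_zip, score, decision):
    """The query's or() branches: branch B ignores zip entirely."""
    branch_a = same_zip and score > 0.3   # where('a',eq('b')).by('zip') + gt(0.3)
    branch_b = score > 0.5                # gt(0.5), any zip
    branch_c = decision == "connect"
    return (branch_a or branch_b or branch_c) and decision != "disconnect"


# Check over a small grid of inputs: both forms agree everywhere.
for same_zip, score, decision in product(
    [True, False], [0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 1.0], [None, "connect", "disconnect"]
):
    assert spec(same_zip, score, decision) == query_shape(same_zip, score, decision)

linked = [
    v for v in sorted(ZIP)
    if v != "A" and spec(ZIP["A"] == ZIP[v], SCORE.get(v, 0.0), DECISION.get(v))
]
print(linked)  # ['B', 'D', 'F'], matching the console output above
```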

I think this contains all the logic you were looking for, but I didn't spend a lot of time optimizing it, as I don't think any optimization will get around the pain of the full graph scan performed by V().as('b'). So either your situation involves a relatively small graph (in-memory, perhaps), in which case this query will work, or you will need to find another method altogether.

Perhaps you have ways to further limit "b", which might help? If something along those lines is possible, I'd probably try to better define the directionality of the edge traversals, avoiding bothE() in favor of outE() or inE(), which would also remove the need for otherV(). Hopefully you are using a graph database that supports vertex-centric indices, which would also speed up those edge lookups on "score" (I'm not sure that would help much for "decision", as it has low selectivity).
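One way to "further limit b" follows from the rule as stated: conditions A, B and C each require some edge between the pair, so "b" can be drawn from a's own edges instead of from V(). This is not Gremlin and not a drop-in replacement for the query; it is a minimal pure-Python sketch of that idea, in which EDGES, ADJ and linked_neighbours are hypothetical names and the data is the corrected set from this answer.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical in-memory graph using the answer's corrected data.
ZIP = {"A": "123", "B": "123", "C": "456", "D": "456", "E": "123", "F": "999"}
EDGES = [  # (out_vertex, in_vertex, label, properties)
    ("A", "B", "name_similarity", {"score": 1.0}),
    ("A", "C", "name_similarity", {"score": 0.2}),
    ("A", "D", "name_similarity", {"score": 0.6}),
    ("A", "E", "name_similarity", {"score": 1.0}),
    ("A", "F", "name_similarity", {"score": 0.0}),
    ("A", "F", "external_info", {"decision": "connect"}),
    ("A", "E", "external_info", {"decision": "disconnect"}),
]

# Index every edge under both endpoints, which is what bothE() would see.
ADJ = defaultdict(list)
for out_v, in_v, label, props in EDGES:
    ADJ[out_v].append((in_v, label, props))
    ADJ[in_v].append((out_v, label, props))


def linked_neighbours(a: str) -> list[str]:
    """Return the vertices that should be linked to `a`.

    Candidates for 'b' come only from a's own edges, not from a scan of all
    vertices: a vertex with no edge to `a` can never satisfy A, B or C.
    """
    edges_by_b = defaultdict(list)
    for b, label, props in ADJ[a]:
        if b != a:  # mirrors where('a',neq('b'))
            edges_by_b[b].append((label, props))

    result = []
    for b, edges in sorted(edges_by_b.items()):
        same_zip = ZIP[a] == ZIP[b]
        scores = [p["score"] for lbl, p in edges if lbl == "name_similarity"]
        decisions = {p["decision"] for lbl, p in edges if lbl == "external_info"}
        cond_a = same_zip and any(s > 0.3 for s in scores)
        cond_b = not same_zip and any(s > 0.5 for s in scores)
        cond_c = "connect" in decisions
        cond_d = "disconnect" in decisions
        if (cond_a or cond_b or cond_c) and not cond_d:
            result.append(b)
    return result


print(linked_neighbours("A"))  # ['B', 'D', 'F']
```

In Gremlin terms, the equivalent idea would be to reach "b" by traversing out of "a" over its edges rather than starting a new V() scan, so the work grows with a's degree instead of with the size of the graph.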
