Neptune - How to get distance to all nodes with proportional weights gremlin

I'm having a difficult time figuring out a Gremlin query for the following scenario. Here is the directed graph (it may be cyclic).

[Image: the example directed graph]

I want to get the top N favorable nodes, starting from node "Jane", where favor is defined as:

favor(Jane->Lisa) = edge(Jane,Lisa) / total weight of Jane's outward edges
favor(Jane->Thomas) = favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)

favor(Jane->Jerryd) = favor(Jane->Thomas) * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)

favor(Jane->Jerryd) = [favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)] * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)


and so on.

Here is the same graph with a hand calculation of what I mean:

[Image: the same graph annotated with the hand-calculated favor values]

This is fairly simple to traverse programmatically, but I'm not sure exactly how to query it with Gremlin or even SPARQL.

Here is the query to create this example graph:

g
.addV('person').as('1').property(single, 'name', 'jane')
.addV('person').as('2').property(single, 'name', 'thomas')
.addV('person').as('3').property(single, 'name', 'lisa')
.addV('person').as('4').property(single, 'name', 'wyd')
.addV('person').as('5').property(single, 'name', 'jerryd')
.addE('favor').from('1').to('2').property('weight', 10)
.addE('favor').from('1').to('3').property('weight', 20)
.addE('favor').from('3').to('2').property('weight', 90)
.addE('favor').from('2').to('4').property('weight', 50)
.addE('favor').from('2').to('5').property('weight', 90)
.addE('favor').from('3').to('5').property('weight', 100)

All I'm looking for is:

[Lisa, computedFavor]
[Thomas, computedFavor]
[Jerryd, computedFavor]
[Wyd, computedFavor]

I'm struggling to incorporate the cyclic graph when adjusting the weights. This is as far as I've been able to get with the query: https://gremlify.com/f2r0zy03oxc/2

g.V().has('name','jane').       // our starting node
   repeat(                      
      union(                    
         outE()                 // get only outwards edges
      ).
      otherV().simplePath()).   // produce simple path
   emit().  
   times(10).                   // max depth of 10
   path().                      // attain path
   by(valueMap())

Addressing comments from stephen mallette:

favor(Jane->Jerryd) = 
    favor(Jane->Thomas) * favor(Thomas->Jerryd) 
  + favor(Jane->Lisa) * favor(Lisa->Jerryd)

// note we can expand on favor(Jane->Thomas) in above expression
// 
// favor(Jane->Thomas) is favor(Jane->Thomas)@directEdge +
//                        favor(Jane->Lisa) * favor(Lisa->Thomas)
//

Calculation Example

Jane to Lisa                   => 20/(10+20)         => 2/3
Lisa to Jerryd                 => 100/(100+90)       => 10/19
Jane to Lisa to Jerryd         => 2/3*(10/19)

Jane to Thomas (directly)      => 10/(10+20)         => 1/3
Jane to Lisa to Thomas         => 2/3 * 90/(100+90)  => 2/3 * 9/19
Jane to Thomas                 => 1/3 + (2/3 * 9/19)

Thomas to Jerryd               => 90/(90+50)         => 9/14
Jane to Thomas to Jerryd       => [1/3 + (2/3 * 9/19)] * (9/14)

Jane to Jerryd:
= Jane to Lisa to Jerryd + Jane to Thomas to Jerryd
= 2/3 * (10/19) + [1/3 + (2/3 * 9/19)] * (9/14)
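
The hand calculation above can be checked with exact fractions. This is only a small sketch: the variable names are made up for illustration, and the weights are copied from the example graph.

```python
from fractions import Fraction as F

# Per-edge proportions: edge weight / total outgoing weight of the source node.
jane_lisa = F(20, 10 + 20)
jane_thomas_direct = F(10, 10 + 20)
lisa_thomas = F(90, 100 + 90)
lisa_jerryd = F(100, 100 + 90)
thomas_jerryd = F(90, 90 + 50)

# Jane to Thomas: the direct edge plus the route through Lisa.
jane_thomas = jane_thomas_direct + jane_lisa * lisa_thomas

# Jane to Jerryd: the route through Lisa plus everything arriving via Thomas.
jane_jerryd = jane_lisa * lisa_jerryd + jane_thomas * thomas_jerryd

print(jane_thomas, float(jane_thomas))
print(jane_jerryd, float(jane_jerryd))
```

Working in fractions avoids rounding noise, so the result can be compared against any query output to as many digits as needed.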

Here is some pseudocode:

from collections import deque

def get_favors(graph, label="jane", starting_favor=1):
  # graph.findNode and node.out_edges stand in for whatever graph API is used
  start = graph.findNode(label)
  queue = deque([(start, starting_favor)])
  favors = {}
  seen = set()

  while queue:
    node, curr_favor = queue.popleft()

    # get total weight (out edges) from this node
    total_favor = sum(edgeW for (edgeW, outNode) in node.out_edges)

    for (edgeW, outNode) in node.out_edges:
      # proportional share of the favor being pushed through this node
      share = curr_favor * (edgeW / total_favor)

      # add the share to whatever favor the node already has (0 if none)
      favors[outNode] = favors.get(outNode, 0) + share

      # if we have seen this edge and node, ignore;
      # otherwise traverse, forwarding only the new share so that favor
      # already pushed out of outNode is not counted a second time
      if (edgeW, outNode) not in seen:
        seen.add((edgeW, outNode))
        queue.append((outNode, share))

  # the caller can sort favors by value and take the top X
  return favors

Here is a Gremlin query that I believe applies your formula correctly. I'll paste the full final query first, then say a few words about the steps involved.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack().
.....12>    sum()     

==>0.768170426065163         

The query starts with Jane and keeps traversing until all paths to Jerryd have been inspected. Along the way, each traverser maintains a sack containing the product of the calculated weight values for each relationship it has crossed. The calculation on line 6 sums the weights of all edges leaving the prior vertex, and the math step on line 7 divides the weight of the current edge by that sum. At the very end, the computed results are added together on line 12. If you remove the final sum step you can see the intermediate results.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack()

==>0.2142857142857143
==>0.3508771929824561
==>0.2030075187969925   
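
For readers who want to check these per-path values outside Gremlin, the same idea can be sketched in plain Python. This is an illustration only, not a Neptune API: the `GRAPH` dictionary and the `path_favors` helper are names made up here. Each stack entry carries a running product, which plays the role of the sack.

```python
# Example graph from the question: {source: {target: weight}}.
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def path_favors(graph, start, target):
    """Yield (path, product) for every simple path from start to target."""
    stack = [([start], 1.0)]  # (path so far, running product)
    while stack:
        path, sack = stack.pop()
        node = path[-1]
        if node == target and len(path) > 1:
            yield path, sack  # stop extending once the target is reached
            continue
        total = sum(graph[node].values())  # total outgoing weight of node
        for nxt, weight in graph[node].items():
            if nxt not in path:  # same role as simplePath()
                stack.append((path + [nxt], sack * (weight / total)))

for path, value in path_favors(GRAPH, "jane", "jerryd"):
    print(" -> ".join(path), round(value, 4))

# Adding the per-path products gives the total favor for the target.
print(sum(value for _, value in path_favors(GRAPH, "jane", "jerryd")))
```

The order in which paths are produced depends on the stack and may differ from the Gremlin console output; only the set of paths and their values is expected to match.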

To see the routes taken, a path step can be added to the query.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold()) 

==>[[jane,10,thomas,90,jerryd],0.2142857142857143]
==>[[jane,20,lisa,100,jerryd],0.3508771929824561]
==>[[jane,20,lisa,90,thomas,90,jerryd],0.2030075187969925]   

This approach also accounts for any direct connections, per your formula, as we can see if we use Thomas as the target.

gremlin>  g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','thomas')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold())    

==>[[jane,10,thomas],0.3333333333333333]
==>[[jane,20,lisa,90,thomas],0.3157894736842105]  

These extra steps are not needed, but including the path is useful when debugging queries like this. Purely for general interest, you can also get to the final answer from here, although the very first query I included is all you really need.

g.withSack(1).V().
   has('name','jane').
   repeat(outE().
          sack(mult).
            by(project('w','f').
              by('weight').
              by(outV().outE().values('weight').sum()).
              math('w / f')).
          inV().
          simplePath()).
   until(has('name','thomas')).
   local(
     union(
       path().
         by('name').
         by('weight'),
       sack()).fold().tail(local)).  
    sum() 
  
==>0.6491228070175439  

If any of this is unclear or I have misunderstood the formula, please let me know.

EDITED to add

To find the results for all people reachable from Jane I had to modify the query a little bit. The unfold at the end is just to make the results easier to read.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold()).
.....17>        group().
.....18>          by(tail(local,2).limit(local,1)).
.....19>          by(tail(local).sum()).
.....20>        unfold()

==>jerryd=0.768170426065163
==>wyd=0.23182957393483708
==>lisa=0.6666666666666666
==>thomas=0.6491228070175439    

The final group step on line 17 uses the path results to calculate the total favor for each unique name found. To see the paths you can run the query with the group step removed.

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold())

==>[jane,10,thomas,0.3333333333333333]
==>[jane,20,lisa,0.6666666666666666]
==>[jane,10,thomas,50,wyd,0.11904761904761904]
==>[jane,10,thomas,90,jerryd,0.2142857142857143]
==>[jane,20,lisa,90,thomas,0.3157894736842105]
==>[jane,20,lisa,100,jerryd,0.3508771929824561]
==>[jane,20,lisa,90,thomas,50,wyd,0.11278195488721804]
==>[jane,20,lisa,90,thomas,90,jerryd,0.2030075187969925]    
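
The emit-and-group version can be mirrored in plain Python as well, which also makes the "top N" part of the question easy to add. This is a sketch under the same assumptions as the example graph: `GRAPH`, `all_favors` and `top_n` are hypothetical names, not part of Gremlin or Neptune.

```python
from collections import defaultdict

# Example graph from the question: {source: {target: weight}}.
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def all_favors(graph, start):
    """Sum the per-path products for every vertex reachable from start."""
    totals = defaultdict(float)
    stack = [([start], 1.0)]  # (path so far, running product)
    while stack:
        path, sack = stack.pop()
        node = path[-1]
        if len(path) > 1:
            totals[node] += sack  # like emit() followed by group().by(end vertex)
        out_edges = graph.get(node, {})
        total = sum(out_edges.values())
        for nxt, weight in out_edges.items():
            if nxt not in path:  # simple paths only, so cycles cannot loop forever
                stack.append((path + [nxt], sack * (weight / total)))
    return dict(totals)

def top_n(graph, start, n):
    """Return the n (name, favor) pairs with the highest favor."""
    favors = all_favors(graph, start)
    return sorted(favors.items(), key=lambda kv: kv[1], reverse=True)[:n]

for name, favor in top_n(GRAPH, "jane", 4):
    print(name, favor)
```

Because every simple path is enumerated, the cost grows quickly on dense graphs; a depth limit like the `times(10)` in the question's own query would be a reasonable addition.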

The Gremlin answer above is quite elegant and is the best fit for the Neptune and Python environment involved. I offer a second answer for reference, in case others come across this question. From the moment I saw this question I could only ever picture it as being solved with a VertexProgram in OLAP fashion with a GraphComputer. As a result, I had a hard time thinking of it any other way. Of course, use of a VertexProgram requires a JVM language like Java and will not work directly with Neptune. I suppose my closest workaround would have been to use Java, grab a subgraph() from Neptune and then run the custom VertexProgram in TinkerGraph locally, which would be quite speedy to do.

More generally, without the Python/Neptune requirements, converting an algorithm to a VertexProgram is not a bad approach, depending on the nature of the graph and the amount of data that needs to be traversed. As there isn't a lot of content out there on this topic, I thought I'd offer the core of the code for it here. This is the guts of it:

        @Override
        public void execute(final Vertex vertex, final Messenger<Double> messenger, final Memory memory) {
            // on the first pass calculate the "total favor" for all vertices
            // and pass the calculated current favor forward along incident edges
            // only for the "start vertex" 
            if (memory.isInitialIteration()) {
                copyHaltedTraversersFromMemory(vertex);

                final boolean startVertex = vertex.value("name").equals(nameOfStartVertrex);
                final double initialFavor = startVertex ? 1d : 0d;
                vertex.property(VertexProperty.Cardinality.single, FAVOR, initialFavor);
                vertex.property(VertexProperty.Cardinality.single, TOTAL_FAVOR,
                        IteratorUtils.stream(vertex.edges(Direction.OUT)).mapToDouble(e -> e.value("weight")).sum());

                if (startVertex) {
                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT, !incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                                (double) incident.value("weight") /  (double) vertex.value(TOTAL_FAVOR));
                    }
                }
            } else {
                // on future passes, sum all the incoming "favor" and add it to
                // the "favor" property of each vertex. then once again pass the
                // current favor to incident edges. this will keep happening 
                // until the message passing stops.
                final Iterator<Double> messages = messenger.receiveMessages();
                final boolean hasMessages = messages.hasNext();
                if (hasMessages) {
                    double adjacentFavor = IteratorUtils.reduce(messages, 0.0d, Double::sum);
                    vertex.property(VertexProperty.Cardinality.single, FAVOR, (double) vertex.value(FAVOR) + adjacentFavor);

                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT, !incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                                adjacentFavor * ((double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR)));
                    }
                }
            }
        }

The above is then executed as:

ComputerResult result = graph.compute().program(FavorVertexProgram.build().name("jane").create()).submit().get();
GraphTraversalSource rg = result.graph().traversal();
Traversal elements = rg.V().elementMap();

and that "elements" traversal yields:

{id=0, label=person, ^favor=1.0, name=jane, ^totalFavor=30.0}
{id=2, label=person, ^favor=0.6491228070175439, name=thomas, ^totalFavor=140.0}
{id=4, label=person, ^favor=0.6666666666666666, name=lisa, ^totalFavor=190.0}
{id=6, label=person, ^favor=0.23182957393483708, name=wyd, ^totalFavor=0.0}
{id=8, label=person, ^favor=0.768170426065163, name=jerryd, ^totalFavor=0.0}
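
To make the message passing easier to follow without a JVM, here is a rough Python sketch of the same superstep idea. It is not the TinkerPop API: `GRAPH` and `favor_supersteps` are names invented for this illustration, and the `max_iterations` cap is an added assumption so that a cyclic graph cannot run forever.

```python
from collections import defaultdict

# Example graph from the question: {source: {target: weight}}.
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def favor_supersteps(graph, start, max_iterations=100):
    """Propagate favor from start in rounds of message passing."""
    # Initial pass: the start vertex has favor 1, every other vertex 0,
    # and each vertex records its total outgoing weight.
    favor = {v: 0.0 for v in graph}
    favor[start] = 1.0
    total = {v: sum(out.values()) for v, out in graph.items()}

    # Only the start vertex sends messages in the first pass.
    inbox = defaultdict(float)
    for nxt, weight in graph[start].items():
        inbox[nxt] += weight / total[start]

    # Later passes: add the incoming favor to the vertex, then forward that
    # same incoming amount proportionally along the outgoing edges.
    for _ in range(max_iterations):
        if not inbox:
            break  # no messages in flight, so the computation halts
        outbox = defaultdict(float)
        for vertex, incoming in inbox.items():
            favor[vertex] += incoming
            for nxt, weight in graph[vertex].items():
                outbox[nxt] += incoming * (weight / total[vertex])
        inbox = outbox
    return favor

for name, value in favor_supersteps(GRAPH, "jane").items():
    print(name, value)
```

On a graph with cycles this sketch keeps circulating favor until the cap is reached, which is different from the simplePath() behaviour of the Gremlin query, so the two approaches are only expected to agree on acyclic graphs like the example.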

