Maximum Common Subgraph in a Directed Graph

I am trying to represent a group of sentences as a directed graph in which each word is represented by one node. If a word is repeated, no new node is created; the existing node is reused. Let's call this graph MainG.

Next, I take a new sentence, build a directed graph from it (call this graph SubG), and look for the maximum common subgraph of SubG in MainG.
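The construction described above can be sketched as follows. This is a minimal illustration, not the asker's code: `build_word_graph` is a hypothetical helper, and it assumes whitespace tokenization in place of `nltk.word_tokenize`.

```python
import networkx as nx


def build_word_graph(sentences):
    """Build one directed graph from several sentences.

    Hypothetical helper: tokens are split on whitespace (an assumption;
    the question itself uses nltk.word_tokenize).
    """
    graph = nx.DiGraph()
    for sentence in sentences:
        tokens = sentence.split()
        # add_edge creates missing nodes and reuses existing ones,
        # so a repeated word never produces a second node.
        for prev_word, next_word in zip(tokens, tokens[1:]):
            graph.add_edge(prev_word, next_word)
    return graph


main_g = build_word_graph(["the goose is good", "the gander is good too"])
sub_g = build_word_graph(["the gander is good"])
print(main_g.number_of_nodes(), main_g.number_of_edges())
```

Because every word maps to exactly one node, two graphs built this way can be compared edge by edge using the words themselves as node identifiers.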

I am using the NetworkX API in Python 3.5. My understanding is that this problem is NP-complete for general graphs but can be solved in linear time for directed graphs. One of the links I referred to:

How can I find Maximum Common Subgraph of two graphs?

I tried the following code:

import networkx as nx
import pandas as pd
import nltk

class GraphTraversal:
    def createGraph(self, sentences):
        DG=nx.DiGraph()
        tokens = nltk.word_tokenize(sentences)
        token_count = len(tokens)
        for i in range(token_count):
            if i == 0:
                continue
            DG.add_edges_from([(tokens[i-1], tokens[i])], weight=1)
        return DG


    def getMCS(self, G_source, G_new):
        """
        Creator: Bonson
        Return the MCS of the G_new graph that is present 
        in the G_source graph
        """
        # topological_sort returns a generator in networkx 2.x; len() below needs a list
        order = list(nx.topological_sort(G_new))
        print("##### topological sort #####")
        print(order)

        objSubGraph = nx.DiGraph()

        for i in range(len(order)-1):

            if G_source.nodes().__contains__(order[i]) and G_source.nodes().__contains__(order[i+1]):
                print("Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
                objSubGraph.add_node(order[i])
                objSubGraph.add_node(order[i+1])
                objSubGraph.add_edge(order[i], order[i+1])
            else:
                print("Does Not Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
                continue


obj_graph_traversal = GraphTraversal()
SourceSentences = "A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ."
SourceGraph = obj_graph_traversal.createGraph(SourceSentences)

TestSentence_1 = "not much of a story"    #ThisWorks
TestSentence_1 = "not much of a story of what is good"    #This DOES NOT Work
TestGraph = obj_graph_traversal.createGraph(TestSentence_1)

obj_graph_traversal.getMCS(SourceGraph, TestGraph)

Because I rely on a topological sort, the second test sentence does not work.

I would be interested in understanding possible approaches to this.
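The failure with the second test sentence can be reproduced in isolation. In that sentence the word "of" appears twice, which produces the cycle of -> a -> story -> of, and a topological sort is only defined for acyclic graphs. The sketch below assumes whitespace tokenization instead of nltk, and `sentence_graph` is a hypothetical helper.

```python
import networkx as nx


def sentence_graph(sentence):
    # Hypothetical helper: whitespace tokenization stands in for nltk.word_tokenize.
    tokens = sentence.split()
    graph = nx.DiGraph()
    graph.add_edges_from(zip(tokens, tokens[1:]))
    return graph


works = sentence_graph("not much of a story")
fails = sentence_graph("not much of a story of what is good")

# A topological order exists only for directed acyclic graphs.
print(nx.is_directed_acyclic_graph(works))
print(nx.is_directed_acyclic_graph(fails))

try:
    list(nx.topological_sort(fails))
except nx.NetworkXUnfeasible as exc:
    print("topological sort failed:", exc)
```

Any approach that has to handle repeated words therefore cannot depend on a topological order of the sentence graph.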

The following code gets the maximum common subgraph from a directed graph:

def getMCS(self, G_source, G_new):
    matching_graph=nx.Graph()

    for n1,n2,attr in G_new.edges(data=True):
        if G_source.has_edge(n1,n2) :
            matching_graph.add_edge(n1,n2,weight=1)

    graphs = list(nx.connected_component_subgraphs(matching_graph))

    mcs_length = 0
    mcs_graph = nx.Graph()
    for i, graph in enumerate(graphs):

        if len(graph.nodes()) > mcs_length:
            mcs_length = len(graph.nodes())
            mcs_graph = graph

    return mcs_graph

The edit queue for Bonson's answer is full, but that answer no longer works with networkx 2.4 and can be improved in a few ways:

  • connected_component_subgraphs was removed in networkx 2.4; connected_components, which returns sets of nodes, should be used instead.

  • Because only the number of nodes is needed to find the largest component, the search can be simplified significantly.

  • This version is no longer tailored specifically to the original question, because this page is the top hit when searching for "Maximum Common Subgraph in a Directed Graph", which I needed for something completely different.

My adapted version is:

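import networkx

def getMCS(g1, g2):
    # Collect the edges of g2 that also exist in g1 (edge direction is dropped here).
    matching_graph = networkx.Graph()

    for n1, n2 in g2.edges():
        if g1.has_edge(n1, n2):
            matching_graph.add_edge(n1, n2)

    # connected_components yields one set of nodes per component.
    components = networkx.connected_components(matching_graph)

    # Pick the component with the most nodes; default=set() avoids a
    # ValueError when the two graphs share no edge.
    largest_component = max(components, key=len, default=set())
    return networkx.induced_subgraph(matching_graph, largest_component)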

If the last line is replaced with return networkx.induced_subgraph(g1, largest_component), the function should also work correctly and return a directed graph.
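One caveat with the directed variant: an induced subgraph of g1 contains every edge of g1 between the selected nodes, including edges that g2 does not have. A possible alternative, sketched below, keeps only the shared edges by using edge_subgraph and weakly connected components. The names `get_directed_mcs` and `word_graph` are hypothetical, and whitespace tokenization is assumed in place of nltk.

```python
import networkx as nx


def word_graph(sentence):
    # Hypothetical helper: whitespace tokenization stands in for nltk.word_tokenize.
    tokens = sentence.split()
    graph = nx.DiGraph()
    graph.add_edges_from(zip(tokens, tokens[1:]))
    return graph


def get_directed_mcs(g1, g2):
    """Largest weakly connected subgraph built only from edges shared by g1 and g2.

    Hypothetical variant of getMCS; nodes are matched by label.
    """
    shared_edges = [(u, v) for u, v in g2.edges() if g1.has_edge(u, v)]
    if not shared_edges:
        return nx.DiGraph()  # the graphs have no edge in common
    # edge_subgraph keeps only the listed edges, so no extra g1 edges slip in.
    shared = g1.edge_subgraph(shared_edges)
    largest = max(nx.weakly_connected_components(shared), key=len)
    return shared.subgraph(largest).copy()


source = word_graph(
    "A series of escapades demonstrating the adage that what is good for "
    "the goose is also good for the gander , some of which occasionally "
    "amuses but none of which amounts to much of a story ."
)
test = word_graph("not much of a story of what is good")

mcs = get_directed_mcs(source, test)
print(sorted(mcs.edges()))
```

With the sentences from the question, the shared edges form two components, "much of a story" and "what is good", and the larger one is returned as a directed graph.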
