
How to build a weighted directed acyclic graph in Java

I searched for similar topics, but the answers were too vague for my level of understanding, and I don't think they're specific enough to my question.

Similar threads:
Tree (directed acyclic graph) implementation
Representing a DAG (directed acyclic graph)

Question:

I have a formatted text file which contains data in the following format...
Example dataset:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
GO:0000112#part_of: GO:0000442
GO:0000118#is_a: GO:0016581#is_a: GO:0034967#is_a: GO:0070210#is_a: GO:0070211#is_a: GO:0070822#is_a: GO:0070823#is_a: GO:0070824
GO:0000120#is_a: GO:0000500#is_a: GO:0005668#is_a: GO:0070860
GO:0000123#is_a: GO:0005671#is_a: GO:0043189#is_a: GO:0070461#is_a: GO:0070775#is_a: GO:0072487
GO:0000126#is_a: GO:0034732#is_a: GO:0034733
GO:0000127#part_of: GO:0034734#part_of: GO:0034735
GO:0000133#is_a: GO:0031560#is_a: GO:0031561#is_a: GO:0031562#is_a: GO:0031563#part_of: GO:0031500
GO:0000137#part_of: GO:0000136

I'm looking to construct a weighted DAG from this data (the above is just a snippet). The whole 106 KB dataset is here: Source

--------------------------------------------------

Taking it line by line, the data on each line is interpreted as follows, using the first line as an example:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622

'#' is the delimiter between the items on a line.
The first term, GO:0000109, is the node name.
The subsequent terms, is_a: GO:xxxxxxx or part_of: GO:xxxxxxx, are the nodes connected to GO:0000109.
Some of these subsequent terms have connections of their own, as the dataset shows.
When the relation is is_a, the weight of the edge is 0.8.
When the relation is part_of, the weight of the edge is 0.6.
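As a minimal sketch of the line format just described, the parser below splits a line on '#', treats the first token as the node name, and turns each remaining "relation: target" token into one weighted edge. EdgeParser and its nested Edge type are hypothetical names; the sketch assumes edges point from the first term to each subsequent term and that only is_a and part_of occur.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns one line of the dataset into weighted edges.
final class EdgeParser {

    // One directed edge from the line's first term to a related term.
    static final class Edge {
        final String from, to, relation;
        final double weight;

        Edge(String from, String to, String relation, double weight) {
            this.from = from;
            this.to = to;
            this.relation = relation;
            this.weight = weight;
        }
    }

    // Weights as stated in the question: is_a = 0.8, part_of = 0.6.
    static double weightOf(String relation) {
        switch (relation) {
            case "is_a":    return 0.8;
            case "part_of": return 0.6;
            default: throw new IllegalArgumentException("Unknown relation: " + relation);
        }
    }

    static List<Edge> parseLine(String line) {
        String[] tokens = line.trim().split("#");
        String from = tokens[0].trim();              // first term is the node name
        List<Edge> edges = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            // Each remaining token looks like "is_a: GO:0000110"; split on the first ':' only.
            int colon = tokens[i].indexOf(':');
            String relation = tokens[i].substring(0, colon).trim();
            String to = tokens[i].substring(colon + 1).trim();
            edges.add(new Edge(from, to, relation, weightOf(relation)));
        }
        return edges;
    }
}
```

For example, parseLine("GO:0000112#part_of: GO:0000442") yields a single edge from GO:0000112 to GO:0000442 with weight 0.6.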

--------------------------------------------------

I have Googled how DAGs work, and I understand the concept. However, I still have no idea how to put it into code. I'm using Java.
From my understanding, a graph generally consists of nodes and arcs. Does this graph require an adjacency list to determine the direction of the connections? If so, I'm not sure how to combine the graph and the adjacency list so that they work together. After constructing the graph, my secondary goal is to find the degree of each node from the root node. There is a root node in the dataset.

For illustration, I have drawn a sample of the connections in the first line of data below:
Image Link

I hope you guys understand what I'm trying to achieve here. Thanks for looking at my problem. :)

Because it's easier to think about, I'd prefer to represent it as a tree. (This also makes it easier to traverse the map and keep track of intermediate degrees.)

You could have a Node class with a Collection of child Node objects. If you must, you could also represent the child relationships as Relationship objects, each holding both a weight and a Node reference, and store a Collection of Relationship objects instead.

Then you could walk the tree starting from the root and mark each visited node with its degree.

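import java.util.ArrayList;
import java.util.List;

class Node {
    String name;
    List<Relationship> children = new ArrayList<>(); // outgoing, weighted edges

    Node(String name) { this.name = name; }
}

class Relationship {
    Node child;     // the node this edge points to
    double weight;  // e.g. 0.8 for is_a, 0.6 for part_of

    Relationship(Node child, double weight) {
        this.child = child;
        this.weight = weight;
    }
}

class Tree {
    Node root;
}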

Here, Tree should probably have a method like this:

public Node findNodeByName(String name);

And Node should probably have a method like this:

public void addChild(Node n, double weight);

Then, as you parse each line, you call Tree.findNodeByName() to find the matching node (and create one if none exists... but that shouldn't happen if your data is good), and add the subsequent items on the line as children of that node.
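A minimal sketch of that parsing loop follows. It assumes a HashMap index from name to Node stands in for searching the structure on every lookup (a swapped-in lookup technique, not something the answer specifies), and its findNodeByName() creates a node the first time a name is seen, so targets that appear before their own line are created too. addLine() and size() are hypothetical helpers, and like the data it assumes only is_a and part_of relations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Node {
    final String name;
    final List<Relationship> children = new ArrayList<>();

    Node(String name) { this.name = name; }

    // Adds a weighted edge from this node to n.
    public void addChild(Node n, double weight) {
        children.add(new Relationship(n, weight));
    }
}

class Relationship {
    final Node child;
    final double weight;

    Relationship(Node child, double weight) {
        this.child = child;
        this.weight = weight;
    }
}

class Tree {
    // Assumption: a name -> Node index instead of walking the structure for each lookup.
    private final Map<String, Node> index = new HashMap<>();

    // Returns the node with this name, creating it if it does not exist yet.
    public Node findNodeByName(String name) {
        return index.computeIfAbsent(name, Node::new);
    }

    // Parses one "name#rel: target#..." line and links each target as a child.
    public void addLine(String line) {
        String[] tokens = line.trim().split("#");
        Node parent = findNodeByName(tokens[0].trim());
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            String relation = tokens[i].substring(0, colon).trim();
            Node child = findNodeByName(tokens[i].substring(colon + 1).trim());
            // Assumes anything that is not is_a is part_of.
            parent.addChild(child, relation.equals("is_a") ? 0.8 : 0.6);
        }
    }

    public int size() { return index.size(); }
}
```

Because children are looked up by name, a node listed under several parents ends up as one shared Node instance rather than several copies.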

As you've pointed out, DAGs cannot really be converted to trees, especially because some nodes have multiple parents. What you can do is insert the same node as a child of multiple parents, perhaps using a hash table to keep track of whether a particular node has already been traversed.
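Here is a sketch of such a traversal, assuming "degree from the root" means the number of edges on the shortest path from the root (edge weights are left out for brevity). It walks breadth-first and uses a HashMap both to record each node's depth and as the hash table of visited nodes, so a node shared by several parents is processed only once. Node and DepthFromRoot are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) { this.name = name; }
}

final class DepthFromRoot {
    // Breadth-first walk from root. The map doubles as the "visited" hash table:
    // a node reached again through a second parent is already in it and is skipped.
    static Map<Node, Integer> depths(Node root) {
        Map<Node, Integer> depth = new HashMap<>();
        Queue<Node> queue = new ArrayDeque<>();
        depth.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Node child : current.children) {
                if (!depth.containsKey(child)) {
                    depth.put(child, depth.get(current) + 1);
                    queue.add(child);
                }
            }
        }
        return depth;
    }
}
```

Breadth-first order guarantees that the first time a node is reached is along a shortest path, so its recorded depth never needs updating.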

Reading the comments, you seem confused by how a Node can contain Relationships, each of which in turn contains a Node. This is quite a common strategy; in general it is called the Composite pattern.

The idea in the case of trees is that a tree can be thought of as consisting of multiple subtrees: if you were to disconnect a node and all its descendants from the tree, the disconnected nodes would still form a tree, just a smaller one. Thus, a natural way to represent a tree is to have each Node contain other Nodes as children. This approach lets you do many things recursively, which, in the case of trees, is again often natural.
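To make the recursive idea concrete, here is a small Composite-style sketch in which size() and height() are defined in terms of the same methods on each child subtree. TreeNode is a hypothetical class, separate from the Node/Relationship classes above.

```java
import java.util.ArrayList;
import java.util.List;

// Composite-style node: every child is itself the root of a smaller tree.
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    // Returns this node so calls can be chained.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }

    // Number of nodes in the subtree rooted here: this node plus each child subtree.
    int size() {
        int total = 1;
        for (TreeNode child : children) {
            total += child.size();
        }
        return total;
    }

    // Height in edges: a leaf has height 0.
    int height() {
        int max = -1;
        for (TreeNode child : children) {
            max = Math.max(max, child.height());
        }
        return max + 1;
    }
}
```

In a DAG, this size() would count a shared node once for every path that reaches it, which is why the visited hash table mentioned earlier matters there.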

Letting a Node keep track of its children, and of no other part of the tree, also mirrors the mathematical directed graph: each vertex is "aware" only of its own edges and nothing else.
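For comparison, the same "each vertex knows only its outgoing edges" idea can be stored as an adjacency list keyed by node name, which is the structure the question asks about: each vertex's edge map plays the role of a Node's children list, so the two representations carry the same information. WeightedDigraph is a hypothetical class for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical adjacency-list form: each vertex maps only to its own outgoing edges.
final class WeightedDigraph {
    private final Map<String, Map<String, Double>> adjacency = new HashMap<>();

    // Records a directed edge; the target is registered as a vertex too.
    void addEdge(String from, String to, double weight) {
        adjacency.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(to, weight);
        adjacency.computeIfAbsent(to, k -> new LinkedHashMap<>());
    }

    // Outgoing edges of a vertex (target -> weight); empty for sinks and unknown vertices.
    Map<String, Double> edgesFrom(String vertex) {
        return Collections.unmodifiableMap(adjacency.getOrDefault(vertex, Collections.emptyMap()));
    }

    int vertexCount() { return adjacency.size(); }
}
```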

Example recursive tree implementation

For instance, to search for an element in a binary search tree, you would call the root's search method. The root then checks whether the sought element is equal to, less than, or greater than itself. If it is equal, the search exits with an appropriate return value. If it is less or greater, the root instead calls search on its left or right child, respectively, and that child does exactly the same thing.

Analogously, to add a new Node to the tree, you would call the root's add method with the new node as a parameter. The root decides whether to adopt the new node or pass it on to one of its children. In the latter case, it selects a child and calls that child's add method with the new Node as a parameter.
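A minimal sketch of the recursive search and add described above, assuming int keys and ignoring duplicate values. BstNode is a hypothetical class name; this is a plain binary search tree, not the GO graph.

```java
// Recursive binary search tree: each node decides for itself where a value goes.
final class BstNode {
    final int value;
    BstNode left, right;

    BstNode(int value) { this.value = value; }

    // True if target is in the subtree rooted at this node.
    boolean search(int target) {
        if (target == value) return true;
        BstNode next = target < value ? left : right;
        return next != null && next.search(target);
    }

    // Adopts the new node if the matching slot is free, otherwise delegates to that child.
    void add(BstNode node) {
        if (node.value < value) {
            if (left == null) left = node; else left.add(node);
        } else if (node.value > value) {
            if (right == null) right = node; else right.add(node);
        }
        // Equal values are ignored: no duplicates in this sketch.
    }
}
```

For example, after adding 30, 70, 20 and 40 under a root of 50, search(40) returns true and search(60) returns false.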

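Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.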

 