
Java Graph/DFS implementation

Original post 2014-09-22 08:50:27 9 1 java/ performance/ graph/ depth-first-search

I have a small dilemma I would like some advice on:

I'm implementing a directed graph and I want to make it as generic as possible: that is, Graph<T>, where T is the data held in each node (vertex). Adding a vertex to the graph will be add(T t). The graph will wrap T in a vertex object that holds T inside.
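To make the setup concrete, here is a minimal sketch of such a graph. Only add(T t) comes from the description above; the names SimpleGraph, Vertex, addEdge, successors and vertices are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; all names except add(T) are assumptions.
final class SimpleGraph<T> {

    // Wrapper that holds the user's data plus its outgoing edges.
    static final class Vertex<T> {
        private final T data;
        private final List<Vertex<T>> successors = new ArrayList<>();

        private Vertex(T data) {
            this.data = data;
        }

        T data() {
            return data;
        }

        // Read-only view of the vertices reachable by one outgoing edge.
        List<Vertex<T>> successors() {
            return Collections.unmodifiableList(successors);
        }
    }

    private final List<Vertex<T>> vertices = new ArrayList<>();

    // Wraps t in a new vertex and returns the wrapper so edges can be added later.
    Vertex<T> add(T t) {
        Vertex<T> v = new Vertex<>(t);
        vertices.add(v);
        return v;
    }

    // Adds a directed edge from -> to.
    void addEdge(Vertex<T> from, Vertex<T> to) {
        from.successors.add(to);
    }

    // Read-only view of all vertices in insertion order.
    List<Vertex<T>> vertices() {
        return Collections.unmodifiableList(vertices);
    }
}
```

Returning the wrapper from add is a design choice of this sketch; it avoids a lookup from T back to its vertex when adding edges.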

Next I would like to run DFS on the graph, and here comes my dilemma: should I keep the "visited" mark in the vertex (as a member), or create a map (vertex -> status) while running the DFS?

Keeping it in the vertex is less generic (the vertex shouldn't need to know about the DFS algorithm and its implementation). But creating a map (vertex -> status) is very space consuming.
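For reference, the second option (status kept outside the vertex) could look roughly like the sketch below. It is an assumption-laden illustration, not a recommendation: the class MapDfsSketch and its Node type are hypothetical, and an identity-based set stands in for the vertex -> status map since only a "visited" flag is tracked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: DFS whose bookkeeping lives outside the vertices.
final class MapDfsSketch {

    // Minimal vertex: it knows its data and successors, nothing about DFS.
    static final class Node<T> {
        final T data;
        final List<Node<T>> successors = new ArrayList<>();

        Node(T data) {
            this.data = data;
        }
    }

    // Iterative DFS from start; returns the payloads in pre-order.
    static <T> List<T> dfs(Node<T> start) {
        // Identity-based set, so T's equals/hashCode are never consulted.
        Set<Node<T>> visited =
                Collections.newSetFromMap(new IdentityHashMap<Node<T>, Boolean>());
        List<T> order = new ArrayList<>();
        Deque<Node<T>> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node<T> v = stack.pop();
            if (!visited.add(v)) {
                continue; // already handled via another path
            }
            order.add(v.data);
            // Push in reverse so the first successor is explored first.
            for (int i = v.successors.size() - 1; i >= 0; i--) {
                stack.push(v.successors.get(i));
            }
        }
        return order;
    }
}
```

The set is allocated per run and discarded afterwards, which is where the space cost mentioned above comes from.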

What do you think?

Thanks a lot!

1 Answer

If you need to run algorithms, especially the more complex ones, you will quickly find that you have to associate all kinds of data with your vertices. Having a generic way to store data with the graph items is a good idea, and ideally the access time for reading and writing that data should be O(1). A simple implementation could use a HashMap, which has O(1) access time in most cases, but the constant factor is relatively high.

For the yFiles Graph Drawing Library they added a mechanism where the data is actually stored at the elements themselves, but you can allocate as many data slots as you like. This is similar to managing an Object[] with each element and using the index into that data array as the "map". If your graph does not change, another strategy is to store each element's index in the graph with the element itself (just an integer) and then use that index to index into an array, so that for each "data map" you have basically one array whose size is the number of elements. Both techniques scale very well and provide the best possible access times, unless your data is really sparse (only a fraction of the elements actually need to store data).

The "Object[] at Elements" approach:

In your vertex and edge classes, add a field of type Object[] that is package private.
Implement a Map interface that provides T getData(Vertex) and void setData(Vertex, T).
One implementation of that interface could be backed by a HashMap<Vertex,T>, but the one I am suggesting only stores an integer index that is used to index into the Object[] arrays at the vertices.
In your graph class, add a method createMap that keeps track of the used and free indices and creates a new instance of the above class, whose getter and setter implementations use the package private field of the Vertex class to actually access the data.
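The steps above could be sketched roughly as follows. This is a hypothetical illustration and not the yFiles API: the names DataMap, SlotVertex, SlotMap, SlotGraph, addVertex and disposeMap are assumptions, and a BitSet is just one possible way to track used and free slot indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// The "Map interface" from the text (hypothetical name).
interface DataMap<T> {
    T getData(SlotVertex v);

    void setData(SlotVertex v, T value);
}

final class SlotVertex {
    // Package private: only the graph and its maps touch the slots.
    Object[] slots = new Object[0];
}

// A map that stores nothing itself except the slot index it owns.
final class SlotMap<T> implements DataMap<T> {
    final int index;

    SlotMap(int index) {
        this.index = index;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T getData(SlotVertex v) {
        return (T) v.slots[index];
    }

    @Override
    public void setData(SlotVertex v, T value) {
        v.slots[index] = value;
    }
}

final class SlotGraph {
    private final List<SlotVertex> vertices = new ArrayList<>();
    private final BitSet usedSlots = new BitSet(); // set bit = slot index in use

    SlotVertex addVertex() {
        SlotVertex v = new SlotVertex();
        // Large enough for every slot index currently in use.
        v.slots = new Object[usedSlots.length()];
        vertices.add(v);
        return v;
    }

    // Reserves the lowest free slot index and returns a map bound to it.
    <T> SlotMap<T> createMap() {
        int index = usedSlots.nextClearBit(0);
        usedSlots.set(index);
        for (SlotVertex v : vertices) {
            if (v.slots.length <= index) {
                v.slots = Arrays.copyOf(v.slots, index + 1);
            }
        }
        return new SlotMap<>(index);
    }

    // Clears the map's data and marks its slot index as free again.
    void disposeMap(SlotMap<?> map) {
        for (SlotVertex v : vertices) {
            v.slots[map.index] = null;
        }
        usedSlots.clear(map.index);
    }
}
```

In this sketch a disposed map must not be used again, because its slot index may already have been handed to a newer map.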

The "One Array" approach:

Add a package private integer field to your Vertex class.
Keep the integer fields in sync with the order of the vertices in your graph: the first Vertex has index 0, etc.
In the alternative map implementation, you initially allocate one T[] whose size is the number of vertices.
In the getter and setter implementations, you take the index of the Vertex and use it to access the values in the array.
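A minimal sketch of these steps, assuming a graph that only ever appends vertices so that indices stay valid. The names IndexedVertex, IndexedGraph and ArrayMap are hypothetical. Java cannot instantiate a generic T[] directly, so the sketch uses an Object[] with a cast as a stand-in for the T[] mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexedVertex {
    // Package private; maintained by IndexedGraph only.
    int index;
}

final class IndexedGraph {
    private final List<IndexedVertex> vertices = new ArrayList<>();

    // The first vertex gets index 0, the next 1, and so on.
    IndexedVertex addVertex() {
        IndexedVertex v = new IndexedVertex();
        v.index = vertices.size();
        vertices.add(v);
        return v;
    }

    int size() {
        return vertices.size();
    }

    // One array per map, sized to the current number of vertices.
    <T> ArrayMap<T> createMap() {
        return new ArrayMap<>(vertices.size());
    }
}

final class ArrayMap<T> {
    // Stand-in for T[]: generic arrays cannot be created directly in Java.
    private final Object[] values;

    ArrayMap(int size) {
        values = new Object[size];
    }

    @SuppressWarnings("unchecked")
    T getData(IndexedVertex v) {
        return (T) values[v.index];
    }

    void setData(IndexedVertex v, T value) {
        values[v.index] = value;
    }
}
```

Because the array is sized when the map is created, vertices added afterwards are out of range for that map; this matches the "graph does not change" assumption rather than working around it.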

For the DFS algorithm I would choose the "One Array" approach: you can use a byte[] (or, if "visited" is all that is required, even a BitSet) for space efficiency, and you are likely to populate the data for all vertices in DFS if your graph is connected. This should perform a lot better than a HashMap-based approach and does not require boxing and unboxing to store the data in an Object[].
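As a rough illustration of that choice, the sketch below runs a recursive DFS with a BitSet as the "visited" map. It assumes vertices are already identified by their integer index and that edges are given as an int[][] of successor indices; the class BitSetDfs and this representation are hypothetical simplifications.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: DFS over index-based vertices, one bit per vertex.
final class BitSetDfs {

    // successors[i] lists the indices reachable from vertex i by one edge.
    // Returns the indices reachable from start, in pre-order.
    static int[] dfs(int[][] successors, int start) {
        BitSet visited = new BitSet(successors.length);
        int[] order = new int[successors.length];
        int count = visit(successors, start, visited, order, 0);
        return Arrays.copyOf(order, count);
    }

    // Marks v, records it, then recurses into unvisited successors.
    // Returns the number of vertices recorded so far.
    private static int visit(int[][] successors, int v, BitSet visited,
                             int[] order, int count) {
        visited.set(v);
        order[count++] = v;
        for (int w : successors[v]) {
            if (!visited.get(w)) {
                count = visit(successors, w, visited, order, count);
            }
        }
        return count;
    }
}
```

No boxing occurs anywhere in this sketch. For very deep graphs the recursion could overflow the call stack, in which case an explicit stack would be the usual replacement.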