Java library for storing and processing large (up to 600k vertices) graphs

I'm working on a project that involves running algorithms on large graphs. The two largest have around 300k and 600k vertices (fairly sparse, I think). I'm hoping to find a Java library that can handle graphs that large, as well as somewhat smaller trees, since one of the algorithms I'll be using decomposes a graph into a tree. Ideally the library would also include breadth-first search and Dijkstra's or other shortest-path algorithms.

Based on another question, I've been looking at a few libraries (JGraphT, JUNG, jdsl, yworks), but I'm having a hard time finding out how many vertices they can realistically handle. Looking at their documentation, all I could find was a bit in the JUNG FAQ saying it could easily handle graphs of upwards of 150k vertices, which is still quite a bit smaller than my graphs. I'm hoping someone here has used one or more of these libraries and can tell me whether it will handle the graph sizes I need, or whether some other library would be better.

For the record, I don't need any visualization tools; this is strictly about representing the graphs and trees in data structures and running algorithms on them.

Background, if anyone really cares: for a class I'm supposed to implement an algorithm described in a research paper and reproduce the paper's experiments as best I can. The paper and datasets I'll be using can be found here. My professor says I can use any library I can find, as long as I can tell what the time and space complexity of its algorithms and data structures is.
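As a rough illustration of the data-structure side of this question, here is a minimal sketch of a sparse undirected graph stored in compressed sparse row (CSR) form with a breadth-first search on top. It is not taken from any of the libraries mentioned; the class name `CsrGraph` and its methods are hypothetical. The point is only that the complexities are easy to state: the structure is two `int` arrays of length V + 1 and 2E (roughly 4 * (V + 1 + 2E) bytes), and BFS runs in O(V + E) time.

```java
import java.util.Arrays;

/** Hypothetical sketch: undirected graph in CSR form, two int arrays and no per-edge objects. */
final class CsrGraph {
    final int[] offsets;   // neighbors of v are targets[offsets[v] .. offsets[v + 1])
    final int[] targets;   // concatenated neighbor lists, 2 entries per undirected edge

    /** Builds the graph from an edge list; edges[i] = {u, v} with 0 <= u, v < n. */
    CsrGraph(int n, int[][] edges) {
        offsets = new int[n + 1];
        for (int[] e : edges) {              // first pass: count degrees
            offsets[e[0] + 1]++;
            offsets[e[1] + 1]++;
        }
        for (int v = 0; v < n; v++) {        // prefix sums turn degrees into start offsets
            offsets[v + 1] += offsets[v];
        }
        targets = new int[2 * edges.length];
        int[] next = Arrays.copyOf(offsets, n);   // next free slot for each vertex
        for (int[] e : edges) {              // second pass: fill neighbor lists
            targets[next[e[0]]++] = e[1];
            targets[next[e[1]]++] = e[0];
        }
    }

    /** Returns hop distances from source; unreachable vertices get -1. O(V + E) time. */
    int[] bfs(int source) {
        int n = offsets.length - 1;
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        int[] queue = new int[n];            // each vertex is enqueued at most once
        int head = 0, tail = 0;
        dist[source] = 0;
        queue[tail++] = source;
        while (head < tail) {
            int u = queue[head++];
            for (int i = offsets[u]; i < offsets[u + 1]; i++) {
                int v = targets[i];
                if (dist[v] < 0) {           // first visit gives the shortest hop count
                    dist[v] = dist[u] + 1;
                    queue[tail++] = v;
                }
            }
        }
        return dist;
    }
}
```

For example, `new CsrGraph(5, new int[][] {{0, 1}, {1, 2}, {2, 3}}).bfs(0)` walks the path 0-1-2-3 and leaves the isolated vertex 4 marked as unreachable. The trade-off of this layout is that the graph is immutable once built, which suits read-only experiments but not algorithms that add or remove edges.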

You should take a look at Neo4J; it is a graph database and may be a good solution to your problem.

Check out JGraph as well. However, it is oriented towards visualization.

Also, maybe Apache Hama, a distributed computing framework for massive scientific computations, e.g. matrix, graph, and network algorithms.

Annas may also interest you: an open-source Java framework built for developers and researchers in the fields of graph theory, AI, path finding, distributed systems, etc.

Twitter's Cassovary project (https://github.com/twitter/cassovary) can handle very big graphs in memory with Scala (and thus on the JVM).

Alternatively, GraphChi's Java version can handle even bigger graphs by using disk: http://code.google.com/p/graphchi-java/

However, GraphChi will not be efficient for exact shortest-path algorithms, as they require fast random access.
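To make the random-access point concrete, here is a minimal sketch of Dijkstra's algorithm over in-memory adjacency lists, using only the JDK. It is not GraphChi or Cassovary code; the class `DijkstraSketch` and the `{u, v, w}` edge format are assumptions made for illustration. Each heap pop jumps to an arbitrary vertex's adjacency list, which is the access pattern that is cheap in memory and costly on disk. With a binary heap and lazy deletion of stale entries, the running time is O((V + E) log V).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: Dijkstra with a binary heap over in-memory adjacency lists. */
final class DijkstraSketch {
    /**
     * edges[i] = {u, v, w}: a directed edge u -> v with non-negative weight w.
     * Returns shortest distances from source; unreachable vertices keep Long.MAX_VALUE.
     */
    static long[] shortestPaths(int n, int[][] edges, int source) {
        List<List<int[]>> adj = new ArrayList<>(n);
        for (int v = 0; v < n; v++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});   // store {target, weight}
        }

        long[] dist = new long[n];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;

        // Heap entries are {distance, vertex}; outdated entries are skipped when popped.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        heap.add(new long[] {0, source});
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) {
                continue;                                // a shorter path to u was already settled
            }
            for (int[] edge : adj.get(u)) {              // random access into u's neighbor list
                int v = edge[0];
                long candidate = top[0] + edge[1];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    heap.add(new long[] {candidate, v});
                }
            }
        }
        return dist;
    }
}
```

A call such as `DijkstraSketch.shortestPaths(4, new int[][] {{0, 1, 4}, {0, 2, 1}, {2, 1, 2}}, 0)` finds that vertex 1 is reached more cheaply via vertex 2 than by the direct edge. The list-of-lists layout is chosen here for readability; for graphs with hundreds of thousands of vertices, flat primitive arrays would use noticeably less memory.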

