
Graph partition algo with Neo4j graph database

I know there are some well-known graph partitioning tools such as METIS, which is implemented by the Karypis Lab ( http://glaros.dtc.umn.edu/gkhome/metis/metis/overview ).

But I want to know: is there any method to partition a graph stored in Neo4j? Or do I have to dump the Neo4j data and manually transform the node and edge format to fit the METIS input format?

Regarding new-ish and interesting algorithms, this is by no means exhaustive or state of the art, but these are the first places I would look:

Specific Algorithm : DiDiC (Distributed Diffusive Clustering) - I used it once in my thesis ( Partitioning Graph Databases )

  • You iterate over all nodes and, for each node, retrieve all of its neighbors in order to spread a share of "some unit" to each of them
  • Easy to implement
  • Can be made deterministic
  • Iterative - as it's based on iterations (like Super Steps in Pregel) you can stop it at any time. In theory, the longer you leave it running the better the result (though in some cases, on certain graph shapes, it can be unstable)
  • When we implemented this we ran it for 100 iterations on a machine with ~30GB RAM, for up to ~4 million nodes - it took no more than two days to complete
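
The access pattern in the first bullet (visit every node, look at its neighbors, pass along a share of some quantity) can be sketched roughly as below. This is a deliberately simplified toy diffusion loop, not DiDiC itself, whose details are not given here. The function name `diffuse_clusters`, the seed-per-cluster initialization, and the `alpha` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def diffuse_clusters(edges, seeds, iterations=20, alpha=0.2):
    """Toy diffusion clustering (illustrative only, NOT the DiDiC algorithm).

    edges: iterable of (u, v) pairs, treated as undirected
    seeds: dict mapping cluster id -> seed node that starts with load 1.0
    alpha: share of the load difference exchanged per neighbor per iteration
           (assumed to be small, roughly below 1 / max degree)
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    nodes = sorted(neighbors)

    # One load value per (cluster, node); each cluster starts at its seed.
    load = {c: {v: 0.0 for v in nodes} for c in seeds}
    for c, seed in seeds.items():
        load[c][seed] = 1.0

    for _ in range(iterations):
        for c in load:
            old = load[c]
            new = dict(old)
            for v in nodes:
                for u in neighbors[v]:
                    # Load flows from the fuller node to the emptier one.
                    new[v] -= alpha * (old[v] - old[u])
            load[c] = new

    # Each node joins the cluster whose load on it is highest.
    clusters = sorted(load)
    return {v: max(clusters, key=lambda c: load[c][v]) for v in nodes}


# Two triangles joined by a single edge, one seed in each triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
assignment = diffuse_clusters(edges, seeds={0: 0, 1: 5})
```

Because every step only reads the previous iteration's values and nodes are visited in sorted order, the sketch is deterministic, and the loop can be cut off after any number of iterations. Unlike what the bullets say about DiDiC, this plain diffusion flattens out over time, so running it longer does not keep improving the result.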

Specific Algorithm : EvoCut "Finding sparse cuts locally using evolving sets" - local probabilistic algorithm from Microsoft - related to these papers

  • Difficult to implement
  • Local algorithm - BFS-like access patterns (random walks)
  • It's been a while since I read that paper, but I remember it was built on clean abstractions:
    • EvoNibble (pluggable - decides how much of the neighborhood to add to the current cluster)
    • EvoCut (calls EvoNibble multiple times to find the local cluster)
    • EvoPartition (calls EvoCut repeatedly to partition the entire graph)
  • Not deterministic

General Algorithm Family : Hierarchical Graph Clustering

From a high level:

  • Coarsen the graph by collapsing nodes into aggregate nodes
    • coarsening strategy is selectable
  • Find clusters in the coarsened/smaller graph
    • clustering strategy is selectable
  • Incrementally decoarsen the graph, refining the clustering at each step
    • refining strategy is selectable
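
As a rough illustration of the first step, here is a minimal sketch of a single coarsening pass. It uses a greedy matching (each unmatched node is merged with its first unmatched neighbor), which is only one of many possible coarsening strategies and is my assumption, not a recommendation. The name `coarsen_once` is hypothetical.

```python
def coarsen_once(edges):
    """One coarsening pass via greedy matching (one possible strategy).

    edges: iterable of (u, v) pairs, treated as undirected
    Returns (mapping, coarse_edges) where mapping sends each original node
    to its aggregate node id, and coarse_edges maps (a, b) with a < b to the
    number of original edges running between the two aggregate nodes.
    """
    neighbors = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)

    mapping = {}
    next_id = 0
    for u in sorted(neighbors):
        if u in mapping:
            continue  # already collapsed into an earlier aggregate node
        mapping[u] = next_id
        for v in sorted(neighbors[u]):
            if v not in mapping:
                mapping[v] = next_id  # collapse u and v together
                break
        next_id += 1

    coarse_edges = {}
    for u in neighbors:
        for v in neighbors[u]:
            if u < v:  # count each undirected edge once
                a, b = sorted((mapping[u], mapping[v]))
                if a != b:
                    coarse_edges[(a, b)] = coarse_edges.get((a, b), 0) + 1
    return mapping, coarse_edges


# A path 0-1-2-3 collapses into two aggregate nodes joined by one edge.
mapping, coarse_edges = coarsen_once([(0, 1), (1, 2), (2, 3)])
```

Repeating the pass on the coarse graph yields the hierarchy. The `mapping` from each level is what a later decoarsening step would use to project a clustering back onto the original nodes.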

Notes:

  • If the graph changes slowly (or results don't need to be right up to date) it may be possible to coarsen once (or infrequently) and then work with the coarsened graph, to save computation
  • I don't know of a specific algorithm to recommend

General limitations - the things few clustering algorithms do:

  • Node types not acknowledged - i.e., all nodes treated equally
  • Relationship types not acknowledged - i.e., all relationships treated equally
  • Relationship direction not acknowledged - i.e., relationships treated as undirected

Having worked independently with METIS and Neo4j in the past, I am not aware of any tool for generating a METIS file from Neo4j. That being said, writing such a tool should be an easy task and would be a great community contribution.
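
To give an idea of what such a tool involves, here is a minimal sketch that turns an edge list into the plain (unweighted) METIS graph format as I understand it from the METIS manual: a header line with the vertex and edge counts, then one line per vertex listing its neighbors, with vertices numbered from 1. It assumes the relationships have already been exported from Neo4j as pairs of node ids (that export is not shown), and the name `to_metis` is hypothetical.

```python
def to_metis(edges):
    """Render an edge list in the plain, unweighted METIS graph format.

    edges: iterable of (u, v) node id pairs, e.g. exported from Neo4j
    Returns (text, index) where index maps each original node id to its
    1-based METIS vertex number.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set())
        neighbors.setdefault(v, set())
        if u != v:  # drop self-loops
            # Direction is discarded; sets also drop parallel edges.
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Original ids may have gaps, so renumber them contiguously from 1.
    ordered = sorted(neighbors)
    index = {node: i for i, node in enumerate(ordered, start=1)}

    # Every undirected edge appears in two adjacency sets.
    edge_count = sum(len(ns) for ns in neighbors.values()) // 2

    lines = [f"{len(ordered)} {edge_count}"]
    for node in ordered:
        lines.append(" ".join(str(index[n]) for n in sorted(neighbors[node])))
    return "\n".join(lines) + "\n", index


# Node ids 10, 20, 30 with a duplicate reverse edge and a self-loop.
text, index = to_metis([(10, 20), (20, 10), (20, 30), (30, 30)])
```

The returned `index` is needed afterwards to map the partition that METIS assigns to each vertex number back onto the original node ids. Nodes without any relationship never appear in an edge list, so they would have to be added separately.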

Another approach for integrating METIS with Neo4j might be to connect METIS to Neo4j from C++ via JNI. However, this is going to be much more involved, as it would have to take care of things like transactions, concurrency, etc.

On the more general question of partitioning graphs, it is quite possible to implement some of the better-known, simpler algorithms with reasonable effort.
