
SciPy - Constrained Minimization derived from a Directed Graph

Original post: 2014-02-19 22:28:33 | Tags: python, algorithm, graph, scipy, mathematical-optimization

I'm looking for a solution to the following graph problem in order to perform graph analysis in Python.

Basically, I have a directed graph of N nodes where I know the following:

The sum of the weights of the out-edges for each node
The sum of the weights of the in-edges for each node
Following from the above, the total in-edge weight summed over all nodes equals the total out-edge weight summed over all nodes
No node has an edge to itself
All weights are nonnegative (positive or zero)
However, I know nothing about which nodes a given node might have an edge to, or what the weight of any edge is

Represented as a weighted adjacency matrix, I know the column sums and row sums but not the values of the edges themselves. I've realized that there is not a unique solution to this problem (does anyone know how to prove that, given the above, a solution is guaranteed to exist?). However, I'm hoping that I can at least arrive at a solution that minimizes the sum of the edge weights, or maximizes the number of zero edge weights, or something along those lines (basically, out of infinitely many choices, I'd like the most 'simple' graph).

I've thought about representing it as:

Min Sum(All Edge Weights) s.t. for each node, the sum of its out-edge weights equals the known out-sum for that node, and the sum of its in-edge weights equals the known in-sum for that node. Additionally, all weights are constrained to be >= 0.

I'm primarily using this for data analysis in SciPy and NumPy. However, using their constrained minimization techniques, I'll end up with approximately 2N^2-2N constraints from the edge-weight sum portion, and N constraints from the positive portion. I'm worried this will be infeasible for large data sets. I could have up to 500 nodes. Is this a feasible solution using SciPy's fmin_cobyla? Is there another way to lay out this problem, or another solver in Python, that would be more efficient?
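
A minimal sketch of the formulation above as a linear program, assuming `scipy.optimize.linprog` with the HiGHS backend (SciPy 1.6 or later) rather than `fmin_cobyla`. Laid out this way there is one variable per ordered pair of distinct nodes, 2N equality constraints (one out-sum and one in-sum per node), and nonnegativity expressed as variable bounds. The function name `solve_flows_lp` is hypothetical, and the dense constraint matrix is only practical for small N; for around 500 nodes a `scipy.sparse` matrix would likely be needed.

```python
import numpy as np
from scipy.optimize import linprog


def solve_flows_lp(out_sums, in_sums):
    """Return an N x N weight matrix with zero diagonal, or None if infeasible.

    Hypothetical helper: one LP variable per ordered pair (i, j) with i != j.
    """
    n = len(out_sums)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]

    # Rows 0..n-1 are out-sum constraints, rows n..2n-1 are in-sum constraints.
    A_eq = np.zeros((2 * n, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        A_eq[i, k] = 1.0
        A_eq[n + j, k] = 1.0
    b_eq = np.concatenate([out_sums, in_sums]).astype(float)

    # Objective: total edge weight. Nonnegativity is a bound, not a constraint row.
    c = np.ones(len(pairs))
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        return None

    W = np.zeros((n, n))
    for k, (i, j) in enumerate(pairs):
        W[i, j] = res.x[k]
    return W
```

Calling `solve_flows_lp([3, 2, 1], [1, 2, 3])` should return some matrix whose row sums are 3, 2, 1 and whose column sums are 1, 2, 3; which feasible matrix the solver picks is not specified.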

Thanks so much! First post on StackOverflow.

2 Answers

I don't know what your actual problem is, but the situation sounds like the "assignment problem", so you should look at the Hungarian algorithm.

The prohibition against self-flows makes some instances of this problem infeasible (e.g., a single node that has in- and out-flows of 1). Otherwise, a reasonably sparse solution with at most one self-flow can always be found as follows. Initialize two queues, one for the nodes with positive out-flow from lowest ID to highest and one for the nodes with positive in-flow from highest ID to lowest. Add a flow from the front node of the first queue to the front node of the second, with quantity equal to the minimum of the out-flow of the former and the in-flow of the latter. Update the out- and in-flows to their residual values and remove the exhausted node(s) from their queues. Since the ID of the front of the first queue increases, and the ID of the front of the second queue decreases, the only node that self-flows is the one where the ID numbers cross.
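
The two-queue procedure above might be sketched as follows. The function name `greedy_flows` and the dictionary representation of the result are assumptions made for illustration, and the sketch assumes integer flows (floating-point input would need a tolerance in the exhaustion checks).

```python
from collections import deque


def greedy_flows(out_flow, in_flow):
    """Two-queue greedy assignment; returns {(src, dst): weight}.

    Hypothetical helper. The result may contain at most one self-flow.
    """
    if sum(out_flow) != sum(in_flow):
        raise ValueError("total out-flow must equal total in-flow")

    out_res = list(out_flow)  # residual out-flow per node
    in_res = list(in_flow)    # residual in-flow per node
    n = len(out_res)

    # Sources ordered lowest ID first, sinks ordered highest ID first.
    sources = deque(i for i in range(n) if out_res[i] > 0)
    sinks = deque(i for i in reversed(range(n)) if in_res[i] > 0)

    flows = {}
    while sources and sinks:
        s, t = sources[0], sinks[0]
        q = min(out_res[s], in_res[t])
        flows[(s, t)] = q  # each pair occurs once: a node is popped every step
        out_res[s] -= q
        in_res[t] -= q
        if out_res[s] == 0:
            sources.popleft()
        if in_res[t] == 0:
            sinks.popleft()
    return flows
```

For example, `greedy_flows([3, 2, 1], [1, 2, 3])` returns `{(0, 2): 3, (1, 1): 2, (2, 0): 1}`, where node 1 is the single node at which the IDs cross and a self-flow appears.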

Minimizing the total flow is trivial; it's constant. Finding the sparsest solution is NP-hard; there's a reduction from subset sum where each of the elements being summed has a source node with that amount of out-flow, and two more sink nodes have in-flows, one of which is equal to the target sum. The subset sum instance is solvable if and only if no source flows to both sinks. The algorithm above is a 2-approximation.
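
To spell out why the total is constant, write $w_{ij}$ for the weight of the edge from node $i$ to node $j$, $r_i$ for the known out-flow of node $i$, and $c_j$ for the known in-flow of node $j$ (these symbols are introduced here for illustration only). Every feasible solution satisfies:

```latex
\sum_{i \ne j} w_{ij}
  \;=\; \sum_{i} \sum_{j \ne i} w_{ij}
  \;=\; \sum_{i} r_i
  \;=\; \sum_{j} c_j
```

so the sum of the edge weights takes the same value on every feasible graph, and only sparsity distinguishes one solution from another.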

To get rid of the self-flow on that one bad node sparsely: repeatedly grab a flow not involving the bad node and split it into two, via the bad node. Stop when we exhaust the self-flow. This fails only if there are no flows left that don't use the bad node and there is still a self-flow, in which case the bad node's in-flow plus out-flow exceeds the total flow, so the instance violates a necessary condition for the existence of a solution. This algorithm is a 4-approximation in sparsity.
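
A sketch of this rerouting step, under the same assumptions as before (integer flows stored as a `{(src, dst): weight}` dictionary); the name `remove_self_flow` and the choice to raise `ValueError` on failure are illustrative, not part of the original description.

```python
def remove_self_flow(flows, bad):
    """Reroute the self-flow on node `bad` through flows that avoid `bad`.

    Hypothetical helper. Returns a new dict; raises ValueError if the
    self-flow cannot be exhausted.
    """
    flows = dict(flows)
    rest = flows.pop((bad, bad), 0)  # self-flow still to be eliminated

    # Iterate over a snapshot, since the dict is modified inside the loop.
    for (u, v), f in list(flows.items()):
        if rest == 0:
            break
        if u == bad or v == bad:
            continue  # only flows not involving the bad node can be split
        d = min(f, rest)
        # Replace d units of u -> v with u -> bad and bad -> v.
        if f == d:
            del flows[(u, v)]
        else:
            flows[(u, v)] = f - d
        flows[(u, bad)] = flows.get((u, bad), 0) + d
        flows[(bad, v)] = flows.get((bad, v), 0) + d
        rest -= d

    if rest > 0:
        raise ValueError("self-flow remains; no solution without self-flows")
    return flows
```

Each split leaves every node's total in-flow and out-flow unchanged: the bad node gains `d` units in and `d` units out from the new edges, exactly replacing the `d` units of self-flow removed. For instance, `remove_self_flow({(0, 2): 3, (1, 1): 2, (2, 0): 1}, 1)` returns `{(0, 2): 1, (2, 0): 1, (0, 1): 2, (1, 2): 2}`.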