
Identifying largest connected component in a matrix

I have a Python NumPy matrix of 1s and 0s, and I need to identify the largest "collection" of 1s in the matrix: http://imgur.com/4JPZufS

The matrix can have up to 960,000 elements, so I would like to avoid a brute-force solution.

What is the smartest way to go about solving this problem?

You can use a data structure called a disjoint set (here is a Python implementation). This data structure was designed for this kind of task.

Iterate over the rows. If the current element is 1, check whether any of the already traversed neighbors (the one above and the one to the left) are 1. If so, add this element to that neighbor's set. If more than one neighbor is 1, union those sets. If no neighbor is 1, create a new set. At the end, output the largest set.

This would work as follows:

def MakeSet(x):
  x.parent = x
  x.rank   = 0
  x.size = 1

def Union(x, y):
  xRoot = Find(x)
  yRoot = Find(y)
  if xRoot.rank > yRoot.rank:
    yRoot.parent = xRoot
  elif xRoot.rank < yRoot.rank:
    xRoot.parent = yRoot
  elif xRoot != yRoot: # Unless x and y are already in same set, merge them
    yRoot.parent = xRoot
    xRoot.rank = xRoot.rank + 1
  # Sizes are updated on the arguments x and y, not on their roots
  x.size += y.size
  y.size = x.size

def Find(x):
  if x.parent == x:
    return x
  else:
    x.parent = Find(x.parent)
    return x.parent

# ----------------------------------------

class Node:
  def __init__(self, label):
    self.label = label
  def __str__(self):
    # label is a (row, col) tuple, so convert it explicitly
    return str(self.label)

rows = [[1, 0, 0], [1, 1, 0], [1, 0, 0]]
setDict = {}  # maps (row, col) -> Node for every 1 seen so far
for i, row in enumerate(rows):
  for j, val in enumerate(row):
    if val == 0:
      continue
    node = Node((i, j))
    MakeSet(node)
    # Merge with the neighbor above, if it is a 1
    if i > 0 and rows[i-1][j] == 1:
      Union(setDict[(i-1, j)], node)
    # Merge with the neighbor to the left, if it is a 1
    if j > 0 and row[j-1] == 1:
      Union(setDict[(i, j-1)], node)
    setDict[(i, j)] = node
print(max(l.size for l in setDict.values()))

>> 4

This is a full working example, with the disjoint-set code taken from the link above.

I think the count will be off in the provided answer. E.g., if rows is changed to rows = [[1, 0, 0], [1, 1, 1], [1, 0, 0]], the result is still 4, though it should be 5. Changing Union to

def Union(x, y):
  xRoot = Find(x)
  yRoot = Find(y)
  if xRoot.rank > yRoot.rank:
    yRoot.parent = xRoot
    xRoot.size += yRoot.size
  elif xRoot.rank < yRoot.rank:
    xRoot.parent = yRoot
    yRoot.size += xRoot.size
  elif xRoot != yRoot:  # Unless x and y are already in same set, merge them
    yRoot.parent = xRoot
    xRoot.rank = xRoot.rank + 1
    xRoot.size += yRoot.size

seems to fix it.
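For reference, the same idea can be sketched without node objects, keeping the component size only at each root as the corrected Union does. This is a minimal sketch under stated assumptions, not the code from either answer: the function name largest_component_size and the flat-index layout (cell (i, j) stored at i * n_cols + j) are illustrative choices, and it unions by size rather than by rank. It treats only the up and left neighbors as connected, as the description above does.

```python
def largest_component_size(grid):
    """Return the size of the largest 4-connected group of 1s in grid.

    Hypothetical helper for illustration; grid is a list of rows
    (a 2-D NumPy array indexed the same way should also work).
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0
    parent = list(range(n_rows * n_cols))  # every cell starts as its own root
    size = [1] * (n_rows * n_cols)         # only meaningful at a root

    def find(a):
        # Walk up to the root, shortening the path along the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return  # already in the same set, nothing to count twice
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # attach the smaller tree to the larger
        size[ra] += size[rb]     # size is tracked on the root only

    best = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if grid[i][j] != 1:
                continue
            cell = i * n_cols + j
            if i > 0 and grid[i - 1][j] == 1:
                union(cell, cell - n_cols)  # neighbor above
            if j > 0 and grid[i][j - 1] == 1:
                union(cell, cell - 1)       # neighbor to the left
            best = max(best, size[find(cell)])
    return best


print(largest_component_size([[1, 0, 0], [1, 1, 1], [1, 0, 0]]))  # prints 5
```

Because find is iterative and sizes live only at roots, the result does not depend on which node of a set is passed to union.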
