Algorithm: counting islands using union-find

Question

Suppose you need to count the number of islands in a matrix:

                    {1, 1, 0, 0, 0},
                    {0, 1, 0, 0, 1},
                    {1, 0, 0, 1, 1},
                    {0, 0, 0, 0, 0},
                    {1, 0, 1, 0, 1}

When the input matrix fits in memory, we can simply use DFS or BFS.
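For reference, the in-memory case can be sketched roughly as follows. This is a minimal BFS sketch, not part of the original question: the class name `InMemoryIslands` is made up for illustration, and it assumes that islands are 4-connected (diagonal neighbours do not join) and that the grid may be modified in place.

```java
import java.util.ArrayDeque;

class InMemoryIslands {
    // Counts 4-connected groups of 1s (assumption: diagonals do not connect).
    // Visited land cells are overwritten with 0, so the grid is consumed.
    static int countIslands(int[][] grid) {
        int islands = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] != 1) {
                    continue;
                }
                islands++;                       // found an unvisited island
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                grid[r][c] = 0;
                queue.add(new int[] {r, c});
                while (!queue.isEmpty()) {
                    int[] cell = queue.poll();
                    int[][] neighbours = {
                        {cell[0] - 1, cell[1]}, {cell[0] + 1, cell[1]},
                        {cell[0], cell[1] - 1}, {cell[0], cell[1] + 1}
                    };
                    for (int[] n : neighbours) {
                        boolean inside = n[0] >= 0 && n[0] < grid.length
                                && n[1] >= 0 && n[1] < grid[n[0]].length;
                        if (inside && grid[n[0]][n[1]] == 1) {
                            grid[n[0]][n[1]] = 0; // mark visited
                            queue.add(n);
                        }
                    }
                }
            }
        }
        return islands;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {1, 1, 0, 0, 0},
            {0, 1, 0, 0, 1},
            {1, 0, 0, 1, 1},
            {0, 0, 0, 0, 0},
            {1, 0, 1, 0, 1}
        };
        System.out.println(countIslands(grid)); // prints 6 under 4-connectivity
    }
}
```

The whole grid has to be resident for this to work, which is exactly the limitation the rest of the question is about.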

However, what do we do if the input matrix is too large to fit in memory?

I could split the input matrix into smaller files and read them one at a time.

But how do I merge the results?

This is where I got stuck. My idea is that when merging the chunks we have to read some overlapping portion. But what is a concrete way to do so?

I am trying to understand Matt's solution.

When I drew the sample below on the whiteboard and processed it row by row, merging left and then merging up did not seem to work.

From Matt's solution:

I am not sure what topidx and botidx mean:

            int topidx = col * 2;
            int botidx = topidx + 1;

Answer 1

Using union-find, the basic algorithm (without worrying about memory) is:

Create a set for every 1.
Merge the sets for every pair of adjacent 1s. It doesn't matter what order you find them in, so reading order is usually fine.
Count the number of root sets. There will be one for every island.
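The three steps above could look roughly like this in Java. It is a minimal sketch that holds the whole matrix in memory; the class name `UnionFindIslands`, the plain `parent` array, and the assumption of 4-connectivity on a rectangular grid are choices made for the illustration, not something the algorithm requires.

```java
class UnionFindIslands {
    // Find with path halving: every visited node is re-pointed to its grandparent.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra != rb) {
            parent[ra] = rb;
        }
    }

    // Assumes a rectangular grid in which 1 marks land.
    static int countIslands(int[][] grid) {
        if (grid.length == 0) {
            return 0;
        }
        int rows = grid.length;
        int cols = grid[0].length;
        int[] parent = new int[rows * cols];

        // Step 1: create a set for every cell (only land cells are ever merged or counted).
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }

        // Step 2: in reading order, merge each 1 with the 1 to its left and the 1 above it.
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (grid[r][c] != 1) {
                    continue;
                }
                int id = r * cols + c;
                if (c > 0 && grid[r][c - 1] == 1) {
                    union(parent, id, id - 1);
                }
                if (r > 0 && grid[r - 1][c] == 1) {
                    union(parent, id, id - cols);
                }
            }
        }

        // Step 3: count root sets among the land cells, one per island.
        int islands = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int id = r * cols + c;
                if (grid[r][c] == 1 && find(parent, id) == id) {
                    islands++;
                }
            }
        }
        return islands;
    }

    public static void main(String[] args) {
        int[][] grid = {
            {1, 1, 0, 0, 0},
            {0, 1, 0, 0, 1},
            {1, 0, 0, 1, 1},
            {0, 0, 0, 0, 0},
            {1, 0, 1, 0, 1}
        };
        System.out.println(countIslands(grid)); // prints 6 under 4-connectivity
    }
}
```

Only the left and upper neighbours are checked because, in reading order, the right and lower neighbours will check back when their turn comes.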

Easy. With a little care, you can do this using sequential access to the matrix and only 2 rows' worth of memory:

Initialize the island count to 0.
Read the first row, create a set for each 1, and merge sets in adjacent columns.
For each additional row:
1. Read the row, create a set for each 1, and merge sets in adjacent columns.
2. Merge sets in the new row with adjacent sets in the previous row. ALWAYS POINT THE LINKS DOWNWARD, so that you never end up with a set in the new row linked to a parent in the old row.
3. Count the remaining root sets in the previous row, and add the number to your island count. These will never be able to merge with anything else.
4. Discard all the sets in the previous row. You're never going to need them again, because you already counted them and nothing links to them.
Finally, count the root sets in the last row and add them to your island count.
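To make the row-by-row procedure easier to follow, here is a deliberately simple sketch of it. It is not the compact layout used in the answer's own listing: it rebuilds a small union-find over "old row + new row" for every row and carries only one label per column forward. The names `StreamingIslands`, `prevLabel`, and `countIslands`, the '1'-for-land row format, and 4-connectivity are all assumptions of this sketch. The "links point downward" rule is enforced by always making the higher index the root, since new-row slots have higher indexes than old-row slots.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class StreamingIslands {
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    // The higher index always becomes the root, so a set that touches the
    // new row (indexes >= cols) always has its root in the new row.
    static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra < rb) {
            parent[ra] = rb;
        } else {
            parent[rb] = ra;
        }
    }

    // rows: each string has exactly `cols` characters, '1' is land (assumption).
    static int countIslands(Iterator<String> rows, int cols) {
        int[] prevLabel = new int[cols];  // root column of each old-row cell, -1 for water
        Arrays.fill(prevLabel, -1);
        int[] parent = new int[2 * cols]; // [0, cols) = old row, [cols, 2*cols) = new row
        int islands = 0;
        while (rows.hasNext()) {
            String row = rows.next();
            for (int i = 0; i < parent.length; i++) {
                parent[i] = i;
            }
            for (int c = 0; c < cols; c++) {
                if (prevLabel[c] >= 0) {
                    union(parent, c, prevLabel[c]);           // restore old-row sets
                }
                if (row.charAt(c) != '1') {
                    continue;
                }
                if (c > 0 && row.charAt(c - 1) == '1') {
                    union(parent, cols + c, cols + c - 1);    // merge left
                }
                if (prevLabel[c] >= 0) {
                    union(parent, cols + c, c);               // merge up, root ends up below
                }
            }
            // Roots still in the old row belong to islands that can never grow again.
            for (int c = 0; c < cols; c++) {
                if (prevLabel[c] >= 0 && find(parent, c) == c) {
                    islands++;
                }
            }
            // Discard the old row: keep only a root column per new-row land cell.
            for (int c = 0; c < cols; c++) {
                prevLabel[c] = row.charAt(c) == '1' ? find(parent, cols + c) - cols : -1;
            }
        }
        for (int c = 0; c < cols; c++) {
            if (prevLabel[c] == c) {
                islands++;                                    // root sets in the last row
            }
        }
        return islands;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("11000", "01001", "10011", "00000", "10101");
        System.out.println(countIslands(rows.iterator(), 5)); // prints 6
    }
}
```

Because the input is consumed through an `Iterator`, the rows could just as well come from a file reader, so the full matrix never needs to be in memory. Resetting the union-find for every row costs a pass over 2 * cols slots, which keeps the sketch readable at the price of some constant-factor overhead.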

The key to this, of course, is always pointing the links downward whenever you link sets in different rows. This will not hurt the complexity of the algorithm, and if you're using your own union-find, it is easy to accomplish. If you're using a library data structure, you can use it for each row separately and keep track of the links between root sets in different rows yourself.

Since this is actually one of my favorite algorithms, here is an implementation in Java. It is not the most readable implementation, since it involves some low-level tricks, but it is super-efficient and short, the kind of thing I'd write where performance is very important:

import java.util.Arrays;

public class Islands
{
    private static final String[] matrix=new String[] {
        "  #############  ###   ",
        "  #      #####   ##    ",
        "  #  ##  ##   #   #    ",
        "    ###      ##   #  # ",
        "  #   #########  ## ## ",
        "          ##       ##  ",
        "          ##########   ",
    };

    // find with path compression.
    // If sets[s] < 0 then it is a link to ~sets[s].  Otherwise it is size of set
    static int find(int[] sets, int s)
    {
        int parent = ~sets[s];
        if (parent>=0)
        {
            int root = find(sets, parent);
            if (root != parent)
            {
                sets[s] = ~root;
            }
            return root;
        }
        return s;
    }

    // union-by-size
    // If sets[s] < 0 then it is a link to ~sets[s].  Otherwise it is size of set
    static boolean union(int[] sets, int x, int y)
    {
        x = find(sets,x);
        y = find(sets,y);
        if (x!=y)
        {
            if ((sets[x] < sets[y]))
            {
                sets[y] += sets[x];
                sets[x] = ~y;
            }
            else
            {
                sets[x] += sets[y];
                sets[y] = ~x;
            }
            return true;
        }
        return false;
    }

    // Count islands in matrix

    public static void main(String[] args)
    {
        // two rows of union-find sets.
        // top row is at even indexes, bottom row is at odd indexes.  This arrangement is chosen just
        // to make resizing this array easier.
        // For each value x:
        // x==0 => no set. x>0 => root set of size x. x<0 => link to ~x
        int cols=4;
        int[] setrows= new int[cols*2];

        int islandCount = 0;

        for (String s : matrix)
        {
            System.out.println(s);
            //Make sure our rows are big enough
            if (s.length() > cols)
            {
                cols=s.length();
                if (setrows.length < cols*2)
                {
                    int newlen = Math.max(cols,setrows.length)*2;
                    setrows = Arrays.copyOf(setrows, newlen);
                }
            }
            //Create sets for land in bottom row, merging left
            for (int col=0; col<s.length(); ++col)
            {
                if (!Character.isWhitespace(s.charAt(col)))
                {
                    int idx = col*2+1;
                    setrows[idx]=1; //set of size 1
                    if (idx>=2 && setrows[idx-2]!=0)
                    {
                        union(setrows, idx, idx-2);
                    }
                }
            }
            //merge up
            for (int col=0; col<cols; ++col)
            {
                int topidx = col*2;
                int botidx = topidx+1;
                if (setrows[topidx]!=0 && setrows[botidx]!=0)
                {
                    int toproot=find(setrows,topidx);
                    if ((toproot&1)!=0)
                    {
                        //top set is already linked down
                        union(setrows, toproot, botidx);
                    }
                    else
                    {
                        //link top root down.  It does not matter that we aren't counting its size, since
                        //we will shortly throw it away
                        setrows[toproot] = ~botidx;
                    }
                }
            }
            //count root sets, discard top row, and move bottom row up while fixing links
            for (int col=0; col<cols; ++col)
            {
                int topidx = col * 2;
                int botidx = topidx + 1;
                if (setrows[topidx]>0)
                {
                    ++islandCount;
                }
                int v = setrows[botidx];
                setrows[topidx] = (v>=0 ? v : v|1); //fix up link if necessary
                setrows[botidx] = 0;
            }
        }

        //count remaining root sets in top row
        for (int col=0; col<cols; ++col)
        {
            if (setrows[col*2]>0)
            {
                ++islandCount;
            }
        }

        System.out.println("\nThere are "+islandCount+" islands there");
    }

}
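On the question of what topidx and botidx mean: reading the listing above, the two rows are interleaved in one array, so column col of the top (previous) row lives at index col*2 and column col of the bottom (new) row lives at index col*2+1. The small demo below illustrates that layout and the `v|1` link fix-up used when the bottom row is moved up. The helper names (`topIndex`, `bottomIndex`, `shiftLinkUp`) are hypothetical and exist only for this illustration.

```java
class InterleavedIndexDemo {
    // Column col of the top (previous) row is stored at an even index.
    static int topIndex(int col) {
        return col * 2;
    }

    // Column col of the bottom (new) row is stored at the following odd index.
    static int bottomIndex(int col) {
        return col * 2 + 1;
    }

    // An index belongs to the bottom row exactly when it is odd.
    static boolean isBottom(int idx) {
        return (idx & 1) != 0;
    }

    // A link to index t is stored as ~t (a negative number). When the bottom
    // row moves up, a link to odd index t must become a link to t - 1.
    // Since ~(t - 1) == ~t + 1 and ~t is even when t is odd, OR-ing with 1 does it.
    static int shiftLinkUp(int link) {
        return link | 1;
    }

    public static void main(String[] args) {
        for (int col = 0; col < 4; col++) {
            int link = ~bottomIndex(col);      // link pointing at the bottom-row slot
            int shifted = shiftLinkUp(link);   // same column, now in the top row
            System.out.println("col " + col
                    + ": top=" + topIndex(col)
                    + " bottom=" + bottomIndex(col)
                    + " link " + link + " -> " + shifted
                    + " (points to index " + ~shifted + ")");
        }
    }
}
```

This also explains the `(toproot&1)!=0` test in the merge-up loop: an odd root index means the top set has already been linked down into the new row.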

Answer 1 (2, accepted) 2019-03-21 17:46:15
