
Linked 2D Matrix in C#

I need to implement this scenario in C#:

http://i.stack.imgur.com/Dm6G3.jpg

The matrix will be very large, maybe 10000x10000 or larger. I will use it as the distance matrix in a hierarchical clustering algorithm. In every iteration of the algorithm the matrix has to be updated (joining 2 rows into 1 and 2 columns into 1). If I use a simple double[,] or double[][] matrix, these operations will be very "expensive". Can anyone suggest a C# implementation of this scenario?

Do you have an algorithm at the moment? And what do you mean by expensive: memory or time? If memory is the problem, there is not much you can do in C#, but you could consider running the calculation inside a database using temporary objects. If time is the problem, you can use parallelism to join columns and rows.
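The parallel join mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the matrix is square and symmetric, and `ParallelMerge`, `MergeInto` and the `combine` delegate are hypothetical names (the delegate stands in for whatever linkage rule the clustering algorithm uses, for example `Math.Min` for single linkage).

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMerge
{
    // Folds row/column `source` into row/column `target` of a square, symmetric matrix.
    // `combine` is a hypothetical linkage rule, e.g. (a, b) => Math.Min(a, b).
    public static void MergeInto(double[,] d, int target, int source,
                                 Func<double, double, double> combine)
    {
        int n = d.GetLength(0);
        // Iteration k reads d[target, k] and d[source, k] and writes only
        // d[target, k] and d[k, target], so the iterations do not interfere.
        Parallel.For(0, n, k =>
        {
            if (k == target || k == source) return;
            double merged = combine(d[target, k], d[source, k]);
            d[target, k] = merged;
            d[k, target] = merged;
        });
    }
}
```

Whether this is actually faster than a plain loop depends on the matrix size; for small matrices the scheduling overhead of `Parallel.For` can outweigh the gain, so it is worth measuring.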

Besides that, I think a simple double[,] array is the fastest and most memory-sparing option you can get in C#, because accessing an array value is an O(1) operation and arrays have the least memory and management overhead (compared to lists and dictionaries).
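To put the memory side into numbers, here is a small sketch. `MatrixMemory` and its methods are hypothetical helpers, and the figures cover only the raw element data (8 bytes per `double`), not runtime overhead. The second method assumes the distance matrix is symmetric, so that only the entries below the diagonal would need to be stored.

```csharp
public static class MatrixMemory
{
    // Bytes of element data for a full n x n matrix of doubles.
    public static long FullBytes(long n) => n * n * sizeof(double);

    // Bytes of element data if only the strict lower triangle is kept
    // (assumes a symmetric matrix whose diagonal does not need storing).
    public static long TriangleBytes(long n) => n * (n - 1) / 2 * sizeof(double);
}
```

For n = 10000 this gives 800,000,000 bytes for the full matrix and 399,960,000 bytes for the lower triangle.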

As mentioned above, a basic double[,] is going to be the most effective way of handling this in C#.

Remember that C# sits on top of managed memory, so you have less fine-grained control over low-level memory operations than in a language such as C. Creating your own objects in C# to add functionality will only use more memory in this scenario, and will likely slow the algorithm down as well.

If you have yet to pick an algorithm, CURE seems to be a good bet. The choice of algorithm may affect your choice of data structure, but that's not likely.

You will find that the algorithm determines the theoretical limits on 'cost' in any case. For example, for CURE you will read that you are bound by O(n² log n) running time and O(n) memory use.

I hope this helps. If you can provide more detail, we might be able to assist further!

N.

It's not possible to 'merge' two rows or two columns of an array in place. You'd have to copy the whole matrix into a new, smaller one, which is indeed unacceptably expensive.

You should probably just add the values of one row to the previous one and then ignore the merged row's values, acting as if they were removed.
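A minimal sketch of that "merge, then ignore" idea, assuming a square matrix and using a hypothetical `LazyMergeMatrix` wrapper with an `active` flag per index. It adds values as described above; a real clustering implementation would presumably substitute its own linkage rule for the addition.

```csharp
public class LazyMergeMatrix
{
    private readonly double[,] data;
    private readonly bool[] active;   // active[i] == false: row/column i counts as removed

    public int ActiveCount { get; private set; }

    public LazyMergeMatrix(double[,] source)
    {
        data = source;
        int n = source.GetLength(0);
        active = new bool[n];
        for (int i = 0; i < n; i++) active[i] = true;
        ActiveCount = n;
    }

    public bool IsActive(int i) => active[i];

    public double this[int i, int j] => data[i, j];

    // Adds row `source` into row `target`, then column `source` into column `target`,
    // and finally flags `source` as removed. No array is copied or resized.
    public void Merge(int target, int source)
    {
        int n = data.GetLength(0);
        for (int k = 0; k < n; k++)
            if (active[k]) data[target, k] += data[source, k];
        for (int k = 0; k < n; k++)
            if (active[k]) data[k, target] += data[k, source];
        active[source] = false;
        ActiveCount--;
    }
}
```

Code that later scans the matrix is expected to skip every index for which `IsActive` returns false, so each merge costs O(n) instead of an O(n²) copy.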

An array of arrays (double[][]) is often faster to access than double[,], but it takes more memory.

The whole array-merging step might not be needed if you change the algorithm a bit, but this might help you:

    // Requires: using System; using System.Collections.Generic;
    public static void MergeMatrix()
    {
        int size = 100;
        // Initialize the matrix
        double[,] matrix = new double[size, size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                matrix[i, j] = ((double)i) + (j / 100.0);

        int rowMergeCount = 0, colMergeCount = 0;
        // Merge last row.
        for (int i = 0; i < size; i++)
            matrix[size - rowMergeCount - 2, i] += matrix[size - rowMergeCount - 1, i];
        rowMergeCount++;
        // Merge last column.
        for (int i = 0; i < size; i++)
            matrix[i, size - colMergeCount - 2] += matrix[i, size - colMergeCount - 1];
        colMergeCount++;

        // Read the newly merged values.
        int newWidth = size - rowMergeCount, newHeight = size - colMergeCount;
        double[,] smaller = new double[newWidth, newHeight];
        for (int i = 0; i < newWidth; i++)
            for (int j = 0; j < newHeight; j++)
                smaller[i, j] = matrix[i, j];

        List<int> rowsMerged = new List<int>(), colsMerged = new List<int>();
        // Merge a row at an arbitrary position.
        rowsMerged.Add(15);
        int target = rowsMerged[rowsMerged.Count - 1];
        int source = target + 1;
        // Still using the original matrix since its values are still useful.
        for (int i = 0; i < size; i++)
            matrix[target, i] += matrix[source, i];
        rowMergeCount++;

        // Merge a column at an arbitrary position.
        colsMerged.Add(37);
        target = colsMerged[colsMerged.Count - 1];
        source = target + 1;
        for (int i = 0; i < size; i++)
            matrix[i, target] += matrix[i, source];
        colMergeCount++;

        newWidth = size - rowMergeCount;
        newHeight = size - colMergeCount;
        smaller = new double[newWidth, newHeight];
        for (int i = 0, j = 0; i < newWidth && j < size; i++, j++)
        {
            for (int k = 0, m = 0; k < newHeight && m < size; k++, m++)
            {
                smaller[i, k] = matrix[j, m];
                Console.Write(matrix[j, m].ToString("00.00") + " ");

                // So merging columns is more expensive because we have to check for it more often while reading.
                if (colsMerged.Contains(m)) m++;
            }

            if (rowsMerged.Contains(j)) j++;
            Console.WriteLine();
        }

        Console.Read();
    }

In this code I use two 1D helper lists to calculate the index into a big array containing the data. Deleting rows/columns is really cheap, since I only need to remove that index from the helper lists. But of course the memory in the big array stays allocated, i.e. depending on your usage you effectively have a memory leak.

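using System.Collections.Generic;
using System.Linq;

public class Matrix
{
    double[] data;   // all cells in row-major order; never shrinks
    List<int> cols;  // offset of each remaining column within a row
    List<int> rows;  // offset of the start of each remaining row in data

    // Maps a logical (x, y) position to an index into data via the helper lists.
    private int GetIndex(int x, int y)
    {
        return rows[y] + cols[x];
    }

    public double this[int x, int y]
    {
        get { return data[GetIndex(x, y)]; }
        set { data[GetIndex(x, y)] = value; }
    }

    // Removes only the helper entry; the column's cells stay in data.
    public void DeleteColumn(int x)
    {
        cols.RemoveAt(x);
    }

    // Removes only the helper entry; the row's cells stay in data.
    public void DeleteRow(int y)
    {
        rows.RemoveAt(y);
    }

    public Matrix(int width, int height)
    {
        cols = new List<int>(Enumerable.Range(0, width));
        rows = new List<int>(Enumerable.Range(0, height).Select(i => i * width));
        data = new double[width * height];
    }
}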
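If the memory that stays behind in the big array becomes a problem, one possible extension is to repack the live cells once in a while. The sketch below is not part of the answer above: `CompactableMatrix` and `Compact` are hypothetical names, and the class repeats the same index-list design only so that the example is self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public class CompactableMatrix
{
    private double[] data;
    private List<int> cols;   // offset of each live column within a stored row
    private List<int> rows;   // offset of the start of each live row in data

    public int Width => cols.Count;
    public int Height => rows.Count;

    public CompactableMatrix(int width, int height)
    {
        cols = Enumerable.Range(0, width).ToList();
        rows = Enumerable.Range(0, height).Select(i => i * width).ToList();
        data = new double[width * height];
    }

    public double this[int x, int y]
    {
        get { return data[rows[y] + cols[x]]; }
        set { data[rows[y] + cols[x]] = value; }
    }

    public void DeleteColumn(int x) { cols.RemoveAt(x); }
    public void DeleteRow(int y) { rows.RemoveAt(y); }

    // Copies the live cells into a fresh, tightly packed array and resets
    // the helper lists, so the old array can be garbage collected.
    public void Compact()
    {
        int w = Width, h = Height;
        var packed = new double[w * h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                packed[y * w + x] = this[x, y];
        data = packed;
        cols = Enumerable.Range(0, w).ToList();
        rows = Enumerable.Range(0, h).Select(i => i * w).ToList();
    }
}
```

Compacting is itself a full copy, so it only pays off if it is done rarely, for instance after the number of live rows has halved.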

Hm, to me this looks like a simple binary tree. The left node represents the next value in a row and the right node represents the column.

So it should be easy to iterate over rows and columns and combine them.

Thank you for the answers.

At the moment I'm using this solution:

public class NodeMatrix
{

    public NodeMatrix Right { get; set;}
    public NodeMatrix Left { get; set; }
    public NodeMatrix Up { get; set; }
    public NodeMatrix Down { get; set; }
    public int I  { get; set; }
    public int J  { get; set; }
    public double Data { get; set; }

    public NodeMatrix(int i, int j, double data)
    {
        I = i;
        J = j;
        Data = data;
    }
}

List<NodeMatrix> list = new List<NodeMatrix>(10000);

Then I build the connections between the nodes. After that the matrix is ready.

This will use more memory, but I think operations like adding rows and columns or joining rows and columns will be far faster.
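For illustration, building the connections and unlinking a row could look roughly like this. It is a sketch rather than the code actually used above: `GridNode`, `LinkedGrid`, `Build` and `RemoveRow` are hypothetical names, the node type is reduced to the four links and the value, and the input array is assumed to be non-empty.

```csharp
public class GridNode
{
    public GridNode Left, Right, Up, Down;
    public double Data;

    public GridNode(double data) { Data = data; }
}

public static class LinkedGrid
{
    // Wires up a grid of nodes from a plain (non-empty) array and returns the top-left node.
    public static GridNode Build(double[,] values)
    {
        int rows = values.GetLength(0), cols = values.GetLength(1);
        var nodes = new GridNode[rows, cols];
        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < cols; j++)
            {
                var node = nodes[i, j] = new GridNode(values[i, j]);
                if (j > 0) { node.Left = nodes[i, j - 1]; node.Left.Right = node; }
                if (i > 0) { node.Up = nodes[i - 1, j]; node.Up.Down = node; }
            }
        }
        return nodes[0, 0];
    }

    // Unlinks the row that starts at `rowStart` (its leftmost node).
    // Only the Up/Down links of the neighbouring rows change, so the cost is O(columns).
    public static void RemoveRow(GridNode rowStart)
    {
        for (var node = rowStart; node != null; node = node.Right)
        {
            if (node.Up != null) node.Up.Down = node.Down;
            if (node.Down != null) node.Down.Up = node.Up;
        }
    }
}
```

Removing a column works the same way with the roles of Left/Right and Up/Down swapped. Note that a 10000x10000 matrix means 100 million node objects, each carrying four references in addition to its value, which is the extra memory cost mentioned above.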
