
C# huge size 2-dim arrays

I need to declare square matrices in C# WinForms with more than 20000 items in a row. I read about the 2GB .NET object size limit on 32-bit, which also applies on a 64-bit OS. As I understand it, the only answer is to use unsafe code or a separate library built with a C++ compiler.

The problem is worse for me because ushort[20000,20000] is smaller than 2GB, yet I cannot allocate even 700MB of memory. My limit is 650MB and I don't understand why: I have 32-bit WinXP with 3GB of memory. I tried Marshal.AllocHGlobal(700<<20) but it throws OutOfMemoryException; GC.GetTotalMemory returns 4.5MB before the allocation attempt.

All I found is many people saying to use unsafe code, but I cannot find an example of how to declare a 2-dim array on the heap (no stack can hold such a huge amount of data) and how to work with it using pointers. Is it pure C++ code inside the unsafe{} brackets?

PS. Please don't ask WHY I need such huge arrays... but if you want to know: I need to analyze texts (for example books) and find a lot of indexes. So the answer is: matrices of relations between words.

Edit: Could somebody please provide a small example of working with matrices using pointers in unsafe code? I know that under 32-bit it is impossible to allocate more space, but I spent a lot of time googling for such an example and found NOTHING.

Why demand a huge 2-D array? You can simulate this with, for example, a jagged array - ushort[][] - almost as fast, and you won't hit the same single-object limit. You'll still need buckets-o-RAM of course, so x64 is implied...

        ushort[][] arr = new ushort[size][];
        for(int i = 0 ; i < size ; i++) {
            arr[i] = new ushort[size];
        }

Besides which - you might want to look at sparse-arrays, eta-vectors, and all that jazz.
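As a rough illustration of the sparse-array idea: if most word pairs have no relation, only the non-zero cells need to be stored. The sketch below is a minimal version built on Dictionary; the SparseMatrix class and its members are hypothetical names made up for this example, and it assumes that zero means "no relation".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original answer): keeps only non-zero cells.
public sealed class SparseMatrix
{
    // Key is the flattened index row * size + col; long avoids int overflow.
    private readonly Dictionary<long, ushort> cells = new Dictionary<long, ushort>();
    private readonly int size;

    public SparseMatrix(int size) { this.size = size; }

    // Number of cells that actually occupy memory.
    public int StoredCells { get { return cells.Count; } }

    public ushort this[int row, int col]
    {
        get
        {
            ushort value;
            return cells.TryGetValue(Key(row, col), out value) ? value : (ushort)0;
        }
        set
        {
            long key = Key(row, col);
            if (value == 0) cells.Remove(key);   // zero is the implicit default
            else cells[key] = value;
        }
    }

    private long Key(int row, int col)
    {
        if (row < 0 || row >= size || col < 0 || col >= size)
            throw new ArgumentOutOfRangeException();
        return (long)row * size + col;
    }
}
```

With this, new SparseMatrix(20000) allocates almost nothing up front, and memory grows only with the number of cells that are set. Each dictionary entry costs noticeably more than the 2 bytes of a dense ushort cell, so this pays off only when the matrix is mostly empty.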

The reason why you can't get near even the 2GB allocation in 32-bit Windows is that arrays in the CLR are laid out in contiguous memory. In 32-bit Windows you have such a restricted address space that you'll find nothing like a 2GB hole in the virtual address space of the process. Your experiments suggest that the largest region of available address space is 650MB. Moving to 64-bit Windows should at least allow you to use a full 2GB allocation.

Note that the virtual address space limitation on 32-bit Windows has nothing to do with the amount of physical memory in your computer (3GB in your case). Instead, the limitation is caused by the number of bits the CPU uses to address memory. 32-bit Windows uses, unsurprisingly, 32 bits for each memory address, which gives a total addressable memory space of 4GB. By default Windows keeps 2GB for itself and gives 2GB to each running process, so you can see why the CLR will find nothing like a 2GB allocation. With some trickery you can change the OS/user split so that Windows keeps only 1GB for itself and gives the running process 3GB, which might help. However, with 64-bit Windows the addressable memory assigned to each process jumps up to 8 terabytes, so there the CLR will almost certainly be able to use full 2GB allocations for arrays.
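To put numbers on this, the size of the matrix from the question can be worked out directly. The helper below is only an illustrative sketch; MatrixSize and its methods are names invented for this example.

```csharp
using System;

// Hypothetical helper for the size arithmetic; not part of any library.
public static class MatrixSize
{
    // Bytes needed for a square matrix with n rows of n elements each.
    // The cast to long prevents overflow of the int multiplication.
    public static long Bytes(int n, int bytesPerElement)
    {
        return (long)n * n * bytesPerElement;
    }

    // The same figure in whole megabytes (2^20 bytes), rounded down.
    public static long Megabytes(int n, int bytesPerElement)
    {
        return Bytes(n, bytesPerElement) >> 20;
    }
}
```

MatrixSize.Bytes(20000, sizeof(ushort)) gives 800,000,000 bytes, about 762MB. That is well under the 2GB object limit but larger than the roughly 650MB contiguous region the question reports, which is consistent with the allocation failing as a single block.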

I'm so happy! :) Recently I played around with this problem: I tried to solve it using a database, but found that this approach is far from perfect. The matrix [20000,20000] was implemented as a single table. Even with properly set up indexes, just creating the more than 400 million records takes about 1 hour on my PC. That is not critical for me. Then I ran the algorithm that works with the matrix (it requires joining the same table twice!) and after running for more than half an hour it had not completed even a single step. After that I understood that the only way is to work with such a matrix purely in memory, and went back to C# again.

I created a pilot application to test the memory allocation process and to determine exactly where allocation stops when using different structures.

As I said in my first post, with 2-dim arrays it is possible to allocate only about 650MB under 32-bit WinXP. The results with Win7 and 64-bit compilation were also sad: less than 700MB.

I used JAGGED ARRAYS [][] instead of a single 2-dim array [,], and you can see the results below:

Compiled in Release mode as a 32-bit app - WinXP 32-bit, 3GB physical memory - 1.45GB
Compiled in Release mode as a 64-bit app - Win7 64-bit, 2GB under VM - 7.5GB

The sources of the application I used for testing follow. I could not find how to attach source files here, so I will just describe the design part and paste the code manually. Create a WinForms application. Put these controls on the form with default names: 1 button, 1 numericUpDown and 1 listBox. In the .cs file add the following code and run.

        // Needs: using System; using System.Collections.Generic;
        //        using System.Runtime.InteropServices; using System.Windows.Forms;
        private void button1_Click(object sender, EventArgs e)
        {
            //Log(string.Format("Memory used before collection: {0}", GC.GetTotalMemory(false)));
            GC.Collect();
            //Log(string.Format("Memory used after collection: {0}", GC.GetTotalMemory(true)));
            listBox1.Items.Clear();
            if (string.IsNullOrEmpty(numericUpDown1.Text)) {
                Log("Enter integer value");
            } else {
                int val = (int) numericUpDown1.Value;
                Log(TryAllocate(val));
            }
        }

        /// <summary>
        /// Memory test: allocates a rowLen x rowLen jagged int array row by row.
        /// </summary>
        /// <param name="rowLen">number of rows, and number of elements per row</param>
        private IEnumerable<string> TryAllocate(int rowLen) {
            var r = new List<string>();
            r.Add ( string.Format("Allocating using jagged array with overall size (MB) = {0}", ((long)rowLen*rowLen*Marshal.SizeOf(typeof(int))) >> 20) );
            try {
                var ar = new int[rowLen][];
                bool allAllocated = true;
                for (int i = 0; i < ar.Length; i++) {
                    try {
                        ar[i] = new int[rowLen];
                    }
                    catch (Exception) {
                        r.Add ( string.Format("Unable to allocate memory on step {0}. Allocated {1} MB", i
                            , ((long)rowLen*i*Marshal.SizeOf(typeof(int))) >> 20 ));
                        allAllocated = false;
                        break;
                    }
                }
                // Report success only if every row was allocated.
                if (allAllocated) {
                    r.Add("Memory was successfully allocated");
                }
            }
            catch (Exception e) {
                r.Add(e.Message + e.StackTrace);
            }
            return r;
        }

        #region Logging

        private void Log(string s) {
            listBox1.Items.Add(s);
        }

        private void Log(IEnumerable<string> s)
        {
            if (s != null) {
                foreach (var ss in s) {
                    listBox1.Items.Add ( ss );
                }
            }
        }

        #endregion

The problem is solved for me. Guys, thank you!

If sparse arrays do not apply, it is probably better to just do it in C/C++ with the platform APIs related to memory-mapped files: http://en.wikipedia.org/wiki/Memory-mapped_file
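For reference, .NET 4.0 and later also expose memory-mapped files to managed code through the System.IO.MemoryMappedFiles namespace, so the same idea can be tried without leaving C#. The following is only a sketch under that assumption; the FileBackedMatrix class is a hypothetical name, and it maps the whole file in one view, which for an 800MB matrix will most likely still require a 64-bit process.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical wrapper (illustration only): a square ushort matrix stored in a file.
public sealed class FileBackedMatrix : IDisposable
{
    private readonly MemoryMappedFile file;
    private readonly MemoryMappedViewAccessor view;
    private readonly int size;

    public FileBackedMatrix(string path, int size)
    {
        this.size = size;
        long bytes = (long)size * size * sizeof(ushort);
        // Creates (or overwrites) the file and sizes it to hold the whole matrix.
        file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, bytes);
        // One view over the entire file; the OS pages data in and out on demand.
        view = file.CreateViewAccessor();
    }

    public ushort this[int row, int col]
    {
        get { return view.ReadUInt16(Offset(row, col)); }
        set { view.Write(Offset(row, col), value); }
    }

    // Byte offset of a cell in row-major order.
    private long Offset(int row, int col)
    {
        if (row < 0 || row >= size || col < 0 || col >= size)
            throw new ArgumentOutOfRangeException();
        return ((long)row * size + col) * sizeof(ushort);
    }

    public void Dispose()
    {
        view.Dispose();
        file.Dispose();
    }
}
```

The matrix then lives on disk rather than on the managed heap, so the 2GB object limit does not apply to it; access is slower than a plain array whenever the touched pages are not already in memory.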

If you explained what you are trying to do, it would be easier to help. Maybe there are better ways than allocating such a huge amount of memory at once.

Re-design is also choice number one in this great blog post:

BigArray, getting around the 2GB array size limit

The options suggested in this article are:

For the OutOfMemoryException, read this thread (especially the answers by nobugz and Brian Rasmussen):
Microsoft Visual C# 2008 Reducing number of loaded dlls

