
C# 2 GB file is 4 GB in RAM. Why?

I'm reading in a file (the file consists of one long string, 2 GB in length).

This is my function, which reads the entire contents of the file into memory in chunks and places each chunk in a list (_reader is a StreamReader):

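public List<char[]> GetAllContentAsList()
{
    const int charsToRead = 1000000;
    char[] buffer = new char[charsToRead];
    List<char[]> results = new List<char[]>();

    // Read returns the number of chars actually read; 0 means end of stream.
    int charsRead;
    while ((charsRead = _reader.Read(buffer, 0, charsToRead)) != 0)
    {
        // Copy only what was read, so a short final chunk carries no stale data.
        char[] temp = new char[charsRead];
        Array.Copy(buffer, temp, charsRead);
        results.Add(temp);
    }

    return results;
}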

Once all the data is in the list, it takes up 4 GB of RAM. How is this possible when the file is only 2 GB in size?

*Edit*

This is what I ended up doing. I'm not converting the array of bytes to a string; I'm just passing the bytes on and manipulating them. This way the file takes only 2 GB in memory instead of 4 GB.

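public List<byte[]> GetAllContentAsList()
{
    const int bytesToRead = 1000000;
    var buffer = new byte[bytesToRead];
    List<byte[]> results = new List<byte[]>();

    // Here _reader must expose Read(byte[], int, int), e.g. a FileStream or
    // BinaryReader; StreamReader.Read only fills char buffers.
    int bytesRead;
    while ((bytesRead = _reader.Read(buffer, 0, bytesToRead)) != 0)
    {
        //string temp = Encoding.UTF8.GetString(buffer);
        // Copy only the bytes actually read, so the last chunk is not padded.
        byte[] b = new byte[bytesRead];
        Array.Copy(buffer, b, bytesRead);
        results.Add(b);
    }

    return results;
}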

Educated guess here:

The file is UTF-8 or ASCII encoded and contains only (or mostly) single-byte characters (or possibly uses some other code page that is mostly single-byte).

Now, .NET strings and chars are UTF-16: each char is a 2-byte code unit, and characters outside the Basic Multilingual Plane take two of them (4 bytes).

So, in memory, the same text takes roughly double the space: about 2 bytes per character instead of 1, which turns a 2 GB file into about 4 GB of char data.
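To make the doubling concrete, here is a minimal sketch that compares the UTF-8 size of a string with the size of its UTF-16 chars in memory. The class name EncodingSizeDemo, its two helper methods, and the 1000-character sample are hypothetical, chosen only for illustration, and the sketch assumes ASCII-only text as in the guess above; it counts the char payload only, not object headers or list overhead.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Bytes the text occupies when encoded as UTF-8 (as it would be on disk).
    public static long Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Bytes the same text occupies as UTF-16 chars in memory (payload only).
    public static long Utf16Bytes(string text) => (long)text.Length * sizeof(char);

    public static void Main()
    {
        // Hypothetical sample: 1000 ASCII characters.
        string sample = new string('a', 1000);

        Console.WriteLine(Utf8Bytes(sample));  // 1000
        Console.WriteLine(Utf16Bytes(sample)); // 2000
    }
}
```

Scaled up, the same 1:2 ratio would account for a 2 GB single-byte-encoded file occupying about 4 GB once it is decoded into char arrays, while keeping the raw bytes stays at about 2 GB.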
