Read a large binary file (5 GB) into a byte array in C#?

I have a recording file (a binary file) of more than 5 GB. I have to read that file and filter out the data that needs to be sent to a server.

The problem is that a byte[] array only supports up to 2 GB of file data, so I would appreciate help from anyone who has already dealt with this kind of situation.

using (FileStream str = File.OpenRead(textBox2.Text))
{
    int itemSectionStart = 0x00000000;
    BinaryReader breader = new BinaryReader(str);
    breader.BaseStream.Position = itemSectionStart;
    int length = (int)breader.BaseStream.Length;     // Length is a long; this cast breaks above 2 GB
    byte[] itemSection = breader.ReadBytes(length);  // first frame data
}

Issues:

1: The file length exceeds the range of an int.
2: I tried using long and uint, but byte[] only supports int-sized lengths.
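
To make the two issues concrete, here is a minimal sketch. The class and method names are hypothetical, not from the question. It relies only on the facts that Stream.Length is a long and that array lengths and BinaryReader.ReadBytes take an int. Note that the real upper bound for a single byte[] is somewhat below int.MaxValue, so the check is a necessary condition, not a guarantee.

```csharp
using System;

public static class LengthCheck
{
    // Stream.Length is a long, while byte[] lengths and ReadBytes(int) take an int.
    // A length above int.MaxValue can never be read into one array.
    public static bool FitsInIntLength(long fileLength)
    {
        return fileLength >= 0 && fileLength <= int.MaxValue;
    }

    // Shows what an unchecked (int) cast does to a large length:
    // the upper 32 bits are silently dropped instead of raising an error.
    public static int TruncatingCast(long fileLength)
    {
        return unchecked((int)fileLength);
    }
}
```

For a file of exactly 5 GiB (5,368,709,120 bytes), TruncatingCast returns 1,073,741,824, so a cast like the one in the snippet above would quietly ask for 1 GiB rather than fail.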

Edit.

Another approach I want to try is to read the data one frame buffer at a time. Suppose my frame buffer size is 24000: the byte array stores that many frames' data, I process it, then flush the byte array and store the next 24000 frames' data, and keep going until the end of the binary file.
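
A rough sketch of that frame-by-frame idea, under the assumption that the 24000 is a size in bytes (the question does not say). FrameReader, ReadFrames and onFrame are hypothetical names used only for illustration. The same buffer is reused for every frame, so memory use stays at one frame regardless of the file size.

```csharp
using System;
using System.IO;

public static class FrameReader
{
    // Reads the stream in fixed-size frames and hands each one to onFrame
    // together with the number of valid bytes in it.
    // Returns the number of frames delivered (the last one may be shorter).
    public static long ReadFrames(Stream input, int frameSize, Action<byte[], int> onFrame)
    {
        byte[] frame = new byte[frameSize];
        long frames = 0;
        while (true)
        {
            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the frame is full or the stream ends.
            int filled = 0;
            while (filled < frameSize)
            {
                int n = input.Read(frame, filled, frameSize - filled);
                if (n == 0) break;
                filled += n;
            }
            if (filled == 0) break;          // nothing left to read
            onFrame(frame, filled);
            frames++;
            if (filled < frameSize) break;   // stream ended inside this frame
        }
        return frames;
    }
}
```

With a real file this could be called as FrameReader.ReadFrames(File.OpenRead(path), 24000, (frame, count) => { /* filter here */ }), where path is whatever file name the application uses. The callback must finish with the frame before returning, because the buffer is overwritten by the next read.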

As said in the comments, I think you have to read your file through a stream. Here is how you can do it:

// breader is the BinaryReader from the question.
const int step = 10000;
byte[] buffer = new byte[step];
int nbRead;
while ((nbRead = breader.Read(buffer, 0, step)) > 0)
{
    // Only the first nbRead bytes of the buffer are valid on this pass.
    for (int i = 0; i < nbRead; i++)
    {
        byte oneByte = buffer[i];
        // Here you can read this subpart byte by byte
    }
}

If I understand your needs correctly, you are looking for a specific pattern in your file?

I think you can do it by looking for the start of your pattern byte by byte. Once you find it, you can start reading the important bytes. If the important data as a whole is larger than 2 GB, as said in the comments, you will have to send it to your server in several parts.
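
One possible way to sketch the pattern search is shown below. It is not the answerer's code: PatternScanner and IndexOf are hypothetical names, and the matching uses Knuth-Morris-Pratt rather than a plain restart-on-mismatch loop, so that a partial match that fails never requires rewinding the stream. The stream is still examined byte by byte, one buffer at a time.

```csharp
using System;
using System.IO;

public static class PatternScanner
{
    // Returns the offset (counted from where scanning started) of the first
    // occurrence of pattern in input, or -1 if it does not occur.
    public static long IndexOf(Stream input, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        // fail[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of it (the KMP failure table).
        int[] fail = new int[pattern.Length];
        for (int i = 1, k = 0; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
            if (pattern[i] == pattern[k]) k++;
            fail[i] = k;
        }

        byte[] buffer = new byte[64 * 1024];
        long position = 0;   // offset of the byte currently being examined
        int matched = 0;     // how many pattern bytes are matched so far
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++, position++)
            {
                byte b = buffer[i];
                while (matched > 0 && b != pattern[matched]) matched = fail[matched - 1];
                if (b == pattern[matched]) matched++;
                if (matched == pattern.Length)
                    return position - pattern.Length + 1;
            }
        }
        return -1;
    }
}
```

Because the offset is a long, the search works past the 2 GB mark. After a hit, the caller can seek to the returned offset and read the important bytes in chunks.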

You cannot read a file that big all at once, so you have to either split the file into small portions and then process them,

OR

read the file through a buffer and, once you are done with that buffer's data, flush the buffer and refill it.

I faced the same issue, so I tried the buffer-based approach and it worked for me.

// Path holds the input file path.
using (FileStream inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read))
{
    int Buffer_value = 1024;
    byte[] Array_buffer = new byte[Buffer_value];
    int bytesRead;
    while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
    {
        // Walk only the bytes actually read, four at a time;
        // a trailing group shorter than 4 bytes is skipped.
        for (int z = 0; z + 4 <= bytesRead; z = z + 4)
        {
            // "0A-1B-2C-3D" becomes "0A1B2C3D"
            string temp_ArraydataID = BitConverter.ToString(Array_buffer, z, 4).Replace("-", "");
        }
    }
}

This way you can process your data.

In my case I first tried to store the data read from the buffer in a List. That works fine up to about 2 GB of data; after that it throws a memory exception.

The approach I followed was to read the data from the buffer, apply the needed filters, write the filtered data to a text file, and then process that file.

// Text file approach

// outputPath (hypothetical name) must point to a different file from the input Path.
using (FileStream inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read))
using (StreamWriter writer = new StreamWriter(outputPath, true))
{
    int Buffer_value = 1024;
    byte[] Array_buffer = new byte[Buffer_value];
    int bytesRead;
    while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
    {
        // Walk only the bytes actually read, four at a time.
        for (int z = 0; z + 4 <= bytesRead; z = z + 4)
        {
            string temp_ArraydataID = BitConverter.ToString(Array_buffer, z, 4).Replace("-", "");
            if (temp_ArraydataID == "XYZ Condition")   // placeholder for the real filter
            {
                writer.WriteLine(temp_ArraydataID);
            }
        }
    }
}
