Splitting a file into chunks

I'm trying to split large files (3 GB+) into 100 MB chunks and then send those chunks over HTTP. For testing, I'm working with a 29 MB file (size: 30380892 bytes; size on disk: 30384128 bytes), so the 100 MB limit does not come into play yet.

This is my code:

List<byte[]> bufferList = new List<byte[]>();
byte[] buffer = new byte[4096];
FileInfo fileInfo = new FileInfo(file);
long length = fileInfo.Length;
int nameCount = 0;
long sum = 0;
long count = 0;

using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{    
    while (count < length)
    {
        sum = fs.Read(buffer, 0, buffer.Length);
        count += sum;

        bufferList.Add(buffer);
    }

    var output2 = new byte[bufferList.Sum(arr => arr.Length)];
    int writeIdx2 = 0;
    foreach (var byteArr in bufferList)
    {
        byteArr.CopyTo(output2, writeIdx2);
        writeIdx2 += byteArr.Length;
    }

    HttpUploadBytes(url, output2, ++nameCount + fileName, contentType, path);
}

In this test code, I add each buffer I read to a list; when reading is finished, I combine the buffers into one complete array. The problem is that the result (the size of output2) is 30384128, the same as the size on disk, so the file received by the server is corrupted.

What am I doing wrong?

The problem is that you keep adding the same 4 KB buffer to bufferList, and every entry is counted at its full length of 4096 bytes, even for the final, partial read. Reading 30380892 bytes takes 7418 reads, and 7418 × 4096 = 30384128. That is why the size you get matches the size on disk: in your case the size on disk also happens to be the file size rounded up to a multiple of 4 KB.

A bigger problem with your code is that the data you send is wrong, because each read overwrites the data in buffer. If, for example, you send 200 chunks, you actually send 200 copies of the last content of buffer.
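To see why every list entry ends up with the same content, here is a minimal sketch. BufferAliasingDemo and FirstBytesSeen are hypothetical names used only for illustration, and the assignment to buffer[0] stands in for fs.Read overwriting the buffer on each pass.

```csharp
using System;
using System.Collections.Generic;

public static class BufferAliasingDemo
{
    // Adds the same array instance to a list three times, overwriting it
    // between additions, and returns the first byte seen through each entry.
    public static byte[] FirstBytesSeen()
    {
        var list = new List<byte[]>();
        var buffer = new byte[4];

        for (byte value = 1; value <= 3; value++)
        {
            buffer[0] = value;   // stands in for fs.Read overwriting the buffer
            list.Add(buffer);    // adds a reference to the array, not a copy
        }

        // All three entries are the same object, so each shows the last write.
        return new[] { list[0][0], list[1][0], list[2][0] };
    }
}
```

Calling BufferAliasingDemo.FirstBytesSeen() returns three 3s, not 1, 2, 3, because the list holds three references to one array.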

The fix is relatively simple: make a copy of the buffer before adding it to bufferList:

// sum is declared as long in the question, but Take expects an int, hence the cast
bufferList.Add(buffer.Take((int)sum).ToArray());

This fixes the size problem too, because the last chunk is shorter, with its length given by sum from the last call to Read. Most importantly, though, bufferList then contains copies of the buffer rather than references to the buffer itself.
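For the original goal of sending 100 MB chunks, the same idea (copy only the bytes actually read) can be applied one chunk at a time, so the whole file never has to sit in memory. The following is a sketch under assumptions, not the asker's or any library's implementation: FileChunker and ReadChunks are hypothetical names, and the inner loop is there because Stream.Read is allowed to return fewer bytes than requested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileChunker
{
    // Lazily yields the stream's content as arrays of at most chunkSize bytes.
    // Each yielded array is freshly allocated and exactly as long as its data.
    public static IEnumerable<byte[]> ReadChunks(Stream stream, int chunkSize)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return ReadChunksIterator(stream, chunkSize);
    }

    private static IEnumerable<byte[]> ReadChunksIterator(Stream stream, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        while (true)
        {
            // Keep reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = stream.Read(buffer, filled, chunkSize - filled);
                if (read == 0) break;   // end of stream
                filled += read;
            }

            if (filled == 0) yield break;

            // Copy only the bytes that were read, so the reusable buffer
            // can be overwritten safely on the next pass.
            var chunk = new byte[filled];
            Array.Copy(buffer, chunk, filled);
            yield return chunk;

            if (filled < chunkSize) yield break;   // short chunk: end of stream
        }
    }
}
```

With the names from the question, and assuming HttpUploadBytes keeps the signature used there, the upload loop could then look like: foreach (var chunk in FileChunker.ReadChunks(fs, 100 * 1024 * 1024)) HttpUploadBytes(url, chunk, ++nameCount + fileName, contentType, path);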
