Read file into ByteArrays of 4 bytes

I would like to know how to read a file into byte arrays that are 4 bytes long. These arrays will be manipulated and then converted back into a single array, ready to be written to a file.

EDIT: Code snippet.

    var arrays = new List<byte[]>();
    using (var f = new FileStream("file.cfg.dec", FileMode.Open))
    {
        for (int i = 0; i < f.Length; i += 4)
        {
            var b = new byte[4];
            var bytesRead = f.Read(b, i, 4);
            if (bytesRead < 4)
            {
                var b2 = new byte[bytesRead];
                Array.Copy(b, b2, bytesRead);
                arrays.Add(b2);
            }
            else if (bytesRead > 0)
                arrays.Add(b);
        }
    }

    foreach (var b in arrays)
    {
        BitArray source = new BitArray(b);
        BitArray target = new BitArray(source.Length);

        target[26] = source[0];
        target[31] = source[1];
        target[17] = source[2];
        target[10] = source[3];
        target[30] = source[4];
        target[16] = source[5];
        target[24] = source[6];
        target[2] = source[7];
        target[29] = source[8];
        target[8] = source[9];
        target[20] = source[10];
        target[15] = source[11];
        target[28] = source[12];
        target[11] = source[13];
        target[13] = source[14];
        target[4] = source[15];
        target[19] = source[16];
        target[23] = source[17];
        target[0] = source[18];
        target[12] = source[19];
        target[14] = source[20];
        target[27] = source[21];
        target[6] = source[22];
        target[18] = source[23];
        target[21] = source[24];
        target[3] = source[25];
        target[9] = source[26];
        target[7] = source[27];
        target[22] = source[28];
        target[1] = source[29];
        target[25] = source[30];
        target[5] = source[31];

        var back2byte = BitArrayToByteArray(target);

        arrays.Clear();
        arrays.Add(back2byte);
    }

    using (var f = new FileStream("file.cfg.enc", FileMode.Open))
    {
        foreach (var b in arrays)
            f.Write(b, 0, b.Length);
    }

EDIT 2: Here is the Ugly Betty-looking code that accomplishes what I wanted. Now I must refine it for performance...

var arrays_ = new List<byte[]>();
var arrays_save = new List<byte[]>();
var arrays = new List<byte[]>();
using (var f = new FileStream("file.cfg.dec", FileMode.Open))
{
    for (int i = 0; i < f.Length; i += 4)
    {
        var b = new byte[4];
        var bytesRead = f.Read(b, 0, b.Length);
        if (bytesRead < 4)
        {
            var b2 = new byte[bytesRead];
            Array.Copy(b, b2, bytesRead);
            arrays.Add(b2);
        }
        else if (bytesRead > 0)
            arrays.Add(b);
    }
}

foreach (var b in arrays)
{
    arrays_.Add(b);
}
foreach (var b in arrays_)
{
    BitArray source = new BitArray(b);
    BitArray target = new BitArray(source.Length);

    target[26] = source[0];
    target[31] = source[1];
    target[17] = source[2];
    target[10] = source[3];
    target[30] = source[4];
    target[16] = source[5];
    target[24] = source[6];
    target[2] = source[7];
    target[29] = source[8];
    target[8] = source[9];
    target[20] = source[10];
    target[15] = source[11];
    target[28] = source[12];
    target[11] = source[13];
    target[13] = source[14];
    target[4] = source[15];
    target[19] = source[16];
    target[23] = source[17];
    target[0] = source[18];
    target[12] = source[19];
    target[14] = source[20];
    target[27] = source[21];
    target[6] = source[22];
    target[18] = source[23];
    target[21] = source[24];
    target[3] = source[25];
    target[9] = source[26];
    target[7] = source[27];
    target[22] = source[28];
    target[1] = source[29];
    target[25] = source[30];
    target[5] = source[31];

    var back2byte = BitArrayToByteArray(target);

    arrays_save.Add(back2byte);
}

using (var f = new FileStream("file.cfg.enc", FileMode.Create)) // Create, so the output file need not already exist
{
    foreach (var b in arrays_save)
        f.Write(b, 0, b.Length);
}

EDIT 3: Loading a big file into byte arrays of 4 bytes wasn't the smartest idea... I have over 68 million arrays being processed and manipulated. I really wonder if it's possible to load it into a single array and still have the bit manipulation work. :/

Here's another way, similar to @igofed's solution:

var arrays = new List<byte[]>();
using (var f = new FileStream("test.txt", FileMode.Open))
{
    for (int i = 0; i < f.Length; i += 4)
    {
        var b = new byte[4];
        // The second argument is the offset into b, not into the file, so it must stay 0.
        var bytesRead = f.Read(b, 0, 4);
        if (bytesRead < 4)
        {
            var b2 = new byte[bytesRead];
            Array.Copy(b, b2, bytesRead);
            arrays.Add(b2);
        }
        else if (bytesRead > 0)
            arrays.Add(b);
    }
}
//make changes to arrays
using (var f = new FileStream("test-out.txt", FileMode.Create))
{
    foreach (var b in arrays)
        f.Write(b, 0, b.Length);
}
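
The question also asks for the chunks to end up back in a single array. A minimal sketch of that step, assuming the chunks are held in a `List<byte[]>` as above; `ChunkHelper` and `Flatten` are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;

static class ChunkHelper
{
    // Concatenates the chunks, in order, into one contiguous array.
    public static byte[] Flatten(List<byte[]> chunks)
    {
        // First pass: total size, since chunks may differ in length.
        int total = 0;
        foreach (var chunk in chunks)
            total += chunk.Length;

        // Second pass: copy each chunk to its position in the result.
        var result = new byte[total];
        int offset = 0;
        foreach (var chunk in chunks)
        {
            Buffer.BlockCopy(chunk, 0, result, offset, chunk.Length);
            offset += chunk.Length;
        }
        return result;
    }
}
```

With this in place, `File.WriteAllBytes("test-out.txt", ChunkHelper.Flatten(arrays))` could stand in for the write loop, at the cost of holding a second copy of the data in memory.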

Here is what you want:

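// Read and write raw bytes: StreamReader/StreamWriter would decode the data as text.
using (var input = File.OpenRead("inputFileName"))
{
    using (var output = File.Create("outputFileName"))
    {
        byte[] buff = new byte[4];
        int readCount = 0;
        while ((readCount = input.Read(buff, 0, buff.Length)) > 0)
        {
            // manipulations with buff (only the first readCount bytes are valid)

            // write back only the bytes that were actually read
            output.Write(buff, 0, readCount);
        }
    }
}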

// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
IEnumerable<byte[]> arraysOf4Bytes = File
    .ReadAllBytes(path)
    .Select((b, i) => new { b, i })
    .GroupBy(x => x.i / 4)
    .Select(g => g.Select(x => x.b).ToArray());
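
On .NET 6 or later, `Enumerable.Chunk` does the same grouping without the index bookkeeping. A small sketch, assuming that runtime is available; `FileChunks` and `ReadIn4ByteChunks` are illustrative names, not part of the answer above:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FileChunks
{
    // Splits the file's bytes into arrays of up to 4 bytes each.
    // Only the last array can be shorter than 4.
    public static IEnumerable<byte[]> ReadIn4ByteChunks(string path)
    {
        return File.ReadAllBytes(path).Chunk(4);
    }
}
```

Like the LINQ version, this still loads the whole file and allocates one small array per chunk, so it is a convenience rather than a fix for very large files.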

Regarding your "Edit 3" ... I'll bite, although it's really a diversion from the original question.

There's no reason you need Lists of arrays, since you're just breaking up the file into a continuous list of 4-byte sequences, looping through and processing each sequence, and then looping through and writing each sequence. You can do much better. NOTE: The implementation below does not check for or handle input files whose lengths are not exact multiples of 4. I leave that as an exercise for you, if it is important.

To directly address your comment, here is a single-array solution. We'll ditch the List objects, read the whole file into a single byte[] array, and then copy out 4-byte sections of that array to do your bit transforms, then put the result back. At the end we'll just slam the whole thing into the output file.

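byte[] data;
using (Stream fs = File.OpenRead("E:\\temp\\test.bmp")) {
    data = new byte[fs.Length];
    fs.Read(data, 0, data.Length);
}

// Working buffer, allocated once and reused for every 4-byte block.
byte[] element = new byte[4];
// Assumes data.Length is a multiple of 4; otherwise Array.Copy throws on the last block.
for (int i = 0; i < data.Length; i += 4) {
    Array.Copy(data, i, element, 0, element.Length);

    BitArray source = new BitArray(element);
    BitArray target = new BitArray(source.Length);

    target[26] = source[0];
    target[31] = source[1];
    // ... (the remaining 29 assignments from the question)
    target[5] = source[31];

    // Write the permuted bits back over the original 4 bytes.
    target.CopyTo(data, i);
}

// File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
using (Stream fs = File.Create("E:\\temp\\test_out.bmp")) {
    fs.Write(data, 0, data.Length);
}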

All of the ugly initial read code is gone since we're just using a single byte array. Notice I reserved a single 4-byte array before the processing loop to re-use, so we can save the garbage collector some work. Then we loop through the giant data array 4 bytes at a time and copy them into our working array, use that to initialize the BitArrays for your transforms, and then the last statement in the block converts the BitArray back into a byte array and copies it directly back to its original location within the giant data array. This replaces the BitArrayToByteArray method, since you did not provide it. At the end, writing is also easy since it's just slamming out the now-transformed giant data array.
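
For readers who want to keep the question's structure, here is one plausible shape for the missing `BitArrayToByteArray` helper. This is a guess, not the asker's actual code, and the class name `BitHelpers` is hypothetical; it is a thin wrapper around `BitArray.CopyTo`:

```csharp
using System.Collections;

static class BitHelpers
{
    // Packs the bits into bytes, 8 per byte, with bit 0 stored as the
    // least significant bit of byte 0 (the same layout BitArray(byte[]) reads).
    public static byte[] BitArrayToByteArray(BitArray bits)
    {
        var bytes = new byte[(bits.Length + 7) / 8]; // round up to whole bytes
        bits.CopyTo(bytes, 0);
        return bytes;
    }
}
```

Because the layout matches what the `BitArray(byte[])` constructor reads, converting a byte array to a `BitArray` and back through this helper returns the original bytes.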

When I ran your original solution I got an OutOfMemoryException on my original test file of 100MB, so I used a 44MB file. It consumed 650MB of memory and ran in 30 seconds. The single-array solution used 54MB of memory and ran in 10 seconds. Not a bad improvement, and it demonstrates how costly it is to hold onto millions of small array objects.
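
The single-array version still allocates two `BitArray` objects per 4-byte block. A further variation, not part of the answer above and not benchmarked here, is to treat each block as a `uint` and move the bits with shifts and a lookup table. In this sketch the names `BitShuffle`, `Map`, `Permute` and `TransformInPlace` are made up for illustration, the table is copied from the 32 assignments in the question, and leaving any trailing bytes unchanged is an assumption about the desired behavior:

```csharp
using System;

static class BitShuffle
{
    // Map[s] is the target bit index for source bit s, copied from the
    // question's assignments (target[26] = source[0], target[31] = source[1], ...).
    static readonly int[] Map =
    {
        26, 31, 17, 10, 30, 16, 24,  2,
        29,  8, 20, 15, 28, 11, 13,  4,
        19, 23,  0, 12, 14, 27,  6, 18,
        21,  3,  9,  7, 22,  1, 25,  5
    };

    // Moves every bit of one 32-bit word to its mapped position.
    public static uint Permute(uint source)
    {
        uint target = 0;
        for (int s = 0; s < 32; s++)
            target |= ((source >> s) & 1u) << Map[s];
        return target;
    }

    // Transforms each complete 4-byte block in place. Trailing bytes
    // (when the length is not a multiple of 4) are left unchanged.
    public static void TransformInPlace(byte[] data)
    {
        int end = data.Length - data.Length % 4;
        for (int i = 0; i < end; i += 4)
        {
            // Little-endian assembly matches BitArray's numbering:
            // bit k lives in byte k / 8 at bit position k % 8.
            uint word = (uint)data[i]
                      | (uint)data[i + 1] << 8
                      | (uint)data[i + 2] << 16
                      | (uint)data[i + 3] << 24;

            word = Permute(word);

            data[i]     = (byte)(word & 0xFF);
            data[i + 1] = (byte)((word >> 8) & 0xFF);
            data[i + 2] = (byte)((word >> 16) & 0xFF);
            data[i + 3] = (byte)(word >> 24);
        }
    }
}
```

A possible call sequence would be `var data = File.ReadAllBytes("file.cfg.dec"); BitShuffle.TransformInPlace(data); File.WriteAllBytes("file.cfg.enc", data);`, which keeps one large array alive and allocates nothing per block.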
