
C++ Any faster method to write a large binary file?

Goal

My goal is to quickly create a file from a large binary string (a string that contains only 1 and 0).

Straight to the point

I need a function that can achieve my goal. If I am not clear enough, please read on.

Example

Test.exe is running...
.
Inputted binary string:
        1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
        Done!
File(Test.txt) In Byte(s):
        0xFF, 0xAA
.
Test.exe executed successfully!

Explanation

  • First, Test.exe requested the user to input a binary string.
  • Then, it converted the inputted binary string to hexadecimal.
  • Finally, it wrote the converted value to a file called Test.txt.

I've tried

As a failed attempt to achieve my goal, I've created this simple (and possibly horrible) function (hey, at least I tried):

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr )
{
    std::ofstream OutputFile( Destination, std::ofstream::binary );

    for( ::UINT Index1 = 0, Dec = 0;
         // 8-Bit binary.
         Index1 != BinaryStr.length( )/8;

         // Get the next set of binary value.
         // Write the decimal value as unsigned char to file.
         // Reset decimal value to 0.
         ++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
    {
        // Convert the 8-bit binary to hexadecimal using the
        // positional notation method - this is how its done:
        // http://www.wikihow.com/Convert-from-Binary-to-Decimal
        for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
            if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
    }
    OutputFile.close( );
};

Example of usage

#include "Global.h"

void BinaryStrToFile( __in const char* Destination,
                      __in std::string &BinaryStr );

int main( void )
{
    std::string Bin = "";

    // Create a binary string that is a size of 9.53674 mb
    // Note: The creation of this string will take awhile.
    // However, I only start to calculate the speed of writing
    // and converting after it is done generating the string.
    // This string is just created for an example.
    std::cout << "Generating...\n";
    while( Bin.length( ) != 80000000 )
        Bin += "10101010";

    std::cout << "Writing...\n";
    BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );

    std::cout << "Done!\n";
#ifdef IS_DEBUGGING
    std::cout << "Paused...\n";
    ::getchar( );
#endif

    return( 0 );
};

Problem

Again, that was my failed attempt to achieve my goal. The problem is the speed. It is too slow. It took more than 7 minutes. Is there any method to quickly create a file from a large binary string?

Thanks in advance,

CLearner

I'd suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for each character that you process. Replace this code:

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
        Dec += Inc;

with something like:

for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
    if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
        Dec += Inc;
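
For reference, here is a minimal sketch of the whole conversion done with direct indexing and no temporary strings. The name PackBits is hypothetical, and ignoring trailing characters that do not fill a whole byte is an assumption of this sketch, not something taken from the original function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. Trailing characters that do not fill a
// whole byte are ignored (an assumption made for this sketch).
std::vector<unsigned char> PackBits(const std::string& bits)
{
    std::vector<unsigned char> out;
    out.reserve(bits.size() / 8);
    for (std::size_t i = 0; i + 8 <= bits.size(); i += 8) {
        unsigned char value = 0;
        // Index the string directly; no substr, so no allocation per bit.
        for (std::size_t j = 0; j < 8; ++j)
            value = static_cast<unsigned char>((value << 1) | (bits[i + j] == '1' ? 1 : 0));
        out.push_back(value);
    }
    return out;
}
```

The packed bytes could then be handed to the stream in a single call, for example OutputFile.write(reinterpret_cast<const char*>(out.data()), out.size()), instead of one operator<< per byte.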

The majority of your time is spent here:

   for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
        if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;

When I comment that out, the file is written in seconds. I think you need to fine-tune your conversion.
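
One way to check where the time goes is to time the conversion and the file write separately. The sketch below assumes a C++11 compiler; ElapsedMs is a hypothetical helper name, not something from the question.

```cpp
#include <chrono>

// Hypothetical helper: runs the given callable once and returns the
// elapsed wall-clock time in milliseconds, using a monotonic clock.
template <typename F>
double ElapsedMs(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, ElapsedMs([&] { BinaryStrToFile("Test.txt", Bin); }) measures the whole function, and wrapping only the inner conversion loop in the same way shows how much of that time is conversion rather than file output.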

I think I'd consider something like this as a starting point:

#include <algorithm>
#include <bitset>
#include <fstream>
#include <iterator>

int main() { 
    std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
    std::ofstream out("junk.bin", std::ios::binary | std::ios::out);

    std::transform(std::istream_iterator<std::bitset<8> >(in),
                   std::istream_iterator<std::bitset<8> >(),
                   std::ostream_iterator<unsigned char>(out),
                   [](std::bitset<8> const &b) { return b.to_ulong();});
    return 0;
}

Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you've mentioned in your question, my guess is this is adequate speed, and the simplicity is going to be hard to beat.
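
If the bits are already in memory as a std::string, as in the question, the same std::bitset idea can be applied without going through an input file. This is only a sketch: BitsetPack is a hypothetical name, and dropping a trailing group shorter than 8 characters is an assumption.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper: converts each group of 8 '0'/'1' characters into
// one byte via std::bitset. The leftmost character of a group becomes the
// most significant bit. std::bitset throws std::invalid_argument if a
// character is neither '0' nor '1'.
std::string BitsetPack(const std::string& bits)
{
    std::string bytes;
    bytes.reserve(bits.size() / 8);
    for (std::size_t pos = 0; pos + 8 <= bits.size(); pos += 8) {
        const std::bitset<8> octet(bits, pos, 8);
        bytes.push_back(static_cast<char>(octet.to_ulong()));
    }
    return bytes;
}
```

The returned string holds raw bytes and can be written with one call to std::ofstream::write on a stream opened in binary mode.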

So instead of converting back and forth between std::strings, why not use a bunch of machine word-sized integers for fast access?

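#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iomanip>

int main()
{
    const std::size_t bufsz = 1000000;

    std::uint32_t *buf = new std::uint32_t[bufsz];
    std::memset(buf, 0xFA, sizeof(*buf) * bufsz);
    std::ofstream ofile("foo.bin", std::ofstream::binary);

    for (std::size_t i = 0; i < bufsz; i++) {
        // formatted hex text, if that is what you want:
        // ofile << std::hex << std::setw(8) << std::setfill('0') << buf[i];
        // or raw binary data instead of formatted hex:
        ofile.write(reinterpret_cast<const char *>(&buf[i]), sizeof(buf[i]));
    }

    delete[] buf;
    return 0;
}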

For me, this runs in a fraction of a second.

Something not entirely unlike this should be significantly faster:

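#include <climits>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <string>

void
text_to_binary_file(const std::string& text, const char *fname)
{
    unsigned char wbuf[4096];  // 4k is a good size of "chunk to write to file"
    unsigned int i = 0, j = 0;
    std::filebuf fp;           // dropping down to filebufs may well be faster
                               // for this problem
    if (!fp.open(fname, std::ios::out|std::ios::trunc|std::ios::binary))
        std::abort();
    std::memset(wbuf, 0, 4096);

    for (std::string::const_iterator p = text.begin(); p != text.end(); p++) {
        if (*p == '1')         // only '1' characters set a bit
            wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
        j++;
        if (j == CHAR_BIT) {
            j = 0;
            i++;
        }
        if (i == 4096) {
            if (fp.sputn(reinterpret_cast<const char *>(wbuf), 4096) != 4096)
                std::abort();
            std::memset(wbuf, 0, 4096);
            i = 0;
            j = 0;
        }
    }
    // write the remaining full bytes, plus one partial byte if bits are left over
    const std::streamsize rest = i + (j != 0 ? 1 : 0);
    if (fp.sputn(reinterpret_cast<const char *>(wbuf), rest) != rest)
        std::abort();
    fp.close();
}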

Proper error handling left as an exercise.

Even though I'm late, I want to add my example for handling such strings. Architecture-specific optimizations may use unaligned loads of chars into multiple registers for 'squeezing' out the bits in parallel. This untested example code does not check the chars and avoids alignment and endianness requirements. It assumes the characters of that binary string represent contiguous octets (bytes) with the most significant bit first, not words, double words, etc., where their specific representation in memory (and in that string) would require special treatment for portability.

//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.

//set up an ofstream with a 64KiB buffer
//(pubsetbuf is called before open, since some implementations ignore it afterwards)
std::vector<char> buffer(65536);
std::ofstream ofs;
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
ofs.open("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);

std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();

//You may treat cases, where (bits % 8 != 0) as error

//Pack eight characters into one octet, most significant bit first
uint8_t byte = 0;
const std::string::size_type whole = bits & (~std::string::size_type(0x7));
for(std::string::size_type i = 0;i < whole;++i,++cIt)
{
    byte = uint8_t((byte << 1) | (uint8_t(*cIt) - uint8_t('0')));
    if((i & 0x7) == 0x7) //bit 0: the octet is complete, write it and start the next one
    {
        ofs.put(char(byte));
        byte = 0;
    }
}

ofs.flush();

This makes a 76.2 MB (80,000,000 bytes) file of 1010101010101......

#include <stdio.h>
#include <iostream>
#include <fstream>

using namespace std;

int main( void )
{
    char Bin=0;
    ofstream myfile;
    myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
    int c=0;
    Bin = 0xAA;
    while( c!= 80000000 ){
        myfile.write(&Bin,1);
        c++;
    }
    myfile.close();
    cout << "Done!\n";
    return( 0 );
};

This is the first byte of the file: 0xAA.
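
The loop above calls write once per byte. A possible variation is to build the data in memory first and pass it to the stream in one call. The sketch below also opens the file with trunc rather than app, so that running it twice does not double the file size. WriteFilledFile is a hypothetical name, and holding the whole payload in memory is an assumption that suits sizes like the 80,000,000 bytes used here.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: creates (or overwrites) a file containing `count`
// copies of `value`, written with a single call to write().
// Returns true if the stream reports no error after flushing.
bool WriteFilledFile(const std::string& path, char value, std::size_t count)
{
    const std::string data(count, value);
    std::ofstream out(path.c_str(),
                      std::ios::out | std::ios::trunc | std::ios::binary);
    if (!out)
        return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return out.good();
}
```

Calling WriteFilledFile("example.bin", '\xAA', 80000000) would produce the same file contents as the program above.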
