
Reading and writing strings in binary files in C++

I'm trying to develop a small Windows application to improve my C++ skills outside the MFC framework and to help with my foreign-language studies.
I would like to make a small, personal dictionary that is easy to port and use. While I have no problems developing the GUI, I'm having real trouble saving and restoring data.

My idea is to write a binary file structured as follows:

int (representing the number of words)
int (representing the string length + \0)
sequence of characters zero-terminated.
Now, I'm learning Russian and my primary language is Italian, so I can't use plain old std::string to write the words. Moreover (thank you, Microsoft) I'm using VS2010, with all the good and bad that comes with it. These are my routines for writing an int and a wstring:

My first test was ciao - привет, with this result:

 01 00 ee bc 90 22 05 00 ee bc 90 22 63 69 61 6f 00 ec b3 8c 07 00 ee bc 90 22 d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82 00 ec b3 8c 
Numbers are read correctly; the problem comes when I write strings. I'd expect ciao (63 69 61 6f 00 ec b3 8c) to be written in 10 bytes (wchar_t size) and not in 5, as happens for the Russian translation (d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82 00 ec b3 8c).
Obviously I'm missing something, but I can't figure out what it is. Can you help me out? Also, if you know a better approach to the problem, I'm open to suggestions.



Following the first of the two methods presented by @JamesKanze, I've decided to sacrifice some portability and let the system do my homework:

void CDizionario::LeggiInt( int *pInt, ifstream& file )
{
    // Read the raw bytes of an int (native size and byte order).
    file.read( reinterpret_cast<char*>( pInt ), sizeof( int ) );
}


void CDizionario::LeggiWString( int nLStringa, wstring& strStringa, ifstream& file )
{
    char *pBuf;
    streamsize byteDaLeggere;
    // Assumption: a UTF-8 facet; the original template arguments are not shown.
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    byteDaLeggere = nLStringa;          // length includes the trailing '\0'
    pBuf = new char[byteDaLeggere];
    file.read( pBuf, byteDaLeggere );
    strStringa = converter.from_bytes( pBuf );
    delete [] pBuf;
}


void CDizionario::ScriviInt( int nInt, ofstream& file ) const
{
    file.write( reinterpret_cast<const char*>( &nInt ), sizeof( nInt ) );
    file.flush();
}

void CDizionario::ScriviWString( const wstring* pStrStringa, ofstream& file ) const
{
    char cTerminatore;
    string strStringa;
    // Assumption: a UTF-8 facet; the original template arguments are not shown.
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    strStringa = converter.to_bytes( pStrStringa->c_str() );
    ScriviInt( strStringa.length() + 1, file );     // byte count plus terminator
    file.write( strStringa.c_str(), strStringa.length() );
    file.flush();
    cTerminatore = '\0';
    file.write( &cTerminatore, sizeof( char ) );
    file.flush();
}

You haven't sufficiently specified the format of the binary file: neither how you represent an int (how many bytes, big-endian or little-endian), nor the encoding and format of the characters. The classical network representation would be a big-endian four-byte (unsigned) integer, and UTF-8. Since this is something you're doing for yourself, you can (and probably should) simplify, using little-endian for integers and UTF-16LE; these formats correspond to the internal format under Windows. (Note that such code will not be portable, not even to Apple or Linux on the same architecture, and there is a small chance that the data will become unreadable on a new system.) This is basically what you seem to be attempting, but...
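To make the little-endian plus UTF-16LE layout concrete, here is a minimal sketch that builds the bytes in memory rather than relying on the in-memory representation. The names appendUint32LE and appendUtf16LE are hypothetical, and std::u16string (C++11) is assumed to hold the UTF-16 code units.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit unsigned value as four bytes, least significant byte first.
void appendUint32LE( std::vector<unsigned char>& out, std::uint32_t value )
{
    for ( int shift = 0; shift < 32; shift += 8 ) {
        out.push_back( static_cast<unsigned char>( (value >> shift) & 0xFF ) );
    }
}

// Append a string as its code-unit count (uint32, little-endian) followed by
// each UTF-16 code unit as two bytes, low byte first (UTF-16LE).
void appendUtf16LE( std::vector<unsigned char>& out, const std::u16string& text )
{
    appendUint32LE( out, static_cast<std::uint32_t>( text.size() ) );
    for ( char16_t unit : text ) {
        out.push_back( static_cast<unsigned char>( unit & 0xFF ) );
        out.push_back( static_cast<unsigned char>( (unit >> 8) & 0xFF ) );
    }
}
```

With this layout, ciao occupies 4 + 8 bytes and привет occupies 4 + 12 bytes, since every character in both words is a single UTF-16 code unit.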

You're trying to write raw binary. The only standard way to do this would be to use std::ofstream (and std::ifstream to read), with the file opened in binary mode and imbued with the "C" locale. For anything else, there will (or may) be some sort of code translation and mapping in the std::filebuf. Given this (and the fact that this way of writing data is not portable to any other system), you may want to just use the system level functions: CreateFile to open, WriteFile and ReadFile to write and read, and CloseHandle to close. (See http://msdn.microsoft.com/en-us/library/windows/desktop/aa364232%28v=vs.85%29.aspx ).
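As a rough illustration of the standard-library route described above, the sketch below opens the stream in binary mode, imbues the "C" locale before opening, and writes the object representation of an int. The function names writeRawInt and readRawInt are hypothetical, and the resulting file uses the native int size and byte order, so it is not portable.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Write the raw bytes of an int (native size and byte order) to a new file.
bool writeRawInt( const std::string& path, int value )
{
    std::ofstream file;
    file.imbue( std::locale::classic() );   // the "C" locale, set before opening
    file.open( path.c_str(), std::ios::out | std::ios::binary | std::ios::trunc );
    file.write( reinterpret_cast<const char*>( &value ), sizeof value );
    file.close();                           // flushes; sets failbit on error
    return !file.fail();
}

// Read the raw bytes back; returns false if the file is missing or too short.
bool readRawInt( const std::string& path, int& value )
{
    std::ifstream file;
    file.imbue( std::locale::classic() );
    file.open( path.c_str(), std::ios::in | std::ios::binary );
    file.read( reinterpret_cast<char*>( &value ), sizeof value );
    return file.gcount() == static_cast<std::streamsize>( sizeof value );
}
```

Without std::ios::binary, a Windows text-mode stream would translate any 0x0A byte inside the integer into 0x0D 0x0A, which is one way such files get silently corrupted.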

If you want to be portable, on the other hand, I would recommend using the standard network format for the data. Format it into a buffer ( std::vector<char> ), and write that; at the other end, read into a buffer, and parse that. The read and write routines for an integer (actually an unsigned integer) might be something like:

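#include <stdexcept>
#include <vector>

// Append i as four bytes, most significant first (network byte order).
void
writeUnsignedInt( std::vector<char>& buffer, unsigned int i )
{
    buffer.push_back( (i >> 24) & 0xFF );
    buffer.push_back( (i >> 16) & 0xFF );
    buffer.push_back( (i >>  8) & 0xFF );
    buffer.push_back( (i      ) & 0xFF );
}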

unsigned int
readUnsignedInt( 
    std::vector<char>::const_iterator& current,
    std::vector<char>::const_iterator end )
{
    unsigned int retval = 0;
    int shift = 32;
    while ( shift != 0 && current != end ) {
        shift -= 8;
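        // Widen to unsigned int before shifting, so a byte >= 0x80 shifted by 24 does not overflow int.
        retval |= static_cast<unsigned int>( static_cast<unsigned char>( *current ) ) << shift;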
        ++ current;
    }
    if ( shift != 0 ) {
        throw std::runtime_error( "Unexpected end of file" );
    }
    return retval;
}

For the characters, you'll have to convert your std::wstring to std::string in UTF-8, using one of the many conversion routines available on the network. (The problem is that neither the encoding of std::wstring nor even the size of a wchar_t is standardized. Of the systems I'm familiar with, Windows and AIX use UTF-16, most others UTF-32; in both cases with the byte order dependent on the platform. This makes portable code a bit more difficult.)
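One possible hand-written conversion is sketched below. It assumes the wide string holds UTF-16 when wchar_t is two bytes and UTF-32 otherwise, and it does not diagnose invalid input such as unpaired surrogates. The names appendUtf8 and wideToUtf8 are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to 'out' as UTF-8 (one to four bytes).
void appendUtf8( std::string& out, unsigned long cp )
{
    if ( cp < 0x80 ) {
        out += static_cast<char>( cp );
    } else if ( cp < 0x800 ) {
        out += static_cast<char>( 0xC0 | (cp >> 6) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    } else if ( cp < 0x10000 ) {
        out += static_cast<char>( 0xE0 | (cp >> 12) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    } else {
        out += static_cast<char>( 0xF0 | (cp >> 18) );
        out += static_cast<char>( 0x80 | ((cp >> 12) & 0x3F) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    }
}

// Convert a wide string to UTF-8.
// Assumption: UTF-16 if wchar_t is 2 bytes, UTF-32 otherwise.
std::string wideToUtf8( const std::wstring& in )
{
    std::string out;
    for ( std::size_t i = 0; i < in.size(); ++i ) {
        unsigned long cp = static_cast<unsigned long>( in[i] );
        // Combine a UTF-16 surrogate pair into a single code point.
        if ( sizeof( wchar_t ) == 2 && cp >= 0xD800 && cp <= 0xDBFF
                && i + 1 < in.size() ) {
            unsigned long low = static_cast<unsigned long>( in[i + 1] );
            if ( low >= 0xDC00 && low <= 0xDFFF ) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        appendUtf8( out, cp );
    }
    return out;
}
```

Under these assumptions, ciao converts to 4 bytes and привет to 12 bytes (d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82), which matches the string bytes in the dump from the question.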

Overall, I find it easier to just do everything directly in UTF-8, using char. This won't work with the Windows interface, however.

And finally, you don't need the trailing '\0' if you output the length.
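A length-prefixed string without a terminator could look like the following sketch. It uses a four-byte big-endian length, as in the network format above, and assumes the string is already UTF-8. The names writeString and readString are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Append a string as a four-byte big-endian length followed by its bytes.
// No trailing '\0' is written: the length says where the string ends.
void writeString( std::vector<char>& buffer, const std::string& utf8 )
{
    unsigned long n = static_cast<unsigned long>( utf8.size() );
    buffer.push_back( static_cast<char>( (n >> 24) & 0xFF ) );
    buffer.push_back( static_cast<char>( (n >> 16) & 0xFF ) );
    buffer.push_back( static_cast<char>( (n >>  8) & 0xFF ) );
    buffer.push_back( static_cast<char>( (n      ) & 0xFF ) );
    buffer.insert( buffer.end(), utf8.begin(), utf8.end() );
}

// Read one length-prefixed string and advance 'current' past it.
std::string readString( std::vector<char>::const_iterator& current,
                        std::vector<char>::const_iterator end )
{
    typedef std::vector<char>::difference_type Diff;
    if ( end - current < 4 ) {
        throw std::runtime_error( "Unexpected end of file" );
    }
    unsigned long n = 0;
    for ( int i = 0; i < 4; ++i ) {
        n = (n << 8) | static_cast<unsigned char>( *current );
        ++current;
    }
    if ( static_cast<unsigned long>( end - current ) < n ) {
        throw std::runtime_error( "Unexpected end of file" );
    }
    std::string result( current, current + static_cast<Diff>( n ) );
    current += static_cast<Diff>( n );
    return result;
}
```

Because the reader knows the byte count up front, strings containing embedded zero bytes also survive the round trip.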

@IssamTP, привет

As mentioned by @James Kanze, working with non-Latin languages inevitably forces you to deal with byte-level format conventions and locales. So it may be worth not reinventing the wheel and using an existing technology like XML instead, which handles these nuances and encodes and decodes non-Latin characters properly.

