Reading and writing strings in binary files in C++

I'm trying to develop a small Windows application to improve my C++ skills outside the MFC framework and to help my foreign-language studies.
I would like to make a small, personal dictionary that is easy to port and use. While I have no problems developing the GUI, I'm having real pain saving and restoring data.

My idea is to write a binary file structured as follows:

int (representing the number of words)
int (representing the string length + \0)
sequence of characters zero-terminated.
Now, I'm learning Russian and my primary language is Italian, so I can't use plain old std::string to store the words. Moreover, thank you Microsoft, I'm using VS2010 with all the good and bad that come with it. I'm showing you my routines to write an int and a wstring:

Well, my first test was ciao - привет, with this result:

 01 00 ee bc 90 22 05 00 ee bc 90 22 63 69 61 6f 00 ec b3 8c 07 00 ee bc 90 22 d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82 00 ec b3 8c 
The numbers are read correctly; the problem comes when I write strings. I expected ciao (63 69 61 6f 00 ec b3 8c) to be written in 10 bytes (wchar_t size) and not in 5, as happens for the Russian translation (d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82 00 ec b3 8c).
Obviously I'm missing something, but I can't figure out what it is. Can you guys help me out? Also, if you know a better approach to the problem, I'm open-minded.

EDIT: SOLUTION

Following the first of the two methods presented by @JamesKanze, I've decided to sacrifice some portability and let the system do my homework:

// Requires <fstream>, <string>, <locale> and <codecvt>, plus using namespace std.
// The template arguments of reinterpret_cast and wstring_convert were lost in this
// copy of the post; char pointers and codecvt_utf8<wchar_t> are assumed below.

void CDizionario::LeggiInt( int *pInt, ifstream& file )
{
    file.read( reinterpret_cast<char*>( pInt ), sizeof( int ) );
}

void CDizionario::LeggiWString( int nLStringa, wstring& strStringa, ifstream& file )
{
    char *pBuf;
    streamsize byteDaLeggere;
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    // nLStringa counts the UTF-8 bytes plus the trailing '\0' stored in the file.
    byteDaLeggere = nLStringa;
    pBuf = new char[byteDaLeggere];
    file.read( pBuf, byteDaLeggere );
    strStringa = converter.from_bytes( pBuf );
    delete [] pBuf;
}

void CDizionario::ScriviInt( int nInt, ofstream& file ) const
{
    file.write( reinterpret_cast<const char*>( &nInt ), sizeof( nInt ) );
    file.flush();
}

void CDizionario::ScriviWString( const wstring* pStrStringa, ofstream& file ) const
{
    char cTerminatore;
    string strStringa;
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    // Convert to UTF-8, then write the byte length (including '\0'), the bytes, and '\0'.
    strStringa = converter.to_bytes( pStrStringa->c_str() );
    ScriviInt( static_cast<int>( strStringa.length() + 1 ), file );
    file.write( strStringa.c_str(), strStringa.length() );
    file.flush();
    cTerminatore = '\0';
    file.write( &cTerminatore, sizeof( char ) );
    file.flush();
}

You've not sufficiently specified the format of the binary file: neither how you represent an int (how many bytes, big-endian or little-endian) nor the encoding and format of the characters. The classical network representation would be a big-endian four-byte (unsigned) integer, and UTF-8. Since this is something you're doing for yourself, you can (and probably should) simplify, using little-endian for integers, and UTF-16LE; these formats correspond to the internal format under Windows. (Note that such code will not be portable, not even to Apple or Linux on the same architecture, and there is a small chance that the data become unreadable on a new system.) This is basically what you seem to be attempting, but...

You're trying to write raw binary. The only standard way to do this is to use std::ofstream (and std::ifstream to read), with the file opened in binary mode and imbued with the "C" locale. For anything else, there will (or may) be some sort of code translation and mapping in the std::filebuf. Given this (and the fact that this way of writing data is not portable to any other system), you may want to just use the system-level functions: CreateFile to open, WriteFile and ReadFile to write and read, and CloseHandle to close. (See http://msdn.microsoft.com/en-us/library/windows/desktop/aa364232%28v=vs.85%29.aspx.)
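A minimal sketch of the standard-library route described above, assuming the whole file fits in memory: the stream is imbued with the classic "C" locale and opened in binary mode, so bytes should reach the file untranslated. The helper names writeRawFile and readRawFile are hypothetical, not part of the original answer.

```cpp
#include <fstream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: write a byte buffer to a file with no translation.
bool writeRawFile( const std::string& path, const std::vector<char>& bytes )
{
    std::ofstream out;
    out.imbue( std::locale::classic() );    // the "C" locale, set before opening
    out.open( path.c_str(), std::ios::out | std::ios::binary | std::ios::trunc );
    if ( !out.is_open() ) {
        return false;
    }
    if ( !bytes.empty() ) {
        out.write( &bytes[0], static_cast<std::streamsize>( bytes.size() ) );
    }
    out.close();                            // flushes; sets failbit on error
    return !out.fail();
}

// Hypothetical helper: read a whole file back into a byte buffer.
bool readRawFile( const std::string& path, std::vector<char>& bytes )
{
    std::ifstream in;
    in.imbue( std::locale::classic() );
    in.open( path.c_str(), std::ios::in | std::ios::binary );
    if ( !in.is_open() ) {
        return false;
    }
    bytes.assign( std::istreambuf_iterator<char>( in ),
                  std::istreambuf_iterator<char>() );
    return !in.bad();
}
```

Without std::ios::binary, a Windows text-mode stream would turn '\n' into "\r\n" on output, which is one of the translations the answer warns about.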

If you want to be portable, on the other hand, I would recommend using the standard network format for the data. Format it into a buffer (std::vector<char>), and write that; at the other end, read into a buffer, and parse that. The read and write routines for an integer (actually an unsigned integer) might be something like:

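#include <stdexcept>
#include <vector>

void
writeUnsignedInt( std::vector<char>& buffer, unsigned int i )
{
    //  Big-endian (network order): most significant byte first.
    buffer.push_back( static_cast<char>( (i >> 24) & 0xFF ) );
    buffer.push_back( static_cast<char>( (i >> 16) & 0xFF ) );
    buffer.push_back( static_cast<char>( (i >>  8) & 0xFF ) );
    buffer.push_back( static_cast<char>( (i      ) & 0xFF ) );
}

unsigned int
readUnsignedInt(
    std::vector<char>::const_iterator& current,
    std::vector<char>::const_iterator end )
{
    unsigned int retval = 0;
    int shift = 32;
    while ( shift != 0 && current != end ) {
        shift -= 8;
        //  Widen to unsigned int before shifting, so that shifting a byte
        //  >= 0x80 by 24 cannot overflow a signed int.
        retval |= static_cast<unsigned int>(
                      static_cast<unsigned char>( *current ) ) << shift;
        ++ current;
    }
    if ( shift != 0 ) {
        throw std::runtime_error( "Unexpected end of file" );
    }
    return retval;
}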

For the characters, you'll have to convert your std::wstring to a std::string in UTF-8, using one of the many conversion routines available on the network. (The problem is that neither the encoding of std::wstring nor even the size of a wchar_t is standardized. Of the systems I'm familiar with, Windows and AIX use UTF-16, most others UTF-32; in both cases with the byte order dependent on the platform. This makes portable code a bit more difficult.)
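One possible conversion routine, as a hand-written sketch rather than the routine the answer had in mind. It assumes the std::wstring holds UTF-16 (as on Windows) or UTF-32 (as on most other systems) and accepts surrogate pairs in either case; the names appendUtf8 and toUtf8 are hypothetical. std::wstring_convert from <codecvt>, used in the question's edit, is an alternative, but it has been deprecated since C++17.

```cpp
#include <stdexcept>
#include <string>

// Append one Unicode code point to a UTF-8 encoded string.
void appendUtf8( std::string& out, unsigned long cp )
{
    if ( cp < 0x80 ) {
        out += static_cast<char>( cp );
    } else if ( cp < 0x800 ) {
        out += static_cast<char>( 0xC0 | (cp >> 6) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    } else if ( cp < 0x10000 ) {
        out += static_cast<char>( 0xE0 | (cp >> 12) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    } else if ( cp < 0x110000 ) {
        out += static_cast<char>( 0xF0 | (cp >> 18) );
        out += static_cast<char>( 0x80 | ((cp >> 12) & 0x3F) );
        out += static_cast<char>( 0x80 | ((cp >> 6) & 0x3F) );
        out += static_cast<char>( 0x80 | (cp & 0x3F) );
    } else {
        throw std::runtime_error( "Invalid code point" );
    }
}

// Convert a wide string (UTF-16 or UTF-32 units) to UTF-8.
std::string toUtf8( const std::wstring& in )
{
    std::string out;
    for ( std::wstring::size_type i = 0; i < in.size(); ++i ) {
        unsigned long cp = static_cast<unsigned long>( in[i] );
        if ( cp >= 0xD800 && cp <= 0xDBFF ) {
            // High surrogate: must be followed by a low surrogate.
            if ( i + 1 == in.size() ) {
                throw std::runtime_error( "Truncated surrogate pair" );
            }
            unsigned long low = static_cast<unsigned long>( in[i + 1] );
            if ( low < 0xDC00 || low > 0xDFFF ) {
                throw std::runtime_error( "Invalid surrogate pair" );
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;
        } else if ( cp >= 0xDC00 && cp <= 0xDFFF ) {
            throw std::runtime_error( "Unexpected low surrogate" );
        }
        appendUtf8( out, cp );
    }
    return out;
}
```

With this routine, "ciao" becomes 4 bytes and "привет" becomes the 12 bytes d0 bf d1 80 d0 b8 d0 b2 d0 b5 d1 82, which matches the dump in the question: UTF-8 uses one byte per ASCII letter and two per Cyrillic letter.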

Overall, I find it easier to just do everything directly in UTF-8, using char. This won't work with the Windows interface, however.

And finally, you don't need the trailing '\0' if you output the length.
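As an illustration of that last point, here is a sketch of a length-prefixed UTF-8 string stored in the buffer with no terminator. The functions writeUtf8String and readUtf8String and the Buffer typedef are hypothetical names; the two integer routines are condensed copies of the ones above, repeated only so the sketch compiles on its own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::vector<char> Buffer;

// Big-endian four-byte unsigned integer, as in the answer's routines.
void writeUnsignedInt( Buffer& buffer, unsigned int i )
{
    for ( int shift = 24; shift >= 0; shift -= 8 ) {
        buffer.push_back( static_cast<char>( (i >> shift) & 0xFF ) );
    }
}

unsigned int readUnsignedInt( Buffer::const_iterator& current,
                              Buffer::const_iterator end )
{
    unsigned int retval = 0;
    for ( int shift = 24; shift >= 0; shift -= 8 ) {
        if ( current == end ) {
            throw std::runtime_error( "Unexpected end of file" );
        }
        retval |= static_cast<unsigned int>(
                      static_cast<unsigned char>( *current ) ) << shift;
        ++current;
    }
    return retval;
}

// Write the byte count, then the bytes; no trailing '\0'.
void writeUtf8String( Buffer& buffer, const std::string& utf8 )
{
    writeUnsignedInt( buffer, static_cast<unsigned int>( utf8.size() ) );
    buffer.insert( buffer.end(), utf8.begin(), utf8.end() );
}

// Read the byte count, then exactly that many bytes.
std::string readUtf8String( Buffer::const_iterator& current,
                            Buffer::const_iterator end )
{
    unsigned int length = readUnsignedInt( current, end );
    if ( static_cast<std::size_t>( end - current ) < length ) {
        throw std::runtime_error( "Unexpected end of file" );
    }
    std::string result( current, current + length );
    current += length;
    return result;
}
```

Because the reader knows the length in advance, a string may even contain embedded '\0' bytes, and a truncated file is detected instead of being read past its end.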

@IssamTP, привет

As mentioned by @James Kanze, working with non-Latin languages inevitably pushes you into byte-level format conventions and locales. So it may be worth not reinventing the wheel and using an existing technology such as XML instead, which handles these nuances and encodes and decodes non-Latin characters properly.
