
How to use std::string to store bytes (unsigned chars) in a right way?

I'm implementing the LZ77 compression algorithm, and I have trouble storing unsigned chars in a string. To compress a file, I open it in binary mode and read its contents as chars (because 1 char is equal to 1 byte, afaik) into a std::string. Everything works perfectly fine with chars. But after some time googling I learned that char is not always 1 byte, so I decided to swap it for unsigned char. And here things start to get tricky:

  • When compressing plain .txt, everything works as expected, I get equal files before and after decompression (I assume it should, because we basically work with text before and after byte conversion)
  • However, when trying to compress a .bmp, the decompressed file is 3 bytes shorter than the input file (I lose these 3 bytes when trying to save unsigned chars to a std::string)

So, my question is – is there a way to properly save unsigned chars to a string?

I tried to use typedef basic_string<unsigned char> ustring and swap all related functions for their basic alternatives to use with unsigned char , but I still lose 3 bytes.

UPDATE: I found out that the 3 bytes (symbols) are lost not because of std::string, but because of std::istream_iterator, which I use instead of std::istreambuf_iterator to create the string of unsigned chars (because std::istreambuf_iterator's template argument is char, not unsigned char).

So, are there any solutions to this particular problem?

Example:

std::vector<char> tempbuf(std::istreambuf_iterator<char>(file), {}); // reads 112782 symbols

std::vector<char> tempbuf(std::istream_iterator<char>(file), {}); // reads 112779 symbols

Sample code:

void LZ77::readFileUnpacked(std::string& path)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);

    if (file.is_open())
    {
        // Works just fine with char, but loses 3 bytes with unsigned
        std::string tempstring = std::string(std::istreambuf_iterator<char>(file), {});
        file.close();
    }
    else
        throw std::ios_base::failure("Failed to open the file");
}

char in all of its forms (and std::byte , which is isomorphic with unsigned char ) is always the smallest possible type that a system supports. The C++ standard defines that sizeof(char) and its variations shall always be exactly 1.

"One" what? That's implementation-defined. But every type in the system will be some multiple of sizeof(char) in size.

So you shouldn't be too concerned over systems where char is not one byte. If you're working under a system where CHAR_BIT isn't 8, then that system can't handle 8-bit bytes directly at all. So unsigned char won't be any different/better for this purpose.
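These size guarantees can be checked at compile time. A minimal sketch; the helper name bits_per_byte is made up for illustration and is not part of any library:

```cpp
#include <climits>  // CHAR_BIT

// Guaranteed by the standard on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(signed char) == 1, "same for signed char");
static_assert(sizeof(unsigned char) == 1, "same for unsigned char");

// Hypothetical helper: number of bits in one "byte" (one char) on this platform.
// The standard requires at least 8; mainstream platforms use exactly 8.
constexpr int bits_per_byte() { return CHAR_BIT; }

static_assert(bits_per_byte() >= 8, "CHAR_BIT is at least 8");
```

If this compiles, switching from char to unsigned char does not change how many bytes are stored per element.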


As to the particulars of your problem, istream_iterator is fundamentally different from istreambuf_iterator. The purpose of the latter is to allow iterator access to the actual stream as a sequence of characters, read straight from the stream buffer. The purpose of istream_iterator<T> is to allow access to a stream as if by performing a repeated sequence of operator>> calls with a T value.

So if you're using istream_iterator<char> , then you're saying that you want to read the stream as if you did stream >> some_char; for each iterator access. That isn't actually isomorphic with accessing the stream's characters directly. Specifically, FormattedInputFunctions like operator>> can do things like skip whitespace, depending on how you set up your stream. In a binary file such as a .bmp, any byte whose value happens to match a whitespace character (for example 0x20 or 0x0A) is silently dropped that way.
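The difference can be reproduced without a file by using an in-memory stream. This is a small sketch, and the function names count_raw and count_formatted are invented for the example:

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Number of chars seen through istreambuf_iterator (unformatted access).
std::size_t count_raw(const std::string& data) {
    std::istringstream in(data);
    std::string copy{std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>()};
    return copy.size();
}

// Number of chars seen through istream_iterator (repeated operator>>,
// which skips whitespace while the skipws flag is set, as it is by default).
std::size_t count_formatted(const std::string& data) {
    std::istringstream in(data);
    std::string copy{std::istream_iterator<char>(in),
                     std::istream_iterator<char>()};
    return copy.size();
}
```

For the six-character input "a b\n\tc", count_raw returns 6 while count_formatted returns 3, because the space, newline and tab are skipped.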

istream_iterator reads using operator>> , which by default skips whitespace as part of its function. If you want to disable that behavior, you'll have to do

#include <ios>

file >> std::noskipws;
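Putting it together, here is a hedged sketch of two ways to load a whole file into a std::vector<unsigned char>. The function names read_all_bytes and read_all_bytes_formatted are hypothetical, not taken from the question's LZ77 class:

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read every byte of a file, unmodified.
std::vector<unsigned char> read_all_bytes(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open())
        throw std::ios_base::failure("Failed to open the file");

    // istreambuf_iterator<char> yields each byte as a char; the value is
    // converted to unsigned char when it is stored in the vector.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}

// Alternative with istream_iterator; correct only because skipws is cleared.
std::vector<unsigned char> read_all_bytes_formatted(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open())
        throw std::ios_base::failure("Failed to open the file");

    file >> std::noskipws;  // keep whitespace bytes instead of skipping them
    return std::vector<unsigned char>(std::istream_iterator<unsigned char>(file),
                                      std::istream_iterator<unsigned char>());
}
```

Both functions should return the same bytes. The first avoids formatted input entirely, so it does not depend on stream flags.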
