
How to get 16 or 32 bit iterator from 8 bit iterator in C++?

I wrote some code that uses the UTF8-CPP library (http://utfcpp.sourceforge.net) to convert UTF-16 to UTF-8:

    ifstream sourceFile("/home/myuser/utf16.txt", std::ifstream::binary);
    vector<unsigned char> res;
    std::vector<uint16_t> my_buffer;

    my_buffer.resize(fileSize/2);
    sourceFile.read((char*) my_buffer.data(), fileSize);

    utf8::utf16to8(my_buffer.begin(),
                   my_buffer.end(),
                   back_inserter(res));
    outFile.write((char*)&res[0], res.size());

My problem: to convert UTF-32 to UTF-8 I have to write all of this code again, just with a different element type in the vector:

    vector<unsigned char> res;
    std::vector<uint32_t> my_buffer;

    my_buffer.resize(fileSize/4);
    sourceFile.read((char*) my_buffer.data(), fileSize);

    utf8::utf32to8(my_buffer.begin(),
                   my_buffer.end(),
                   back_inserter(res));
    outFile.write((char*)&res[0], res.size());

I am using std::vector<uint16_t> and std::vector<uint32_t> because the utf8-cpp library requires 16-bit and 32-bit iterators. Is there any way to get these iterators from a std::vector<char>, for example:

    std::vector<char> myvector;
    std::vector<uint16_t>::iterator u16bit_iterator = myvector.begin(); // this doesn't compile

Let's clarify the problem at hand.

You have:

    std::vector<uint32_t> in;
    std::vector<uint8_t>  out;

You want in (data in UTF-32) to be transformed into out (data in UTF-8). in is populated with 32-bit integers because utf8::utf32to8 requires that (and it makes sense), and out is populated with 8-bit integers (i.e. bytes), which also makes sense.

Now:

I want to refactor my code and use only one vector instead of two vectors with different types.

Putting aside that this is a weak requirement (what's wrong with the types as they are now?), this is possible by switching from the default iterators to pointers. The iterator std::vector<T>::iterator is for iterating over a std::vector<T>, period. It doesn't matter that your different choices for T are all integers; they are different types. But pointers can function as iterators (particularly when you are using a contiguous block of data like the one a vector holds), and it is legal to reinterpret arbitrary memory as bytes (only). Be aware that the cast in the code below goes the other way, from bytes to uint32_t: that usually works in practice because the vector's heap buffer is suitably aligned, but the language does not formally guarantee it.
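To make the "different types" and "pointers are iterators" points concrete, here is a small sketch using only the standard library. The names collect and pointers_behave_like_iterators are made up for illustration and are not part of utf8-cpp; the function template simply stands in for any algorithm that takes an iterator pair, the way utf8::utf32to8 does.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Iterators of differently typed vectors are distinct types, which is
// why the assignment in the question is rejected by the compiler.
static_assert(!std::is_same<std::vector<char>::iterator,
                            std::vector<std::uint16_t>::iterator>::value,
              "iterators of different vector types are different types");

// Hypothetical helper: copies any input range into a vector of 32-bit values.
template <typename It>
std::vector<std::uint32_t> collect(It first, It last) {
    return std::vector<std::uint32_t>(first, last);
}

// Hypothetical demo: the same template accepts vector iterators and raw pointers.
inline bool pointers_behave_like_iterators() {
    std::vector<std::uint16_t> units = {0x0041, 0x00E9, 0x20AC};
    assert(!units.empty());
    const std::uint16_t* first = units.data();
    const std::uint16_t* last = units.data() + units.size();  // one past the end
    return collect(units.begin(), units.end()) == collect(first, last);
}
```

Note that the end pointer is data() + size(), one past the last element, exactly like end() for a vector iterator.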

As a bonus, reading from the file will be easier as you are back to just reading bytes (which does make more sense in a way).

    std::vector<uint8_t> in;
    std::vector<uint8_t> out;

    in.resize(fileSize);
    sourceFile.read((char*)in.data(), fileSize);

    // Make sure we have a whole number of 32-bit blocks
    // before we interpret the bytes as 32-bit integers
    // (assert needs #include <cassert>)
    assert((in.size() % 4) == 0);

    // The end pointer must be one past the last byte;
    // &in.back() would point at the last byte itself
    utf8::utf32to8(
       (uint32_t*)in.data(),
       (uint32_t*)(in.data() + in.size()),
       std::back_inserter(out)
    );

    outFile.write((char*)out.data(), out.size());

I hope I've interpreted your requirement properly.
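If the pointer cast makes you uneasy, one alternative is to keep reading plain bytes but copy them into properly typed code units with std::memcpy. This is only a sketch under my own assumptions: bytes_to_units is a hypothetical helper, not something utf8-cpp provides, and it interprets the bytes in host byte order, just as reading straight into my_buffer.data() does in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of utf8-cpp): copy raw file bytes into
// code units of type Unit, using the host byte order.
template <typename Unit>
std::vector<Unit> bytes_to_units(const std::vector<unsigned char>& bytes) {
    // The byte count must be a whole number of code units.
    assert(bytes.size() % sizeof(Unit) == 0);
    std::vector<Unit> units(bytes.size() / sizeof(Unit));
    if (!bytes.empty()) {
        // memcpy sidesteps the alignment and aliasing questions of a pointer cast.
        std::memcpy(units.data(), bytes.data(), bytes.size());
    }
    return units;
}
```

This does use a second vector again, but the two branches from the question now differ only in the template argument and in which utf8-cpp function gets called: the begin() and end() of bytes_to_units<uint32_t>(in) can go to utf8::utf32to8, and those of bytes_to_units<uint16_t>(in) to utf8::utf16to8.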
