I wrote code that uses the utfcpp library (http://utfcpp.sourceforge.net) to convert UTF-16 to UTF-8:
ifstream sourceFile("/home/myuser/utf16.txt", std::ifstream::binary);
vector<unsigned char> res;
std::vector<uint16_t> my_buffer;
my_buffer.resize(fileSize/2);
sourceFile.read((char*) my_buffer.data(), fileSize);
utf8::utf16to8(my_buffer.begin(),
               my_buffer.end(),
               back_inserter(res));
outFile.write((char*)&res[0], res.size());
My problem: to convert UTF-32 to UTF-8 I have to write all of this code again, only with a different element type in the vector:
vector<unsigned char> res;
std::vector<uint32_t> my_buffer;
my_buffer.resize(fileSize/4);
sourceFile.read((char*) my_buffer.data(), fileSize);
utf8::utf32to8(my_buffer.begin(),
               my_buffer.end(),
               back_inserter(res));
outFile.write((char*)&res[0], res.size());
I am using std::vector<uint16_t> and std::vector<uint32_t> because utfcpp requires 16-bit and 32-bit iterators. Is there any way to get these iterators from a single std::vector, for example:
std::vector<char> myvector;
std::vector<uint16_t>::iterator u16bit_iterator = myvector.begin(); //this doesn't work now
Let's clarify the problem at hand.
You have:
std::vector<uint32_t> in;
std::vector<uint8_t> out;
You want in (data in UTF-32) to be transformed into out (data in UTF-8); in is populated by 32-bit integers, as utf8::utf32to8 requires (plus it makes sense), and out is populated by 8-bit integers (i.e. bytes), which also makes sense.
Now:
"I want to refactor my code and use only one vector instead of two vectors with different types."
Putting aside that this is a weak requirement (what's wrong with the types as they are now?), this is possible by switching from the default iterators to pointers. The iterator std::vector<T>::iterator is for iterating over a std::vector<T>, period. It doesn't matter that your different choices for T are all integers; they are different types. But pointers can function as iterators (particularly when you are using a contiguous block of data like what a vector contains), and it is legal to reinterpret arbitrary memory as bytes (and only as bytes). Note that the reverse direction, viewing a byte buffer as 32-bit integers as the utf8::utf32to8 call in this answer does, is not strictly guaranteed by the C++ standard because of aliasing and alignment rules, although it commonly works in practice.
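To make the "pointers can function as iterators" point concrete, here is a minimal sketch. The helper names (copy_range, copy_with_iterators, copy_with_pointers) are made up for illustration and are not part of utfcpp; the only claim is that one template accepts both kinds of range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: copies any iterator range of 32-bit values into a new vector.
template <typename It>
std::vector<std::uint32_t> copy_range(It first, It last) {
    std::vector<std::uint32_t> out;
    std::copy(first, last, std::back_inserter(out));
    return out;
}

// The range is described by the vector's own iterators.
std::vector<std::uint32_t> copy_with_iterators(const std::vector<std::uint32_t>& v) {
    return copy_range(v.begin(), v.end());
}

// The same range is described by raw pointers into the vector's storage.
std::vector<std::uint32_t> copy_with_pointers(const std::vector<std::uint32_t>& v) {
    const std::uint32_t* first = v.data();
    const std::uint32_t* last = v.data() + v.size();  // one past the last element
    assert(static_cast<std::size_t>(last - first) == v.size());
    return copy_range(first, last);
}
```

Note that the end pointer is one past the last element (data() + size()), not the address of the last element itself.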
As a bonus, reading from the file will be easier as you are back to just reading bytes (which does make more sense in a way).
std::vector<uint8_t> in;
std::vector<uint8_t> out;
in.resize(fileSize);
sourceFile.read((char*)&in[0], fileSize);
// Make sure you have a whole number of 32-bit
// blocks before the bytes are interpreted as such
assert((in.size() % 4) == 0);
utf8::utf32to8(
    (uint32_t*)in.data(),
    (uint32_t*)(in.data() + in.size()),  // one past the last byte, not &in.back()
    std::back_inserter(out)
);
outFile.write((char*)&out.front(), out.size());
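If the pointer cast feels too risky, one alternative is to keep reading the file as bytes but copy them into a correctly typed vector with std::memcpy before converting. The sketch below is only that, a sketch: bytes_to_code_units and convert_bytes are hypothetical names, the bytes are assumed to be in the machine's native byte order with no byte order mark, and the conversion is passed in as a callable so the same code can serve both the 16-bit and the 32-bit case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of utfcpp): copies raw bytes into a vector of
// fixed-width code units. Assumes the bytes are already in native byte order.
template <typename CodeUnit>
std::vector<CodeUnit> bytes_to_code_units(const std::vector<unsigned char>& bytes) {
    // The byte count must be a whole number of code units.
    assert(bytes.size() % sizeof(CodeUnit) == 0);
    std::vector<CodeUnit> units(bytes.size() / sizeof(CodeUnit));
    if (!bytes.empty()) {
        std::memcpy(units.data(), bytes.data(), bytes.size());
    }
    return units;
}

// Hypothetical wrapper: `convert` is any callable taking (first, last, out),
// which is the argument order used by utf8::utf16to8 and utf8::utf32to8.
template <typename CodeUnit, typename Convert>
std::vector<unsigned char> convert_bytes(const std::vector<unsigned char>& bytes,
                                         Convert convert) {
    const std::vector<CodeUnit> units = bytes_to_code_units<CodeUnit>(bytes);
    std::vector<unsigned char> out;
    convert(units.begin(), units.end(), std::back_inserter(out));
    return out;
}
```

With utfcpp available, the UTF-32 case might then look like convert_bytes<std::uint32_t>(in, [](auto first, auto last, auto out) { return utf8::utf32to8(first, last, out); }), where in is a std::vector<unsigned char> of file bytes (generic lambdas need C++14). This costs one extra copy of the input, in exchange for avoiding the cast.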
I hope I've interpreted your requirement properly.