How to get 16 or 32 bit iterator from 8 bit iterator in C++?

I wrote code that uses the utf8-cpp library (http://utfcpp.sourceforge.net) to convert UTF-16 to UTF-8 with utf16to8:

    ifstream sourceFile("/home/myuser/utf16.txt", std::ifstream::binary);
    vector<unsigned char> res;
    std::vector<uint16_t> my_buffer;

    my_buffer.resize(fileSize/2);
    sourceFile.read((char*) my_buffer.data(), fileSize);

    utf8::utf16to8(my_buffer.begin(),
                   my_buffer.end(),
                   back_inserter(res));
    outFile.write((char*)&res[0], res.size());

My problem: if I want to convert with utf32to8, I have to write all of this code again, only with a different element type in the vector:

    vector<unsigned char> res;
    std::vector<uint32_t> my_buffer;

    my_buffer.resize(fileSize/4);
    sourceFile.read((char*) my_buffer.data(), fileSize);

    utf8::utf32to8(my_buffer.begin(),
                   my_buffer.end(),
                   back_inserter(res));
    outFile.write((char*)&res[0], res.size());

I am using std::vector<uint16_t> and std::vector<uint32_t> because the utf8-cpp library requires 16-bit and 32-bit iterators. Is there any way to get these iterators from a std::vector<char>, for example:

    std::vector<char> myvector;
    std::vector<uint16_t>::iterator u16bit_iterator = myvector.begin(); //this doesn't work now

Let's clarify the problem at hand.

You have:

    std::vector<uint32_t> in;
    std::vector<uint8_t>  out;

You want in (data in UTF-32) to be transformed into out (data in UTF-8). in is populated by 32-bit integers, as utf8::utf32to8 requires (and it makes sense), and out is populated by 8-bit integers (i.e. bytes), which also makes sense.

Now:

"I want to refactor my code and use only one vector instead of two vectors with different types."

Putting aside that this is a weak requirement (what's wrong with the types as they are now?), this is possible by switching from the default iterators to pointers. The iterator std::vector<T>::iterator is for iterating over a std::vector<T>, period. It doesn't matter that your different choices for T are all integers; they are different types. But pointers can function as iterators (particularly when you are using a contiguous block of data like the one a vector holds), and it is legal to re-interpret arbitrary memory as bytes (and only as bytes). Note that the code below goes the other way, viewing bytes as uint32_t, which the language does not formally guarantee; it relies on the buffer being suitably aligned and on the compiler tolerating the cast.
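To illustrate the point that a raw pointer into a vector's storage can stand in for the vector's own iterator, here is a minimal sketch. The function names (sum_range, sum_via_iterators, sum_via_pointers) are made up for this example and have nothing to do with utf8-cpp; the only thing shown is that the same template accepts both kinds of range.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sums a range of values. The template does not care whether It is a
// std::vector iterator or a plain pointer.
template <typename It>
std::uint64_t sum_range(It first, It last) {
    return std::accumulate(first, last, std::uint64_t{0});
}

// Uses the vector's own iterator type.
inline std::uint64_t sum_via_iterators(const std::vector<std::uint32_t>& v) {
    return sum_range(v.begin(), v.end());
}

// Uses raw pointers into the same contiguous storage.
// The end pointer is one past the last element, not the last element itself.
inline std::uint64_t sum_via_pointers(const std::vector<std::uint32_t>& v) {
    const std::uint32_t* first = v.data();
    return sum_range(first, first + v.size());
}
```

Both functions walk the same elements, so they return the same value for any input vector.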

As a bonus, reading from the file will be easier as you are back to just reading bytes (which does make more sense in a way).

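    std::vector<uint8_t> in;
    std::vector<uint8_t> out;

    in.resize(fileSize);
    sourceFile.read(reinterpret_cast<char*>(in.data()), fileSize);

    // Make sure you have a whole number of 32-bit
    // blocks before we interpret the bytes as code points
    assert((in.size() % 4) == 0);

    // The end pointer must be one past the last 32-bit unit;
    // &in.back() would point at the last byte instead.
    const uint32_t* first = reinterpret_cast<const uint32_t*>(in.data());
    const uint32_t* last  = first + in.size() / 4;
    utf8::utf32to8(first, last, std::back_inserter(out));

    outFile.write(reinterpret_cast<const char*>(out.data()), out.size());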
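If the pointer cast above feels too fragile, one alternative is a small iterator adapter that walks the byte buffer and assembles each 16-bit or 32-bit unit with std::memcpy. The sketch below is only an illustration: unit_iterator and collect_units are hypothetical names, not part of utf8-cpp, and I have not checked the adapter against every version of the library. It assumes the library only needs dereference, increment and comparison from its input iterators, and it reads values in the machine's native byte order, just as the cast does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical adapter: steps through bytes sizeof(T) at a time and
// yields one T per step, copied out with memcpy so that no aliasing
// or alignment assumptions are needed.
template <typename T>
class unit_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const T*;
    using reference         = T;

    explicit unit_iterator(const unsigned char* p) : p_(p) {}

    T operator*() const {
        T value;
        std::memcpy(&value, p_, sizeof(T));
        return value;
    }
    unit_iterator& operator++() { p_ += sizeof(T); return *this; }
    unit_iterator operator++(int) { unit_iterator old = *this; ++*this; return old; }

    friend bool operator==(unit_iterator a, unit_iterator b) { return a.p_ == b.p_; }
    friend bool operator!=(unit_iterator a, unit_iterator b) { return a.p_ != b.p_; }

private:
    const unsigned char* p_;
};

// Copies the units out of a byte buffer, just to show the adapter working.
template <typename T>
std::vector<T> collect_units(const std::vector<unsigned char>& bytes) {
    // Same precondition as above: a whole number of units.
    assert(bytes.size() % sizeof(T) == 0);
    unit_iterator<T> first(bytes.data());
    unit_iterator<T> last(bytes.data() + bytes.size());
    return std::vector<T>(first, last);
}
```

Under the assumption stated above, a pair such as unit_iterator<uint16_t>(buf.data()) and unit_iterator<uint16_t>(buf.data() + buf.size()) could be passed to utf8::utf16to8 in place of the vector iterators, and the uint32_t instantiation to utf8::utf32to8, so a single byte vector serves both cases.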

I hope I've interpreted your requirement properly.
