
Conversion from std::vector<char> to wchar_t*

I'm trying to read ID3 frames and their values with TagLib (1) and index them with CLucene (2). The former returns frame IDs as std::vector<char> (3), and the latter takes field names as TCHAR* [wchar_t* on Linux] (4). I need to make a link between the two. How can I convert from std::vector<char> to wchar_t* by means of the STL? Thank you.

(1) http://developer.kde.org/~wheeler/taglib.html
(2) http://clucene.sourceforge.net/
(3) http://developer.kde.org/~wheeler/taglib/api/classTagLib_1_1ID3v2_1_1Frame.html#6aac53ec5893fd15164cd22c6bdb5dfd
(4) http://ohnopublishing.net/doc/clucene-0.9.21b/html/classlucene_1_1document_1_1Field.html#59b0082e2ade8c78a51a64fe99e684b2

In the simple case where your chars don't contain any accented characters or anything like that, you can just copy each one to the destination and use it:

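#include <algorithm>  // std::copy
#include <iterator>   // std::back_inserter
#include <vector>

std::vector<char> frameID;        // frame ID bytes obtained from TagLib
std::vector<wchar_t> field_name;

// Widen each char to a wchar_t, one element at a time.
std::copy(frameID.begin(), frameID.end(), std::back_inserter(field_name));

// lucene_write_field is a placeholder for the CLucene call that takes the name.
lucene_write_field(field_name.data(), field_name.size());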
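If the consuming API expects a null-terminated wchar_t* rather than a pointer plus a length, one option is to build a std::wstring instead of a vector. The following is only a sketch under that assumption: widen_ascii is a hypothetical helper name, not part of TagLib or CLucene, and it assumes the input is plain 7-bit ASCII.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a TagLib or CLucene function).
// Assumes every byte is 7-bit ASCII, so each char maps to the
// wchar_t with the same numeric value.
std::wstring widen_ascii(const std::vector<char>& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

The result's c_str() yields a null-terminated const wchar_t*. Keep the std::wstring in a named variable for as long as the pointer is in use, since the pointer dangles once the string is destroyed.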

My guess is that ID3 frame IDs don't contain accented characters and such, so that will probably be all you need. If accented characters are a possibility, things get more complex in a hurry: you'll need to convert from something like ISO 8859-x to Unicode (probably UTF-16, or UTF-32 where wchar_t is 32 bits wide, as it is on Linux). To do that, you need to know the code page, which tells you how to interpret the input (i.e., there are several varieties of ISO 8859, and the one for French input differs from the one for Russian, for example).
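To make the code-page point concrete, here is a rough sketch of a table-driven conversion. It relies on the fact that all ISO 8859 variants share ASCII in the lower half and differ only in bytes 0x80 to 0xFF. The names HighHalf, latin1_table and decode_with_table are made up for this illustration, and only the ISO 8859-1 table is filled in; a table for any other code page would have to come from its published mapping.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: the Unicode code points for bytes 0x80..0xFF
// of one particular code page.
using HighHalf = std::array<wchar_t, 128>;

// ISO 8859-1 is the easy case: each byte value equals its code point.
HighHalf latin1_table()
{
    HighHalf t{};
    for (std::size_t i = 0; i < t.size(); ++i) {
        t[i] = static_cast<wchar_t>(0x80 + i);
    }
    return t;
}

// Bytes below 0x80 are ASCII in every ISO 8859 variant; the rest are
// looked up in the table for the code page the caller has chosen.
std::wstring decode_with_table(const std::vector<char>& bytes,
                               const HighHalf& high)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        const unsigned char b = static_cast<unsigned char>(c);
        out.push_back(b < 0x80 ? static_cast<wchar_t>(b) : high[b - 0x80]);
    }
    return out;
}
```

With the ISO 8859-1 table, byte 0xE9 decodes to U+00E9 (é). The same byte 0xD0 that is U+00D0 in ISO 8859-1 would need to decode to U+0430 (Cyrillic а) with an ISO 8859-5 table, which is why the code page has to be known up front.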

To prevent large char values from becoming negative wchar_t values, you need to make sure that you cast to unsigned first. Where char is a signed type, any byte above 127 is a negative number, and widening it directly sign-extends it. The following works, though I believe it's technically undefined:

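std::vector<char> vect;  // the bytes to widen

// View the same bytes as unsigned so that values above 127 stay positive.
unsigned char* uchar = reinterpret_cast<unsigned char*>(vect.data());

std::vector<wchar_t> vwchar(uchar, uchar + vect.size());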

This is important if your text contains anything above 127 in the character set.
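If you would rather avoid the pointer cast, the same effect can be had by converting each element through unsigned char on its way into the destination. This is a sketch, not a drop-in for any library; widen_bytes is a hypothetical name, and it still assumes a one-byte-per-character encoding such as ISO 8859-1.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: widens each byte via unsigned char so that
// values above 127 are not sign-extended where char is signed.
std::vector<wchar_t> widen_bytes(const std::vector<char>& bytes)
{
    std::vector<wchar_t> out;
    out.reserve(bytes.size());
    std::transform(bytes.begin(), bytes.end(), std::back_inserter(out),
                   [](char c) {
                       return static_cast<wchar_t>(
                           static_cast<unsigned char>(c));
                   });
    return out;
}
```

Because the cast is applied to each value rather than to the pointer, this version does not reinterpret the vector's storage at all.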

Also keep in mind that none of these answers deals correctly with UTF-anything.
