Unicode Processing in C++
What is the best practice for Unicode processing in C++?
Make sure you always use your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like is_alpha unless that is the definition you want.
I can't say it enough: never iterate over the indices of a string if you care about correctness; always use your Unicode library for this.
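To illustrate why indexing a string is risky, here is a minimal sketch that compares the byte length of a UTF-8 string with its number of code points. The helper name count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8; it is not a replacement for a Unicode library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to hold valid UTF-8. Continuation bytes have the bit pattern
// 10xxxxxx, so every byte that does not match it starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "naïve" (written as "na" "\xC3\xAF" "ve"), std::string::size() reports 6 while count_code_points reports 5, so s[2] lands in the middle of a character. Even code points are not the same as user-perceived characters (an "e" followed by the combining accent U+0301 is two code points), which is one more reason to leave this to a library.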
If you don't care about backwards compatibility with previous C++ standards, the current C++11 standard has built-in Unicode support: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2011/n3242.pdf
So the truly best practice for Unicode processing in C++ would be to use the built-in facilities for it. That isn't always possible with older code bases, though, because the standard is still so new at present.
EDIT: To clarify, C++11 is Unicode-aware in that it now has support for Unicode literals and Unicode strings. However, the standard library has only limited support for Unicode processing and conversion. For your current needs this may be enough. However, if you need to do a lot of heavy lifting right now, you may still need to use something like ICU for more in-depth processing. There are some proposals currently in the works to include more robust support for text conversion between different encodings. My guess (and hope) is that this will be part of the next technical report.
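As a small illustration of the C++11 Unicode literals and string types mentioned above, the following sketch stores the same characters as UTF-16 and UTF-32. The variable names are made up for the example; the escapes \u and \U are used so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair (two code units) while UTF-32 needs a single code unit.
const std::u16string grin_utf16 = u"\U0001F600";
const std::u32string grin_utf32 = U"\U0001F600";

// U+00E9 (e with acute accent) fits in one code unit in both encodings.
const std::u16string e_acute_utf16 = u"\u00E9";
const std::u32string e_acute_utf32 = U"\u00E9";

// The same character takes two bytes in a UTF-8 literal; subtracting one
// removes the terminating null from the count.
constexpr std::size_t e_acute_utf8_bytes = sizeof(u8"\u00E9") - 1;
```

Note that size() on these strings counts code units, not characters: grin_utf16.size() is 2 and grin_utf32.size() is 1. The literals give you storage in a known encoding, but tasks such as case mapping or collation still need a library like ICU.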
Our company (and others) use the open source International Components for Unicode (ICU) library originally developed by Taligent.
It handles strings, locales, conversions, dates and times, collation, transformations, and more.
Start with the ICU User Guide.
Here is a checklist for Windows programming:
Look at Case insensitive string comparison in C++.
That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx
If you look at the left-hand navigation pane on MSDN next to that article, you should find a lot of information pertaining to Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx).
It has the following subsections:
Although this may not be best practice for everyone, you can write your own C++ Unicode routines if you want!
I just finished doing it over a weekend. I learned a lot, and although I don't guarantee it's 100% bug-free, I did a lot of testing and it seems to work correctly.
My code is under the New BSD license and can be found here:
http://code.google.com/p/netwidecc/downloads/list
It is called WSUCONV and comes with a sample main() program that converts between UTF-8, UTF-16, and standard ASCII. If you throw away the main code, you've got a nice library for reading and writing Unicode.
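To give a feel for what such hand-written conversion routines involve, here is a minimal sketch of encoding a single code point as UTF-8 and as UTF-16. This is not the WSUCONV code; the function names append_utf8 and append_utf16 are hypothetical and chosen only for this example.

```cpp
#include <string>

// Returns true if cp is a Unicode scalar value, i.e. not a surrogate
// and not above U+10FFFF.
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Appends the UTF-8 encoding of cp to out.
// Returns false and appends nothing if cp is not a scalar value.
bool append_utf8(char32_t cp, std::string& out) {
    if (!is_scalar_value(cp)) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Appends the UTF-16 encoding of cp to out, using a surrogate pair for
// code points above U+FFFF. Returns false if cp is not a scalar value.
bool append_utf16(char32_t cp, std::u16string& out) {
    if (!is_scalar_value(cp)) return false;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        const char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 | (v >> 10));
        out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));
    }
    return true;
}
```

For example, U+20AC (the euro sign) becomes the three bytes E2 82 AC in UTF-8 and a single code unit in UTF-16, while U+1F600 becomes four bytes in UTF-8 and a surrogate pair in UTF-16.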
As has been said above, a library is the best bet when working on a large system. However, sometimes you do want to handle things yourself (maybe because the library would use too many resources, as on a microcontroller). In this case you want a simple library that you can copy parts out of for the things you actually need.
Willow Schlanger's example code seems like a good one (see his answer for more details).
I also found another one that has smaller code; it lacks full error checking and only handles UTF-8, but it was simpler to take parts out of.
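As a rough idea of the kind of piece you might lift out of a small UTF-8-only library, here is a sketch of a decoder that reads one code point at a time. It is an independent illustration, not code from either library mentioned above; the name decode_utf8 and its signature are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Decodes the code point that starts at byte offset pos in s and, on
// success, stores it in cp and advances pos past it. Returns false on a
// truncated or malformed sequence, leaving pos unchanged (cp may have been
// overwritten). Rejects overlong forms, surrogates, and values above U+10FFFF.
bool decode_utf8(const std::string& s, std::size_t& pos, char32_t& cp) {
    if (pos >= s.size()) return false;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t extra = 0;       // number of continuation bytes expected
    char32_t min_value = 0;      // smallest value allowed for this length
    if (lead < 0x80) {
        cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
        cp = lead & 0x1F; extra = 1; min_value = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        cp = lead & 0x0F; extra = 2; min_value = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        cp = lead & 0x07; extra = 3; min_value = 0x10000;
    } else {
        return false;            // stray continuation byte or invalid lead
    }
    if (s.size() - pos <= extra) return false;   // sequence is truncated
    for (std::size_t i = 1; i <= extra; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return false;    // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_value || cp > 0x10FFFF) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    pos += extra + 1;
    return true;
}
```

Calling it in a loop until pos reaches s.size() walks the string one code point at a time. On a constrained target you could drop the validity checks you do not need, at the cost of accepting malformed input.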
Here's a list of the embedded libraries that seem decent.
Take a look at the recommendations of UTF-8 Everywhere.
Use IBM's International Components for Unicode.