

std::string, std::wstring and UTF8

I want to use strings encoded in UTF-8 (I'm sorry if this is bad wording; please correct me so I understand what the proper term is). I also want my program to be cross-platform.

IIUC, the proper way to do this is to use std::wstring and then convert it to UTF-8. The trouble is that I think on Linux std::string is already encoded in UTF-8 (I may be wrong about that).

So what is the best way to create a UTF-8 representation of std::{w}string with as little conditional code as possible?

The strings are constants; they are hard-coded and will be used in SQLite queries.

PS: I am going to try with Xcode 5, hoping that it is C++11 compliant.

Re: "they are hard coded."

If all of the strings in question are hard-coded string literals, then you don't need anything special.

Using the u8 prefix when declaring such strings ensures that they are encoded in UTF-8 on every platform that supports this C++11 feature. The type of such a string is const char[], just like a regular string literal (this holds through C++17; C++20 changed it to const char8_t[], which breaks initializing a std::string from it directly):

const char my_utf8_literal[] = u8"Some String.";

Of course, these can be stored in std::string (not wstring) as well:

std::string my_utf8_string = u8"Some String.";
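A minimal sketch of the above, assuming a C++11-or-later compiler. The non-ASCII character is written as a \u escape, so the stored bytes do not depend on how the compiler interprets the source file's encoding. U8 is a hypothetical macro that adds a reinterpret_cast only when char8_t is enabled (C++20), and hex_bytes is a hypothetical helper that just shows the stored bytes.

```cpp
#include <string>

// Hypothetical helper macro: in C++11..C++17 u8"..." is already const char[];
// in C++20 (__cpp_char8_t defined) it is const char8_t[] and needs a cast.
#if defined(__cpp_char8_t)
#define U8(s) reinterpret_cast<const char*>(u8##s)
#else
#define U8(s) u8##s
#endif

// Hard-coded UTF-8 constant "café", with é spelled as a universal character name.
const std::string kCafe = U8("caf\u00e9");

// Hypothetical helper: render each byte as two hex digits separated by spaces.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

hex_bytes(kCafe) yields "63 61 66 C3 A9": five bytes for four characters, because é takes two bytes in UTF-8.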

You said that your goal was to use them in SQLite queries and commands. In that case, it should be easy to make everything work. You would be using SQLite's string formatting functions to build queries, and while they are blind to UTF-8, as long as all of your inputs are UTF-8, the outputs will also be valid UTF-8. So there shouldn't be any problems.
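The reasoning above is that byte-oriented code preserves UTF-8 validity as long as it only joins complete, valid UTF-8 pieces. A small validator makes that checkable. This is a sketch, not part of SQLite's API: is_valid_utf8 is a hypothetical helper, and the query string in the usage below is plain concatenation of hard-coded constants, used only to illustrate the byte-level argument.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8. Rejects stray
// continuation bytes, truncated sequences, overlong encodings,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                 // ASCII byte
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // invalid lead byte
        if (n - i < len) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len]) return false;         // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        i += len;
    }
    return true;
}
```

Joining valid pieces stays valid, but cutting a string at an arbitrary byte offset can split a multi-byte character: "caf\xC3\xA9" is valid, while its first four bytes are not.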

For UTF-8 processing there's a library called tiny-utf8. It provides a drop-in replacement for std::string, or more specifically for std::u32string: its ::value_type is char32_t, but the data is stored as UTF-8 in chars. That's more or less the easiest way to handle UTF-8 in C++11.
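To illustrate what "value_type is char32_t, but the storage is UTF-8" means, here is a minimal decoder that turns UTF-8 bytes into char32_t code points. It is a hand-written sketch of the idea, not tiny-utf8's API; decode_utf8 is a hypothetical name. It substitutes U+FFFD for malformed lead or continuation bytes but, to stay short, does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Sketch only: malformed lead or continuation bytes become U+FFFD,
// but overlong forms and surrogates are not rejected.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte (0 = invalid lead byte).
        const std::size_t len = c < 0x80 ? 1
                              : (c & 0xE0) == 0xC0 ? 2
                              : (c & 0xF0) == 0xE0 ? 3
                              : (c & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0 || i + len > s.size()) { out += U'\uFFFD'; ++i; continue; }
        // Payload bits of the lead byte: 7, 5, 4 or 3 bits.
        char32_t cp = (len == 1) ? c : (c & (0x7F >> len));
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { out += U'\uFFFD'; ++i; continue; }
        out += cp;
        i += len;
    }
    return out;
}
```

For "caf\xC3\xA9" this yields four code points from five bytes, which is the distinction a char32_t-indexed UTF-8 string has to manage internally.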

The strings are constants, they are hard coded and they will be used in the SQLite queries.

If you have hard-coded strings, you just have to change the encoding of your source file to UTF-8 and prepend the U prefix to your string literals; you can then construct a utf8_string from them to work with.
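The U prefix produces a char32_t (UTF-32) literal, so something has to encode it as UTF-8 for storage. A minimal sketch of that step, assuming every element is a valid Unicode scalar value (no surrogates, nothing above U+10FFFF); encode_utf8 is a hypothetical helper, not part of tiny-utf8.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points (e.g. from a U"..." literal)
// as UTF-8 bytes. Assumes each element is a valid Unicode scalar value.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {                        // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {              // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, encode_utf8(U"\u20AC") produces the three bytes E2 82 AC for the euro sign.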

So what is the best way to create a UTF8 representation of std::{w}string with the least possible conditional code?

IMHO, if you are able to, don't work with wchar_t and wstring, since they are probably the most vaguely specified and platform-specific things in the C++ string library.
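A small illustration of that platform dependence on the mainstream platforms: the same wide literal for a character outside the Basic Multilingual Plane occupies a different number of wchar_t units depending on where it is compiled, while char32_t (C++11) is always one unit per code point. Both functions are hypothetical helpers written for this example.

```cpp
#include <cstddef>
#include <string>

// wchar_t units needed for U+1F600. On Windows wchar_t is 16 bits and holds
// UTF-16, so this is 2 (a surrogate pair); on Linux and macOS it is 32 bits
// and holds UTF-32, so this is 1.
std::size_t wide_units_for_emoji() {
    return std::wstring(L"\U0001F600").size();
}

// char32_t units needed for the same character: 1 on every platform.
std::size_t u32_units_for_emoji() {
    return std::u32string(U"\U0001F600").size();
}
```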

I hope this helped at least a little bit.

Cheers, Jakob

The question was changed after this answer was posted, adding that the strings are hard-coded literals to be used in SQL queries. For that case, plain u8 string literals are a simple solution, and parts of this answer become irrelevant. I'm not going to chase the question through this or further changes.

Re

I want to use strings encoded in UTF-8 (I'm sorry if this is bad wording; please correct me so I understand what the proper term is). I also want my program to be cross-platform.

Then you're plain out of luck.

Microsoft's documentation explicitly states that their setlocale does not support UTF-8:

From the MSDN docs on setlocale:

The set of available locale names, languages, country/region codes, and code pages includes all those supported by the Windows NLS API except code pages that require more than two bytes per character, such as UTF-7 and UTF-8. If you provide a code page value of UTF-7 or UTF-8, setlocale will fail, returning NULL.


Heads-up: in spite of the fact that It Does Not Work™, and is explicitly documented as not working, there are numerous websites and blogs, probably even books, that recommend the approach in a sort of ostrich-like way. They often look authoritative. But the info is rubbish.
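Rather than trusting web advice either way, a program can ask setlocale at run time and fall back when it returns NULL, which is exactly the failure mode the MSDN quote describes. Microsoft's current documentation states that, starting with Windows 10 version 1803, the Universal C Runtime does accept a UTF-8 code page; that postdates this answer, and older runtimes behave as quoted. A minimal sketch; try_utf8_locale and its candidate locale names are assumptions, not an exhaustive or authoritative list.

```cpp
#include <clocale>

// Hypothetical helper: try a few locale names that usually select a UTF-8
// code page and report whether any was accepted. Candidate names (assumed):
//   ".UTF8"       spelling documented for newer Windows UCRT versions
//   "C.UTF-8"     present on many Linux distributions
//   "en_US.UTF-8" common on Linux (if generated) and on macOS
bool try_utf8_locale() {
    const char* const candidates[] = {".UTF8", "C.UTF-8", "en_US.UTF-8"};
    for (const char* name : candidates) {
        // setlocale returns a null pointer (and leaves the locale unchanged)
        // when the requested locale is not supported.
        if (std::setlocale(LC_ALL, name) != nullptr) return true;
    }
    return false;
}
```

A successful call changes the C locale for the whole program, so it belongs at start-up; if it returns false, the program can keep the default "C" locale and treat its strings as opaque UTF-8 bytes.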


Re

what is the best way to create a UTF-8 representation of std::{w}string with as little conditional code as possible?

That depends on what you have available. The standard library offers std::codecvt. It's been asked about and answered before, e.g. (Convert wstring to string encoded in UTF-8).
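A sketch of the std::codecvt route, assuming a C++11 to C++17 standard library: std::wstring_convert and the codecvt_utf8 facets exist there but have been deprecated since C++17, so expect deprecation warnings on newer toolchains. Because wchar_t holds UTF-16 on Windows and UTF-32 on Linux and macOS, the facet is chosen from sizeof(wchar_t) at compile time instead of with #ifdefs; to_utf8 and wide_facet are hypothetical names.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// wchar_t is 16-bit UTF-16 on Windows and 32-bit UTF-32 on Linux/macOS,
// so select the matching conversion facet at compile time.
using wide_facet = std::conditional<sizeof(wchar_t) == 2,
                                    std::codecvt_utf8_utf16<wchar_t>,
                                    std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: convert a wide string to UTF-8 bytes.
// Throws std::range_error if the input cannot be converted.
std::string to_utf8(const std::wstring& ws) {
    std::wstring_convert<wide_facet> conv;
    return conv.to_bytes(ws);
}
```

With this, to_utf8(L"caf\u00e9") yields the five bytes 63 61 66 C3 A9 on both kinds of platform.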

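Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.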

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM