Check if UTF-8 is wchar_t or char?
I'm calling the zlib (minizip) API `zipOpen`, which creates a new zip file from my C++ project. The function signature is `extern zipFile ZEXPORT zipOpen (const char* pathname, int append)`.
This call eventually calls `fopen` to create the file. However, `fopen` doesn't support wide characters. I'd like to fix this by passing the path as UTF-8, which is represented by `char*` and so fits the function signature. Before calling `fopen`, I would check whether the string contains non-ASCII characters:

- if not, call `fopen` as before;
- if so, convert it to a wide string (`wchar_t`) and call `_wfopen`.
So the question is: is there a C/C++ API that checks whether a UTF-8 formatted string contains non-ASCII characters?
Basically, I'm looking for a function like `isWide` in the example below. I'd like to know whether to call the standard `fopen` or the Windows CRT `_wfopen` with the string that represents the filename.
```cpp
#include <codecvt>
#include <locale>
#include <string>

// std::wstring_convert / std::codecvt_utf8 are deprecated since C++17
std::string toUTF8(const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(str);
}

// ...

std::wstring s1 = L"おはよう";
isWide(toUTF8(s1).c_str()); // this should return true
std::string s2 = "asdasd";
isWide(s2.c_str());         // this should return false
std::wstring s3 = L"asdasd";
isWide(toUTF8(s3).c_str()); // this should return false

// Pseudocode, please forgive me :-)
for s in s1, s2, s3:
    if (isWide(toUTF8(s).c_str()))
        _wfopen(s, L"wb");  // create a file with a wide-character name
    else
        fopen(s, "wb");     // create a file with a regular name
```
And the function signature of `isWide`:

```cpp
bool isWide(const char* s);
```
As mentioned in the comments below, a similar question has been asked before, but it wasn't resolved with a standard API.
Thanks.
There's no reason to check whether the string contains any non-ASCII characters. If you know it's UTF-8 (note that ASCII is valid UTF-8), just convert it and call `_wfopen()` unconditionally.
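Here is a minimal sketch of that approach. It assumes a Windows target, where `wchar_t` is 16 bits and `_wfopen` expects UTF-16.

- `utf8ToWide` and `openUtf8` are hypothetical helpers. They are not part of zlib or the CRT.
- The conversion uses `std::wstring_convert` with `std::codecvt_utf8_utf16`. These are deprecated since C++17 but still shipped by the major standard libraries.
- `MultiByteToWideChar(CP_UTF8, ...)` is the native Win32 alternative.
- On non-Windows systems the sketch passes the UTF-8 bytes straight to `fopen`, because POSIX file names are byte strings.

```cpp
#include <codecvt>
#include <cstdio>
#include <locale>
#include <string>

// Hypothetical helper: convert a UTF-8 byte string to a wide string.
// With a 16-bit wchar_t (Windows), codecvt_utf8_utf16 yields the UTF-16
// that _wfopen expects. Throws std::range_error on invalid UTF-8.
std::wstring utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Hypothetical wrapper: open a file whose name is given in UTF-8.
std::FILE* openUtf8(const char* utf8Path, const char* mode)
{
#ifdef _WIN32
    // ASCII is valid UTF-8, so converting unconditionally is always safe.
    return _wfopen(utf8ToWide(utf8Path).c_str(), utf8ToWide(mode).c_str());
#else
    // POSIX file names are plain bytes, so UTF-8 can be passed through.
    return std::fopen(utf8Path, mode);
#endif
}
```

An ASCII-only name converts to the same characters, so the Windows branch needs no separate `fopen` path.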
It depends on your definition of "wide". If you just want to test for the presence of non-ASCII characters, test the high bit of each byte:
```cpp
bool isWide(const char* s) {
    for (; *s; s++) {
        if (*s & 0x80)
            return true;
    }
    return false;
}
```
You can step through all characters and check whether the most significant bit is 1. See https://de.wikipedia.org/wiki/UTF-8: only the bytes of multibyte characters have that bit set.
```cpp
bool isWide(const std::string& string) {
    for (auto& c : string) {
        if (c & 0x80) {
            return true;
        }
    }
    return false;
}
```
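The high-bit test works because of how UTF-8 lays out bits (RFC 3629). The hypothetical `encodeUtf8` below sketches the encoder for a single code point. It assumes valid input: at most U+10FFFF and no surrogates.

- Code points below 0x80 become one byte with the top bit clear.
- Every byte of a longer sequence starts with a `110`, `1110` or `11110` lead prefix, or a `10` continuation prefix, so its top bit is always set.

`allHighBit` is another hypothetical helper that checks this property.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF and cp is not a surrogate (no validation).
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}

// True if every byte of s has its most significant bit set.
bool allHighBit(const std::string& s)
{
    return std::all_of(s.begin(), s.end(), [](char c) {
        return (static_cast<unsigned char>(c) & 0x80) != 0;
    });
}
```

For example, お (U+304A) encodes as `E3 81 8A`, and all three bytes have the top bit set.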