
Check if UTF-8 is wchar_t or char?

I'm calling the zlib API zipOpen, which creates a new zip file, from my C++ project. The function signature is extern zipFile ZEXPORT zipOpen (const char* pathname, int append).

This call eventually calls fopen to create the file. However, fopen doesn't support wide characters. I'd like to fix this by passing the path as UTF-8 (which is represented by char* and so fits the function signature) and, before calling fopen, checking whether the string contains non-ASCII characters. If it doesn't, call fopen as before; if it does, convert to a wide string (wchar_t) and call _wfopen.

So the question is: is there a C/C++ API that checks whether a UTF-8 encoded string contains non-ASCII characters?

Basically, I'm looking for a function like isWide in the example below. I'd like to know whether to call fopen or _wfopen from the Windows API with the string that represents the filename.

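    #include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
    #include <locale>   // std::wstring_convert
    #include <string>

    // Convert a wide string to its UTF-8 byte representation.
    std::string toUTF8(const std::wstring& str)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.to_bytes(str);
    }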
    ...
    ..
    .
    std::wstring s1 = L"おはよう";
    isWide(toUTF8(s1).c_str()); // this should return true

    std::string s2 = "asdasd";
    isWide(s2.c_str()); // this should return false

    std::wstring s3 = L"asdasd";
    isWide(toUTF8(s3).c_str()); // this should return false

    // pseudo code, please forgive me :-)
    for s in s1, s2, s3 do:
        if (isWide(toUTF8(s).c_str()))
            _wfopen(s, L"wb"); // create wide char file
        else
            fopen(s, "wb"); // create regular name file

And the function signature of isWide:

bool isWide(const char* s);

As stated in the comment below, a similar question has been asked before, but it wasn't resolved with a standard API.

Thanks.

There's no reason to check whether there are any non-ASCII characters in the string. If you know it's UTF-8 (note that ASCII is valid UTF-8), just convert it and always call _wfopen().
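A minimal sketch of that approach, assuming the path really is valid, null-terminated UTF-8. The helper name fopenUtf8 is hypothetical (it is not part of zlib), and the non-Windows branch only assumes that plain fopen accepts UTF-8 paths there. On Windows it converts with MultiByteToWideChar (CP_UTF8) and then calls _wfopen:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

// Hypothetical helper: open a file whose name is given as UTF-8.
// Returns nullptr if the conversion or the open fails.
std::FILE* fopenUtf8(const char* utf8Path, const char* mode)
{
    assert(utf8Path != nullptr && mode != nullptr);
#ifdef _WIN32
    // First call: ask for the required length in wchar_t units. Because the
    // input length is -1, the result includes the terminating null.
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8Path, -1, nullptr, 0);
    if (len <= 0)
        return nullptr; // not valid UTF-8

    // Second call: perform the conversion into a buffer of that size.
    std::wstring widePath(static_cast<std::size_t>(len), L'\0');
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8Path, -1, &widePath[0], len) <= 0)
        return nullptr;

    // The mode string ("wb", "rb", ...) is plain ASCII, so widening it
    // character by character is safe.
    std::wstring wideMode(mode, mode + std::strlen(mode));
    return _wfopen(widePath.c_str(), wideMode.c_str());
#else
    // Assumption: the platform passes the path bytes through unchanged.
    return std::fopen(utf8Path, mode);
#endif
}
```

With a helper like this there is a single code path for ASCII and non-ASCII names, so no isWide check is needed before opening the file.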

It depends on your definition of "wide". If you just want to test for the presence of non-ASCII characters, test for the high bit:

bool isWide(const char * s) {
  for (; *s; s++) {
    if (*s & 0x80)
      return true;
  }
  return false;
}

You can step through all the bytes and check whether the most significant bit is 1. See https://de.wikipedia.org/wiki/UTF-8: only bytes that belong to multibyte characters have that bit set.

bool isWide(const std::string& string) {    
    for(auto& c : string) 
    { 
        if(c & 0x80) {
            return true;
        } 
    }
    return false;
}
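To illustrate why the high-bit test is enough, here is a small sketch that classifies UTF-8 bytes by their leading bits. The names Utf8Byte, classify and countCodePoints are hypothetical, and the code assumes well-formed UTF-8 input (it does not validate it):

```cpp
#include <cstddef>
#include <string>

// Kind of a single byte inside a UTF-8 string.
enum class Utf8Byte { Ascii, Continuation, Lead };

Utf8Byte classify(unsigned char b)
{
    if ((b & 0x80) == 0x00) return Utf8Byte::Ascii;        // 0xxxxxxx
    if ((b & 0xC0) == 0x80) return Utf8Byte::Continuation; // 10xxxxxx
    return Utf8Byte::Lead;                                 // 11xxxxxx
}

// Counts code points in well-formed UTF-8 by counting every byte that is
// not a continuation byte.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if (classify(b) != Utf8Byte::Continuation)
            ++n;
    }
    return n;
}
```

Both the lead byte and the continuation bytes of a multibyte sequence have the top bit set, while ASCII bytes never do, so a single byte with that bit set means the string is not pure ASCII.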
