
Check if UTF-8 is wchar_t or char?

I'm calling the zlib API zipOpen, which creates a new zip file, from my C++ project. The function signature is extern zipFile ZEXPORT zipOpen (const char* pathname, int append).

This call eventually calls fopen to create the file. However, fopen doesn't support wide characters. I'd like to fix this by passing the path as UTF-8 (which is represented by char* and so fits the function signature) and, before calling fopen, checking whether the string contains non-ASCII characters. If it doesn't, call fopen as before; if it does, convert it to a wide string (wchar_t) and call _wfopen.

So the question is whether there is a C/C++ API that checks if a UTF-8 encoded string contains non-ASCII characters.

Basically I'm looking for a function that resembles isWide in the example below. I'd like to know whether to call fopen or _wfopen from the Windows API with the string that represents the filename.

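    #include <codecvt>
    #include <locale>
    #include <string>

    // Convert a wide string to a UTF-8 encoded std::string.
    std::string toUTF8(const std::wstring& str)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.to_bytes(str);
    }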
    ...
    ..
    .
    std::wstring s1 = L"おはよう";
    isWide(toUTF8(s1).c_str()); // this should return true.

    std::string s2 = "asdasd";
    isWide(s2.c_str()); // this should return false.

    std::wstring s3 = L"asdasd";
    isWide(toUTF8(s3).c_str()); // this should return false.

    // pseudocode, please forgive me :-)
    for s in s1, s2, s3 do:
        if (isWide(toUTF8(s).c_str()))
            _wfopen(s, L"wb"); // create a file with a wide-character name
        else
            fopen(s, "wb"); // create a file with a regular name

And the function signature of isWide:

    bool isWide(const char* s);

As stated in the comment below, a similar question was already asked before, but wasn't resolved with standard API.

thanks

There's no reason to check whether the string contains any non-ASCII characters. If you know it's UTF-8 (note that ASCII is valid UTF-8), just convert it and call _wfopen() unconditionally.
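
A minimal sketch of that approach, assuming the path arrives as a NUL-terminated UTF-8 string. The name fopenUtf8 is hypothetical and not part of zlib or minizip; on Windows it converts with MultiByteToWideChar and calls _wfopen, and elsewhere it simply falls back to fopen:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not a zlib/minizip API): open a file whose
// name is given as a NUL-terminated UTF-8 string.
std::FILE* fopenUtf8(const char* utf8Path, const char* mode)
{
    assert(utf8Path != nullptr && mode != nullptr);
#ifdef _WIN32
    // Convert UTF-8 to UTF-16. Plain ASCII input converts unchanged,
    // so no "is it wide?" check is needed beforehand.
    auto widen = [](const char* s) -> std::wstring {
        // First call only asks for the required length, which
        // includes the terminating NUL because cbMultiByte is -1.
        int len = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
        if (len <= 0) return std::wstring();
        std::wstring w(static_cast<size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s, -1, &w[0], len);
        w.resize(static_cast<size_t>(len) - 1); // drop the NUL
        return w;
    };
    std::wstring wpath = widen(utf8Path);
    std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: non-Windows file systems take the UTF-8 bytes as-is.
    return std::fopen(utf8Path, mode);
#endif
}
```

A call such as fopenUtf8(pathname, "wb") could then stand in for the fopen call that zipOpen reaches, whether or not the name contains non-ASCII characters.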

It depends on your definition of "wide". If you just want to test for the presence of non-ASCII characters, just test for the high bit:

    bool isWide(const char * s) {
      for (; *s; s++) {
        if (*s & 0x80)
          return true;
      }
      return false;
    }

You can step through all bytes of the string and check whether the most significant bit is 1. See https://de.wikipedia.org/wiki/UTF-8: only the bytes of multibyte characters have that bit set.

    bool isWide(const std::string& string) {
        for (auto& c : string) {
            if (c & 0x80) {
                return true;
            }
        }
        return false;
    }
