
Is wchar_t signed or unsigned?

In this link, WCHAR is typedef'ed as unsigned wchar_t. But I can't find this typedef in my SDK's winnt.h or in MinGW's winnt.h.

Is wchar_t signed or unsigned?

I am using the Windows API from C.

The signedness of wchar_t is unspecified. The C++ standard only says (3.9.1/5):

Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.

(By contrast, the types char16_t and char32_t are expressly unsigned.)

Be aware that the size of the type also varies by platform.

Windows uses UTF-16, and its wchar_t is 2 bytes. Linux uses a 4-byte wchar_t.

The standard may not specify whether wchar_t is signed or unsigned, but Microsoft does. Even if your non-Microsoft compiler disagrees, the Windows API will be using this definition from /Zc:wchar_t (wchar_t Is Native Type):

Microsoft implements wchar_t as a two-byte unsigned value. It maps to the Microsoft-specific native type __wchar_t.

The type WCHAR (as opposed to wchar_t) is defined on MSDN as follows:

    #if !defined(_NATIVE_WCHAR_T_DEFINED)
    typedef unsigned short WCHAR;
    #else
    typedef wchar_t WCHAR;
    #endif

https://docs.microsoft.com/en-us/windows/win32/extensible-storage-engine/wchar

So you could conclude that it's defined as unsigned on Windows?

I just tested on several platforms, with no optimisation.

1) MinGW (32-bit) + gcc 3.4.4:
---- snip ----
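#include <stdio.h>
#include <wchar.h>
const wchar_t BOM = 0xFEFF;
int main(void)
{
    /* Widening a 16-bit wchar_t to int zero-extends if it is unsigned
       and sign-extends if it is signed. */
    int c = BOM;
    printf("0x%08X\n", (unsigned)(c + 0x1000));
    return 0;
}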
---- snip ----

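It prints 0x00010EFF, so wchar_t is unsigned here: a signed 16-bit wchar_t would have sign-extended 0xFEFF to -257 and printed 0x00000EFF. The corresponding assembly code says movzwl _BOM, %eax. That is movZwl (zero-extend), not movSwl (sign-extend).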

2) FreeBSD 11.2 (64-bit) + clang 6.0.0:
---- snip ----
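#include <stdio.h>
#include <wchar.h>
const wchar_t INVERTED_BOM = 0xFFFE0000;
int main(void)
{
    /* Widening a 32-bit wchar_t to long long sign-extends if it is
       signed and zero-extends if it is unsigned. */
    long long c = INVERTED_BOM;
    printf("0x%016llX\n", (unsigned long long)(c + 0x10000000LL));
    return 0;
}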
---- snip ----

It prints 0x000000000FFE0000. wchar_t is signed. The corresponding assembly code says movq $-131072, -16(%rbp): the 32-bit 0xFFFE0000 is sign-extended to the 64-bit signed value -131072, and -131072 + 0x10000000 = 0x0FFE0000. An unsigned wchar_t would have given 0x000000010FFE0000 instead.

3) Same code as 2), on RedHat (version unknown) + gcc 4.4.7: It again prints 0x000000000FFE0000. wchar_t is signed.

I tested neither printf's implementation nor WinAPI's WCHAR definition, only the behavior of each compiler's built-in wchar_t type (no header file specifies its signedness) and of its C-to-assembly code generation.

Note that the compilers in 1) and 3) come from the same vendor, namely the GNU Project, yet they disagree. The answer definitely depends on the platform. (Would somebody test on Visual C++?)
