
DllImport - ANSI vs. Unicode

I have some questions about the possible answers for the test question below:

Question: You write the following code segment to call a function from the Win32 Application Programming Interface (API) by using platform invoke.

string personName = "N?el";
string msg = "Welcome " + personName + " to club!";
bool rc = User32API.MessageBox(0, msg, personName, 0);

You need to define a method prototype that can best marshal the string data. Which code segment should you use?

// A.
[DllImport("user32", CharSet = CharSet.Ansi)]
public static extern bool MessageBox(int hWnd, string text, string caption, uint type);

// B.
[DllImport("user32", EntryPoint = "MessageBoxA", CharSet = CharSet.Ansi)]
public static extern bool MessageBox(int hWnd,
    [MarshalAs(UnmanagedType.LPWStr)] string text,
    [MarshalAs(UnmanagedType.LPWStr)] string caption,
    uint type);

// C. - Correct answer
[DllImport("user32", CharSet = CharSet.Unicode)]
public static extern bool MessageBox(int hWnd, string text, string caption, uint type);

// D.
[DllImport("user32", EntryPoint = "MessageBoxA", CharSet = CharSet.Unicode)]
public static extern bool MessageBox(int hWnd,
    [MarshalAs(UnmanagedType.LPWStr)] string text,
    [MarshalAs(UnmanagedType.LPWStr)] string caption,
    uint type);

Why exactly is the correct answer C? Couldn't it just as well have been A? The only difference is that it would be ANSI instead of Unicode.

I understand that it couldn't be D because we choose Unicode as the character set and then have an ANSI function as the entry point.

Why wouldn't B work?

 string personName = "N?el";

This string was garbled by the exact problem this question is asking about. No doubt it looked like this in the original:

 string personName = "Nöel";

The ö tends to be a problem: its character code is not in the ASCII character set and might not be supported by the default system code page, which is the encoding used when you pinvoke the ANSI version of MessageBox, aka MessageBoxA. The real function is MessageBoxW, the one that takes a UTF-16 encoded Unicode string.
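The loss can be reproduced without calling any Win32 function. The following is a minimal sketch, assuming that 7-bit ASCII is an acceptable stand-in for a system code page that lacks ö (on a Western European code page the character would actually survive). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingLossDemo
{
    // Hypothetical helper: encode a string and decode it again, imitating
    // what happens to text that is squeezed through a given encoding.
    public static string RoundTrip(string s, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(s);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string personName = "N\u00F6el"; // "Nöel"

        // ASCII cannot represent ö, so the encoder substitutes '?'.
        Console.WriteLine(RoundTrip(personName, Encoding.ASCII) == "N?el");       // True

        // UTF-16 (Encoding.Unicode) preserves the string exactly.
        Console.WriteLine(RoundTrip(personName, Encoding.Unicode) == personName); // True
    }
}
```

The first comparison shows how the garbled "N?el" from the question can come about; the second shows why a UTF-16 path does not have the problem.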

MessageBoxA is a legacy function that was used in old versions of Windows, back in the olden days when programs still used 8-bit character strings. It isn't completely gone; lots of C and C++ programs still tend to be stuck with 8-bit encodings. MessageBoxA is implemented by converting the 8-bit encoded strings to Unicode and then calling MessageBoxW, which is slow and lossy if you had a Unicode string in the first place.

So rating the 4 versions:

A: uses MessageBoxA + 8-bit encoding, risky.
B: uses MessageBoxA + Unicode, fail.
C: uses MessageBoxW + Unicode, good.
D: uses MessageBoxA + Unicode, fail.

CharSet.Ansi tells the marshaller to marshal as ANSI unless otherwise instructed. Likewise CharSet.Unicode is an instruction to marshal as UTF-16 unless otherwise instructed.

Since options B and D do indeed instruct otherwise (each string parameter carries an explicit MarshalAs(UnmanagedType.LPWStr)), the CharSet parameter is overridden and so options B and D are in fact equivalent. They are both incorrect since they pass UTF-16 text to the function named MessageBoxA, which expects ANSI text.
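To see why that mismatch fails in practice, it helps to look at the bytes involved. This is a rough sketch rather than a trace of the real marshaller: it assumes that LPWStr marshalling hands over UTF-16LE code units and that an ANSI function reads bytes until the first zero byte. `ReadAsNarrowString` is a hypothetical helper that imitates such a reader.

```csharp
using System;
using System.Text;

public static class MismatchDemo
{
    // Hypothetical helper: decode a buffer the way code expecting a
    // null-terminated 8-bit string would, stopping at the first zero byte.
    public static string ReadAsNarrowString(byte[] buffer)
    {
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0) length = buffer.Length;
        return Encoding.ASCII.GetString(buffer, 0, length);
    }

    public static void Main()
    {
        // UTF-16LE stores each Latin letter as the letter byte followed by 0x00.
        byte[] utf16 = Encoding.Unicode.GetBytes("Welcome");

        Console.WriteLine(BitConverter.ToString(utf16));
        // 57-00-65-00-6C-00-63-00-6F-00-6D-00-65-00

        // An 8-bit reader treats the second byte as the string terminator.
        Console.WriteLine(ReadAsNarrowString(utf16)); // W
    }
}
```

Under these assumptions the ANSI function would see only the first character of each string, which is why B and D are rated as failures rather than merely risky.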

That leaves A and C. Option A calls the ANSI variant of the function, MessageBoxA, and option C calls the Unicode variant, MessageBoxW. Behind the scenes the p/invoke marshaller picks the appropriate entry point using the value of the CharSet parameter.

Now, you could use either A or C; the difference is that with option A you will pass ANSI encoded text. If the text you pass contains characters that cannot be encoded in the ANSI code page, there will be a loss of information, which is why C is to be preferred: MessageBoxW will always receive the exact same text that exists in the .NET calling code.
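For reference, here is one way the preferred declaration might look outside of a quiz. It is a sketch under a few assumptions: the entry point is spelled out explicitly with ExactSpelling so that nothing depends on the marshaller's name lookup, and the signature uses IntPtr for the window handle and int for the return value to match the Win32 prototype, instead of the int and bool used in the question. `User32Sketch` and `BuildMessage` are illustrative names, not part of the original code.

```csharp
using System;
using System.Runtime.InteropServices;

public static class User32Sketch
{
    // Bind directly to the UTF-16 entry point. CharSet.Unicode makes the
    // marshaller pass the managed strings as null-terminated UTF-16.
    [DllImport("user32.dll", EntryPoint = "MessageBoxW",
               CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);

    // Hypothetical helper that builds the message from the question.
    public static string BuildMessage(string personName)
    {
        return "Welcome " + personName + " to club!";
    }

    public static void Main()
    {
        string personName = "N\u00F6el"; // "Nöel"
        string msg = BuildMessage(personName);

        // user32.dll only exists on Windows, so guard the native call.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            MessageBox(IntPtr.Zero, msg, personName, 0);
        }
        else
        {
            Console.WriteLine("Not on Windows; skipping the native call.");
        }
    }
}
```

Option C from the question reaches the same entry point without naming it, because CharSet.Unicode leads the marshaller to MessageBoxW.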

I suspect the answer is in the personName.

I don't think it was copy-pasted into your question properly.

string personName = "N?el";

Note the ? character. I think that indicates that the original string had a non-ANSI character there. If that is true, and you could see the character properly, then it would indicate that you have to use Unicode rather than ANSI (hence the answer has to be C).

In any case, Unicode can represent far more characters than any ANSI code page, so it's a better default choice.

