What is the point of the UTF-8 character literals proposed for C++17?

Question

What exactly is the point of these, as proposed by N4267?

Their only function seems to be to prevent extended ASCII characters or partial UTF-8 code points from being specified. They still store in a fixed-width 8-bit char (which, as I understand it, is the correct and best way to handle UTF-8 anyway for almost all use cases), so they don't support non-ASCII characters at all. What is going on?

(Actually I'm not entirely sure I understand the need for UTF-8 string literals either. I guess it's the worry of compilers doing weird/ambiguous things with Unicode strings coupled with validation of the Unicode?)

Answer 1

The rationale is covered by Evolution Working Group issue 119, "N4197 Adding u8 character literals, [tiny] Why no u8 character literals?", which tracked the proposal and says:

We have five encoding-prefixes for string-literals (none, L, u8, u, U) but only four for character literals -- the missing one is u8 for character literals.

This matters for implementations where the narrow execution character set is not ASCII. In such a case, u8 character literals would provide an ideal way to write character literals with guaranteed ASCII encoding (the single-code-unit u8 encodings are exactly ASCII), but... we don't provide them. Instead, the best one can do is something like this:
 char x_ascii = { u'x' }; 
... where we'll get a narrowing error if the codepoint doesn't fit in a 'char'. (Note that this is not quite the same as u8'x', which would give us an error if the codepoint was not representable as a single code unit in UTF-8.)
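To make the difference concrete, here is a minimal sketch (not taken from the proposal) that puts the workaround quoted above next to the u8 literal. It assumes a compiler in C++17 mode or later; the names x_u8 and x_ascii are just illustrative. Note that in C++20 the type of u8'x' became char8_t rather than char, which is why the sketch uses auto.

```cpp
// The pre-C++17 workaround: a UTF-16 literal brace-initialized into char.
// Compiles only because the constant value 0x78 fits in a char (no narrowing).
constexpr char x_ascii = { u'x' };

// The C++17 u8 character literal: a single UTF-8 code unit, so its value is
// the ASCII code even where the narrow execution character set is not ASCII.
constexpr auto x_u8 = u8'x';

static_assert(sizeof(x_u8) == 1, "a u8 character literal is one code unit");
static_assert(static_cast<int>(x_u8) == 0x78, "'x' is 0x78 in UTF-8/ASCII");
static_assert(static_cast<int>(x_ascii) == 0x78, "same value via the workaround");

// Both of the following are rejected at compile time, for different reasons:
// constexpr auto bad1 = u8'\u00E9';       // needs two UTF-8 code units
// constexpr char bad2 = { u'\u20AC' };    // narrowing: 0x20AC does not fit in char
```

By contrast, a plain 'x' is encoded in the narrow execution character set, so on an EBCDIC implementation it would not have the value 0x78.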

solution1
18 ACCPTED 2015-08-12 16:04:21