I am writing a UTF-16 decode routine. To check if it works correctly, I need to produce test strings with intentional encoding errors in them. However, when I try to produce such strings in C the obvious way, the compiler rejects my code with “... is not a valid universal character:”
u"\ud800" /* unmatched high surrogate */
u"\udc01\ud802" /* surrogates in wrong order */
How can I produce u"..." strings with intentional encoding errors?
The \uXXXX and \UXXXXXXXX escape sequences can only encode valid universal characters, and C excludes the surrogate range D800 through DFFF from them. To encode other char16_t values, use a \x... escape sequence:
u"\xd800" /* unmatched high surrogate */
u"\xdc01\xd802" /* surrogates in wrong order */
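As a rough sketch of how such literals could feed a test, the snippet below stores them in char16_t arrays and runs them through a small well-formedness check. It assumes a C11 compiler that provides <uchar.h>. The names lone_high, swapped, valid_pair and utf16_is_well_formed are made up for illustration and stand in for whatever your decode routine exposes.

```c
#include <stddef.h>
#include <uchar.h>   /* char16_t, available since C11 */

/* Test inputs built with \x escapes, which are not checked for validity. */
static const char16_t lone_high[]  = u"\xd800";        /* unmatched high surrogate */
static const char16_t swapped[]    = u"\xdc01\xd802";  /* low surrogate before high */
static const char16_t valid_pair[] = u"\xd83d\xde00";  /* well-formed pair (U+1F600) */

static int is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static int is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Hypothetical helper: returns 1 if the NUL-terminated string is
   well-formed UTF-16, and 0 if it contains an unpaired surrogate. */
static int utf16_is_well_formed(const char16_t *s)
{
    for (size_t i = 0; s[i] != 0; i++) {
        if (is_high_surrogate(s[i])) {
            /* A high surrogate must be followed by a low one.
               s[i + 1] is at worst the terminating NUL, so this read is safe. */
            if (!is_low_surrogate(s[i + 1]))
                return 0;
            i++; /* skip the low half of the pair */
        } else if (is_low_surrogate(s[i])) {
            return 0; /* low surrogate with no preceding high surrogate */
        }
    }
    return 1;
}
```

Calling utf16_is_well_formed on lone_high or swapped should report an error, while valid_pair should pass. Note that a \x escape consumes every hexadecimal digit that follows it, so if the next character of the test string is itself a hex digit, split the literal into adjacent pieces, for example u"\xd800" u"abc".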