
How to produce intentional encoding errors in u"..." strings?

I am writing a UTF-16 decoding routine. To check that it works correctly, I need test strings that contain intentional encoding errors. However, when I try to write such strings in C the obvious way, the compiler rejects my code with “... is not a valid universal character”:

u"\ud800" /* unmatched high surrogate */
u"\udc01\ud802" /* surrogates in wrong order */

How can I produce u"..." strings with intentional encoding errors?

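The \uXXXX and \UXXXXXXXX escape sequences (universal character names) can only encode valid characters; C11 explicitly forbids them from naming anything in the surrogate range D800 through DFFF. To encode any other char16_t value, use a \x... hexadecimal escape sequence, whose value only needs to fit in a char16_t: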

u"\xd800" /* unmatched high surrogate */
u"\xdc01\xd802" /* surrogates in wrong order */
