
How to produce intentional encoding errors in u"..." strings?

I am writing a UTF-16 decode routine. To check that it works correctly, I need test strings with intentional encoding errors in them. However, when I try to write such strings in C the obvious way, the compiler rejects my code with "... is not a valid universal character":

u"\ud800" /* unmatched high surrogate */
u"\udc01\ud802" /* surrogates in wrong order */

How can I produce u"..." strings with intentional encoding errors?

The \uXXXX and \UXXXXXXXX escape sequences can only encode valid universal characters, and the surrogate range D800 through DFFF is explicitly excluded. To encode other char16_t values, use a \x... escape sequence:

u"\xd800" /* unmatched high surrogate */
u"\xdc01\xd802" /* surrogates in wrong order */
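Here is a minimal sketch of how such literals might be used as test inputs. The array names and the utf16_is_well_formed helper are made up for illustration; they stand in for your own decode routine and are not part of any library. One thing to watch for: a \x escape consumes every hex digit that follows it, so if the next character of the test string is a letter from a to f or a digit, split the literal and let the compiler concatenate the pieces.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical test inputs built with \x escapes. */
static const char16_t lone_high[]   = u"\xd800";        /* unmatched high surrogate */
static const char16_t wrong_order[] = u"\xdc01\xd802";  /* low surrogate before high */
static const char16_t valid_pair[]  = u"\xd83d\xde00";  /* well-formed pair, U+1F600 */

/* u"\xd800a" would be parsed as the single escape \xd800a, which does not
 * fit in a char16_t. Splitting the literal ends the escape after d800. */
static const char16_t high_then_a[] = u"\xd800" u"a";

/* Hypothetical stand-in for the routine under test: returns 1 if the n code
 * units at s form well-formed UTF-16, 0 otherwise. */
static int utf16_is_well_formed(const char16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            /* A high surrogate must be followed by a low surrogate. */
            if (i + 1 >= n || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return 0;
            i++; /* skip the low surrogate that completes the pair */
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            /* A low surrogate with no preceding high surrogate. */
            return 0;
        }
    }
    return 1;
}
```

Calling utf16_is_well_formed(lone_high, 1) or utf16_is_well_formed(wrong_order, 2) should report an error, while utf16_is_well_formed(valid_pair, 2) should succeed.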
