How can I produce intentional encoding errors in a u"..." string?

Question

I am writing a UTF-16 decode routine. To check that it works correctly, I need test strings with intentional encoding errors in them. However, when I try to produce such strings in C the obvious way, the compiler rejects my code with “... is not a valid universal character”:

u"\ud800" /* unmatched high surrogate */
u"\udc01\ud802" /* surrogates in wrong order */

How can I produce u"..." strings with intentional encoding errors?

Answer 1

The \uXXXX and \UXXXXXXXX escape sequences can only encode valid universal characters, and the surrogate code points U+D800 through U+DFFF are not among them. To encode other char16_t values, use a \x... escape sequence:

u"\xd800" /* unmatched high surrogate */
u"\xdc01\xd802" /* surrogates in wrong order */
