How to produce intentional encoding errors in u"..." strings?

Question

I am writing a UTF-16 decode routine. To check that it works correctly, I need test strings with intentional encoding errors in them. However, when I try to write such strings in C the obvious way, the compiler rejects my code with “... is not a valid universal character”:

u"\ud800" /* unmatched high surrogate */
u"\udc01\ud802" /* surrogates in wrong order */

How can I produce u"..." strings with intentional encoding errors?

Answer 1

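The \uXXXX and \UXXXXXXXX escape sequences (universal character names) can only designate valid universal characters, and the surrogate range D800 through DFFF is explicitly excluded. To encode arbitrary char16_t values, use a \x... hexadecimal escape sequence instead: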

u"\xd800" /* unmatched high surrogate */
u"\xdc01\xd802" /* surrogates in wrong order */
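
One thing to watch for: a \x escape consumes every hex digit that follows it, so a literal like u"\xd800abc" does not mean what it looks like. Below is a minimal sketch of how such test data might be laid out, assuming a C11 compiler with <uchar.h>. The array names and the count_unpaired_surrogates helper are made up for illustration; they are not part of any library, and the helper is only a stand-in for whatever check your decoder performs.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */
#include <uchar.h>   /* char16_t (C11) */

/* Lone high (leading) surrogate, written with a hex escape. */
static const char16_t lone_high[] = u"\xd800";

/* Low surrogate followed by a high surrogate: wrong order. */
static const char16_t swapped[] = u"\xdc01\xd802";

/* Hex escapes are greedy, so the literal is split in two to keep
 * 'a', 'b' and 'c' from being read as further hex digits. */
static const char16_t high_then_text[] = u"\xd800" u"abc";

/* Four code units plus the terminating zero. */
static_assert(sizeof high_then_text / sizeof high_then_text[0] == 5,
              "split literal should hold 4 code units and a terminator");

/* The same kind of data without escapes: an explicit array. */
static const char16_t lone_low[] = { 0xDC00, 0 };

/* Hypothetical helper (an assumption for this sketch): counts
 * surrogate code units that are not part of a high+low pair. */
static size_t count_unpaired_surrogates(const char16_t *s)
{
    size_t bad = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF) {
            if (s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
                i++;        /* well-formed pair: skip the low half */
            else
                bad++;      /* high surrogate with no low partner */
        } else if (s[i] >= 0xDC00 && s[i] <= 0xDFFF) {
            bad++;          /* low surrogate with no preceding high */
        }
    }
    return bad;
}
```

Calling count_unpaired_surrogates(lone_high) should report one error and count_unpaired_surrogates(swapped) two, while a well-formed pair such as u"\xd83d\xde00" should report none. The explicit array form is handy when the test data is generated or when the escapes become hard to read.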

solution1
4 ACCEPTED 2022-07-06 18:22:58