Character-encoding-independent character swap

Question

I like to use this piece of code when I want to reverse a string (when I am not using std::string or other built-in functions in C++). As a beginner, when I first thought of this I had the ASCII table in mind. I think it can work well with Unicode too. I assumed that since the difference between the values (ASCII etc.) is fixed, it works.

Are there any character encodings in which this code may not work?

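```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[11];
    /* Work through unsigned char so the sums below wrap modulo 256
       instead of overflowing a (possibly signed) char. */
    unsigned char *u = (unsigned char *)a;
    size_t len, i;

    strcpy(a, "Particl");
    printf("%s\n", a);
    len = strlen(a);
    for (i = 0; i < len / 2; i++) {
        /* Swap a[i] and a[len-1-i] without a temporary variable. */
        u[i] += u[len - 1 - i];
        u[len - 1 - i] = u[i] - u[len - 1 - i];
        u[i] -= u[len - 1 - i];
    }
    printf("%s\n", a);   /* prints "lcitraP" */
    return 0;
}
```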

_Update:_ This link is informative in relation to this question.

Answer 1

This will not work with any encoding in which some (not necessarily all) codepoints require more than one char unit to represent, because you are reversing byte by byte instead of codepoint by codepoint. For the usual 8-bit char, this includes every encoding that can represent all of Unicode.

For example, in UTF-16BE the string "hello" maps to the byte sequence 00 68 00 65 00 6c 00 6c 00 6f. Your algorithm applied to this byte sequence will produce the sequence 6f 00 6c 00 6c 00 65 00 68 00, which is the UTF-16BE encoding of the string "漀氀氀攀栀".

It gets worse: doing a codepoint-by-codepoint reversal of a Unicode string still won't produce correct results in all cases, because Unicode has many codepoints that act on their surroundings rather than standing alone as characters. As a trivial example, codepoint-reversing the string "Spın̈al Tap", which contains U+0308 COMBINING DIAERESIS, will produce "paT länıpS". See how the diaeresis has migrated from the N to the A? The consequences of codepoint-by-codepoint reversal on a string containing bidirectional overrides or conjoining jamo would be even more dire.

1 answer.

Answer 1: score 9, accepted, 2013-05-14 15:32:28