C++ - Unicode newline

Question

I'm having an increasingly frustrating problem: I can't seem to print a Unicode character (in this case, some braille dots), move to a new line, and print more braille dots. I've been looking for answers for a few hours now, and I'm at my wit's end.

I've tried changing the format of my Unicode characters, changing locales, changing the order, using multiple fstreams (one wide and one normal), and using countless supposed Unicode newline escape sequences. The following is repeated once per character in a row. At the end of each row, there needs to be a newline.

    wout.open((inputstring + "2.txt"), wofstream::binary | wofstream::trunc); //this only happens once

    _setmode(_fileno(stdout), _O_U16TEXT);

    switch (i) //will be expanded for more cases
    {
    case (63):
        cout << "\xFF\xFE"; // UTF-16 BOM
        cout << "\x0A\x28";

    }

    _setmode(_fileno(stdout), _O_TEXT);

I'm using _setmode to switch to and from U16 because other parts of the program use text mode. If this is a problem, I can work around it. When I tried to use

    wout << "\n";

at the end of each row, the output becomes half braille characters, as I'd expect, and half gibberish like "＊૾Ｈ૾Ｈ૾Ｈ૾Ｈ૾Ｈ૾Ｈ૾Ｈ૾Ｈ૾Ｈ". When I remove everything to do with printing the braille characters, it prints newlines just fine. I'm at a loss.

Answer 1

The entire file is read as either 8-bit or 16-bit characters, as determined by the UTF-16 BOM in the first two bytes. You can't switch between them partway through. When you write out an 8-bit newline character, it throws off the processing of the rest of the file, because that single byte is combined with the next byte in the file to form a 16-bit character.

If we look at the first few 16-bit words of your misprinted text string, we have

0020 0022 ff0a 0afe ff28 0afe ff28 0afe

In the (little-endian) binary file, these would be ordered as

20 00 22 00 0a ff fe 0a 28 ff fe 0a 28 ff fe 0a

and you can see how that one-byte newline combines with the following two-byte characters to produce unexpected output.

The fix is to always write 16-bit characters to the file, newlines included.

Answer 1 (3, 2018-11-01 23:34:03)