Cross-platform newline confusion

For some reason, my write-to-textfile function suddenly stopped working.

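#include <fstream>

// Appends a newline followed by `writethis` to the end of `filename`.
void write_data(const char* filename, const char* writethis)
{
    std::ofstream myfile;
    myfile.open(filename, std::ios_base::app);
    myfile << std::endl << writethis;
    myfile.close();
}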

The function is called from a loop, so the file starts with an empty line and each subsequent "writethis" string is appended on a new line.

Then, all of a sudden, no more newlines: all text was appended on one single line. So I did some digging and came across this:

  1. Windows = CR LF
  2. Linux = LF
  3. Mac (before OS X) = CR

So I changed the line to

myfile << "\r\n" << writethis;

And it worked again. But now I'm confused. I am coding on Linux, but I read the text files created by the program on Windows after transferring them with FileZilla. Which part of this caused the lines in the text file to appear as one line?

I was pretty sure "endl" worked just fine on Linux, so now I'm wondering whether the file got messed up on its way to Windows during the FileZilla transfer. Any mistake in how the text file is written (and read back) is guaranteed to break my program, so if someone can explain this I'd appreciate it.

I also don't recall changing anything in my program that could have caused this, because it was working just fine earlier. The only thing I added was threading.

Edit: I have tried switching the transfer mode between ASCII and Binary (and even removed the force-ASCII-for-txt-extension setting), but it makes no difference. The newlines appear on Linux, but not on Windows.

How odd.

What happens is that you write Unix line endings ('\n'), transfer the file to a Windows machine as a bitwise identical copy, and then open it with a viewer that does not understand Unix line endings (likely Notepad).
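One way to check which line endings a file really contains is to count them in its raw bytes. The following is only a minimal sketch: `LineEndingCounts` and `count_line_endings` are hypothetical names, and it assumes the bytes were read with the stream opened in binary mode so that nothing is translated on the way in.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: tallies of each line-ending style.
struct LineEndingCounts {
    std::size_t lf = 0;    // lone "\n" (Unix)
    std::size_t crlf = 0;  // "\r\n" (Windows)
    std::size_t cr = 0;    // lone "\r" (classic Mac OS)
};

// Count line endings in raw bytes, e.g. a file read with std::ios::binary.
LineEndingCounts count_line_endings(const std::string& bytes) {
    LineEndingCounts counts;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the '\n' that belongs to this pair
            } else {
                ++counts.cr;
            }
        } else if (bytes[i] == '\n') {
            ++counts.lf;
        }
    }
    return counts;
}
```

If a file written on Linux shows only `lf` counts after the transfer, the transfer left it untouched and the viewer on Windows is the part that fails to break the lines.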

From my experience writing portable code:

  • Standardize on ONE line ending ('\n', LF) on ALL platforms.
  • Always open your files in binary, even if you write text.
  • Let the user who opens the file use a text viewer that understands any line endings. There are plenty for Windows (including Visual Studio, Notepad++, WordPad and your favorite browser).
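The first two points above can be sketched roughly as follows. This is an illustration rather than the asker's code: `append_line` and `read_all_bytes` are hypothetical helpers, and unlike the original `write_data` this version puts the newline after the text instead of before it.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Append one line, always terminated by a single LF byte.
// std::ios::binary turns off newline translation on platforms that have it.
bool append_line(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    out << text << '\n';
    out.flush();  // push the data out so the stream state reflects the write
    return static_cast<bool>(out);
}

// Read the file back byte for byte to check what was stored.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With both streams in binary mode, the bytes on disk should be the same no matter which platform the program was built for, which is the point of standardizing on one line ending.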

Yes, I do think everybody benefits more from standardizing on one thing than from supporting all of them everywhere. I also deny the existence of "proper line endings on the proper platform". The fact that Microsoft decided that their native API does not speak UTF-8 or does not understand Unix line endings does not prevent anybody's code from doing so on Windows. Just make sure not to pass this stuff to WinAPI. Often you do text processing on internal data that the system will never see, so why the hell would you complicate your life by meeting the expectations of that system's internals?

endl does "work just fine for Linux". Streaming endl writes a \n character and flushes the stream. Always.

However, on Windows a file stream in text mode will convert this \n to \r\n at the implementation layer, and you'll often find line endings being converted as you transfer the file between platforms, too.
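That endl is just '\n' plus a flush can be checked with an in-memory stream, where no file-level text-mode translation is involved. A small sketch, with `via_endl` and `via_newline_and_flush` as made-up names:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes "line" followed by std::endl into an in-memory stream.
std::string via_endl() {
    std::ostringstream os;
    os << "line" << std::endl;
    return os.str();
}

// Writes "line" followed by an explicit '\n' and an explicit flush.
std::string via_newline_and_flush() {
    std::ostringstream os;
    os << "line" << '\n' << std::flush;
    return os.str();
}
```

Both functions produce the same string. Any "\r\n" that shows up in a file therefore comes from a lower layer (a text-mode file stream on Windows, or a transfer tool), not from endl itself.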

This is probably not a C++ problem, and nothing is "broken"; you should probably configure FileZilla to treat your file as text rather than "binary" (a mode in which line endings are not converted). If your file has no name extension like ".txt", FileZilla probably doesn't treat it as text by default.

FTP can mess up your files (that is, it converts newlines) if you transfer files as ASCII. Try transferring as BIN (binary).

Internally, applications written in C or C++ use '\n' to indicate line termination.

The problem is that the line termination sequence is platform specific for text files (as your research turned up). Note: text mode is the default when you open a file. If you explicitly select binary when opening a file, no translation happens when reading or writing.

What this actually means is that the '\n' character is transformed into a platform-specific sequence of characters when you write it to a file. Note also that this platform-specific sequence is converted back to '\n' when the file is read. The problem you are encountering is that you wrote the files on one platform and read them on another.

On Linux the line termination sequence is LF ('\n'). Thus when you write the file, every '\n' is stored as a single LF character. You transfer these files to a Windows system and read them there. On Windows the line termination sequence is CRLF, so the editor that reads the file looks for that two-character sequence to convert back to '\n' but does not find it. Whether you see a single line or multiple lines now depends on how smart the editor is.
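A "smart" reader in this sense can be quite small. The sketch below assumes the data uses either LF or CRLF (it does not handle the lone-CR files of classic Mac OS), and `read_lines_any_ending` is a hypothetical name:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read lines that end in either "\n" or "\r\n".
// std::getline splits on '\n'; a leftover trailing '\r' is then removed.
std::vector<std::string> read_lines_any_ending(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}

// Usage idea: std::istringstream in("a\r\nb\nc");
//             auto lines = read_lines_any_ending(in);
```

Opening the input file with std::ios::binary before passing it in keeps the behavior the same on every platform, since the function then sees the stored bytes directly.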
