

Universal newline support in Ruby that includes \r (CR) line endings

In a Rails app, I'm accepting and parsing CSV files that may use any of three line terminators: \n (LF), \r\n (CR+LF), or \r (CR). Ruby's File and CSV libraries seem to handle the first two cases fine, but the last one (classic Mac \r line endings) isn't treated as a newline. I need to accept this format along with the others, because Microsoft Excel for Mac (running on OS X) seems to use it when exporting to "Comma Separated Values" (exporting to "Windows Comma Separated" instead produces the easier-to-handle \r\n).
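To make the symptom concrete, here is a small sketch of why plain line-oriented reading trips over CR-only data. The sample string is made up for illustration, and it assumes Ruby 2.4+ (for the chomp: keyword). String#lines splits on "\n" by default, so every record ends up in a single "line"; passing "\r" explicitly works, but only if you already know which convention the file uses.

```ruby
# Hypothetical sample exported with classic Mac (CR-only) line endings.
cr_data = "name,qty\rapple,3\rpear,5\r"

# Default separator is $/ ("\n"), so the whole input comes back as one line.
default_lines = cr_data.lines

# Splitting on "\r" works, but requires knowing the format in advance.
# chomp: true strips the "\r" separator from each returned line.
cr_lines = cr_data.lines("\r", chomp: true)

p default_lines  # => ["name,qty\rapple,3\rpear,5\r"]
p cr_lines       # => ["name,qty", "apple,3", "pear,5"]
```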

Python has "universal newline support" and will handle any of these three formats without a problem. Is there something similar in Ruby that will accept all three without knowing the format in advance?

You could use :row_sep => :auto:

:row_sep
The String appended to the end of each row. This can be set to the special :auto setting, which requests that CSV automatically discover this from the data. Auto-discovery reads ahead in the data looking for the next "\r\n", "\n", or "\r" sequence.

There are some caveats, of course; see the manual linked to above for details.
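A minimal sketch of the :row_sep => :auto approach, written with the equivalent modern keyword syntax row_sep: :auto. The three sample strings are invented for illustration. As far as I know, :auto is already the default row_sep in recent versions of Ruby's csv library, so passing it explicitly mainly documents intent. One caveat to keep in mind: detection settles on a single separator for the whole input, so a file that mixes line-ending styles is still a problem.

```ruby
require "csv"

# Hypothetical samples using each of the three line-ending conventions.
samples = {
  lf:   "name,qty\napple,3\npear,5\n",
  crlf: "name,qty\r\napple,3\r\npear,5\r\n",
  cr:   "name,qty\rapple,3\rpear,5\r"
}

# row_sep: :auto asks CSV to detect the row separator from the data itself.
parsed = samples.transform_values { |text| CSV.parse(text, row_sep: :auto) }

# Every variant should yield the same rows.
parsed.each { |kind, rows| puts "#{kind}: #{rows.inspect}" }
```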

You could also clean up the EOLs manually with a bit of gsubbing before handing the data to CSV for parsing. I'd probably take this route and convert all \r\n and \r sequences to single \n characters before attempting to parse the CSV. OTOH, this won't work that well if your CSV contains embedded binary data where \r characters mean something. On the gripping hand, this is CSV we're dealing with, so who knows what sort of crazy broken nonsense you'll end up dealing with.
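The gsub route above might look something like this. normalize_newlines is a hypothetical helper name, and the mixed-ending sample is made up. The regex /\r\n?/ matches a CR plus an optional following LF, so a CRLF pair collapses to one "\n" instead of two, and a lone CR also becomes "\n". Note that it rewrites every \r in the input, including any inside quoted fields, which is exactly the embedded-data caveat mentioned above.

```ruby
require "csv"

# Hypothetical helper: turn CRLF and lone CR line endings into LF.
# "\r\n?" consumes a following "\n" when present, so CRLF yields a single "\n".
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Made-up input that mixes all three conventions in one file.
mixed = "name,qty\r\napple,3\rpear,5\n"

rows = CSV.parse(normalize_newlines(mixed))
p rows  # => [["name", "qty"], ["apple", "3"], ["pear", "5"]]
```

Unlike row_sep: :auto, normalizing first also copes with a file that mixes line-ending styles.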
