Two identical XML files differ when read in Ruby on Rails

Question

I have to parse XML files coming from two different pieces of software. One of the files fails during parsing. While debugging, I copied the content of the "good file" and pasted it into the "bad file", but the error persisted! I also pasted the "bad file" content into the good one, and everything worked!

I think this has to do with an encoding problem.

If an XML file has no encoding declared, is there some metadata that I could be missing?

The output when I read the files in Ruby:

File.read(Rails.root.join('bad-file.xml'))

\xFF\xFE<\u0000f\u0000i\u0000l\u0000e\u0000>\u0000\r\u0000<\u0000A\u0000L\u0000L\u0000_\u0000I\u0000N\u0000S\u0000T\u0000A\u0000N\u0000C\u0000E\u0000S\u0000>\u0000\r\u0000\r\u0000<\u0000i\u0000n\u0000s\u0000t\u0000a\u0000n\u0000c\u0000e\u0000>\u0000\r\u0000<\u0000I\u0000D\u0000>\u00009\u00005\u00003\u0000<\u0000/\u0000I\u0000D\u0000>\u0000\r\u0000<\u0000s\u0000t\u0000a\u0000r\u0000t\u0000>\u00005\u00000\u00005\u00009\u0000.\u00002\u00006\u00002\u00002\u00000\u00001....

File.read(Rails.root.join('good-file.xml'))

<file>\r\n<ALL_INSTANCES>\r\n\r\n<instance>\r\n<ID>953</ID>\r\n<start>5059.2622016567</start>\r\n<end>5060.2622016567</end>\r\n<code>timer-1sec</code>\r\n<label>\r\n<group>result</group>\r\n<text>Dabang Eindringen SK</text>\r\n</label>\r\n</instance>\r\n</ALL_INSTANCES>\r\n\r\n<ROWS>\r\n<row>\r\n<code>timer-1sec</code>\r\n<R>0</R>\r\n<G>0</G>\r\n<B>0</B>\r\n</row>\r\n</ROWS>\r\n</file>

Answer 1

Those first 2 bytes, \xFF\xFE, are a Unicode byte order mark (BOM). They signify that the rest of the data is UTF-16 in little-endian order.
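To check for this kind of hidden marker before parsing, the leading bytes can be compared against the common BOMs. This is only a minimal sketch: `BOMS` and `bom_encoding` are hypothetical names, not part of Ruby or Rails, and the table covers just the three most common marks.

```ruby
# Hypothetical lookup table: common byte order marks and the encoding each implies.
# Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and is not handled here.
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFF\xFE".b     => Encoding::UTF_16LE,
  "\xFE\xFF".b     => Encoding::UTF_16BE
}.freeze

# Hypothetical helper: returns the encoding implied by a leading BOM, or nil if none is found.
def bom_encoding(bytes)
  head = bytes.byteslice(0, 3).b # compare raw bytes, ignoring the string's tagged encoding
  BOMS.each { |bom, encoding| return encoding if head.start_with?(bom) }
  nil
end

# First bytes of the "bad file" output shown in the question.
bad_head = "\xFF\xFE<\x00f\x00".b
detected = bom_encoding(bad_head)

# The "good file" starts directly with the markup, so no BOM is found.
good_detected = bom_encoding("<file>\r\n")
```

For a file on disk, the first bytes could be fetched with `File.binread(path, 3)` and passed to the helper.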

If you do

File.read(path, mode: 'r:UTF-16LE')

then the external encoding for the file will be set to UTF-16LE. The data is transcoded to the default internal encoding (when one is set) before being returned. You can force that to UTF-8 by doing

File.read(path, mode: 'r:UTF-16LE:UTF-8')
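A self-contained way to try this out is sketched below. It assumes a small made-up XML sample written to a temporary file in place of bad-file.xml; the variable names are illustrative. It uses 'rb:' rather than 'r:' because, outside Rails, Ruby may reject a text-mode read of an ASCII-incompatible encoding such as UTF-16 when no internal encoding is given. The BOM survives transcoding as the character U+FEFF, so the sketch strips it before the string would go to an XML parser.

```ruby
require 'tempfile'

# Made-up sample standing in for the "bad file": a UTF-16LE BOM followed by UTF-16LE text.
xml = "<file>\r\n<ID>953</ID>\r\n</file>"
bytes = "\xFF\xFE".b + xml.encode('UTF-16LE').b

decoded = Tempfile.create('bad-file') do |f|
  f.binmode       # write the raw bytes untouched
  f.write(bytes)
  f.flush

  # External encoding UTF-16LE, internal encoding UTF-8, binary mode (no newline conversion).
  text = File.read(f.path, mode: 'rb:UTF-16LE:UTF-8')

  # The BOM is now the single character U+FEFF at the start of the string; drop it.
  text.delete_prefix("\uFEFF")
end

puts decoded.encoding
puts decoded
```

After this, `decoded` is a UTF-8 string with the same content as the "good file" version of the sample.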

Answer 1 (1, accepted, 2015-03-31 14:43:23)
