Python binary file? To text/strings

Question

I'm trying to parse a possibly binary file as text/strings in Python. I'm not sure of the file format, so I'm assuming it's binary. Basically, it is an exported key (*.reg) from MS regedit. If I open the key in Notepad++ I can read it easily. However, if I iterate over the lines in Python (specifically IPython Notebook), it prints gobbledygook. Here's a sample:

InFile = open(r"F:\Uninstallkey.reg","r")  # raw string so the backslash is never read as an escape

for line in InFile:
    print "%r" % (line)

InFile.close()

Output:

'\xff\xfeW\x00i\x00n\x00d\x00o\x00w\x00s\x00 \x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00y\x00 \x00E\x00d\x00i\x00t\x00o\x00r\x00 \x00V\x00e\x00r\x00s\x00i\x00o\x00n\x00 \x005\x00.\x000\x000\x00\r\x00\n'
'\x00\r\x00\n'
'\x00[\x00H\x00K\x00E\x00Y\x00_\x00L\x00O\x00C\x00A\x00L\x00_\x00M\x00A\x00C\x00H\x00I\x00N\x00E\x00\\\x00S\x00O\x00F\x00T\x00W\x00A\x00R\x00E\x00\\\x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00\\\x00W\x00i\x00n\x00d\x00o\x00w\x00s\x00\\\x00C\x00u\x00r\x00r\x00e\x00n\x00t\x00V\x00e\x00r\x00s\x00i\x00o\x00n\x00\\\x00U\x00n\x00i\x00n\x00s\x00t\x00a\x00l\x00l\x00]\x00\r\x00\n'
'\x00\r\x00\n'

In Notepad++:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall]

Strangely, in IPython it prints properly:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall]

Long story short, how can I tell the file format, and how can I convert the file so I can print/parse it as text?

Answer 1

As the other answer notes, the file is encoded in UTF-16. Here's an easy way to open a file with an explicit encoding:

import codecs

# path_to_registry holds the path to the exported .reg file
InFile = codecs.open(path_to_registry, encoding='utf-16')
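On the "how can I tell the file format" part of the question: the leading `\xff\xfe` in the output above is a UTF-16 little-endian byte order mark (BOM). A minimal sketch of checking for a BOM is below. The helper name `sniff_bom` and the sample bytes are made up for illustration, and a file without a BOM simply yields `None`, so this is a heuristic rather than a full encoding detector.

```python
import codecs

# Longer BOMs come first: the UTF-32-LE BOM starts with the same two
# bytes as the UTF-16-LE BOM, so the order of the checks matters.
# (A UTF-16-LE file whose first character is U+0000 would be misread
# as UTF-32; that edge case is ignored in this sketch.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# First bytes of the output shown in the question (hypothetical sample).
sample = b"\xff\xfeW\x00i\x00n\x00"
encoding = sniff_bom(sample)
text = sample.decode(encoding)  # the "utf-16" codec consumes the BOM
```

In practice you would read the first four bytes of the file in binary mode, pass them to the helper, and then reopen the file with the returned encoding.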

Answer 2

The file appears to be a plain text file that is simply encoded in little-endian UTF-16. Instead of using the normal open function, open the file with io.open and pass "UTF-16LE" as the encoding argument.
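A small sketch of the `io.open` approach follows. It writes a temporary stand-in for the exported file (the content and temp path are invented for the demo) and reads it back twice. One detail worth knowing: the sample output in the question starts with a byte order mark (`\xff\xfe`), and the `"utf-16-le"` codec leaves that mark in the text as `u"\ufeff"`, whereas plain `"utf-16"` consumes it.

```python
import io
import os
import tempfile

content = (u"Windows Registry Editor Version 5.00\r\n"
           u"\r\n"
           u"[HKEY_LOCAL_MACHINE\\SOFTWARE]\r\n")

# Stand-in for a regedit export: a BOM followed by little-endian UTF-16.
fd, path = tempfile.mkstemp(suffix=".reg")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xff\xfe" + content.encode("utf-16-le"))

# "utf-16" reads the BOM to pick the byte order and drops it from the text.
with io.open(path, encoding="utf-16") as f:
    lines_bom_aware = [line.rstrip(u"\r\n") for line in f]

# "utf-16-le" fixes the byte order up front, so the BOM stays in the
# text as a leading U+FEFF character on the first line.
with io.open(path, encoding="utf-16-le") as f:
    lines_le = [line.rstrip(u"\r\n") for line in f]

os.remove(path)

for line in lines_bom_aware:
    print(line)
```

If you do use `"UTF-16LE"`, stripping a leading `u"\ufeff"` from the first line gives the same result as opening with `"utf-16"`.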

Answer 1
Score 3, accepted, 2014-06-30 18:01:06

Answer 2
Score 2, 2014-06-30 17:54:35
