How do I decode a string read from a file?

Question

I read a file into a string in Python, and it shows up as encoded (I'm not sure which encoding).

query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query

All lines print in English as expected:

select * from table

But the final query shows up as

ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00

What's going on?

Answer 1

It looks like UTF-16 data. Could you try decoding it with utf-16?

with open(file_path, 'rb') as f:
    # Read the raw bytes, then decode them; a file object has no decode() method
    query = f.read().decode('utf-16')
print(query)

Answer 2

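Agreeing with Carlos, the encoding appears to be UTF-16LE. A BOM seems to be present, so encoding="utf-16" will be able to detect automatically whether the data is little-endian or big-endian.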
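To make the BOM point concrete, here is a minimal sketch that rebuilds bytes like the ones in the question and decodes them two ways. It assumes the file held `drop table` as UTF-16LE with a BOM, and that the garbled output came from a single-byte default codec; `latin-1` stands in for whatever the asker's platform default actually was.

```python
import codecs

# Bytes that a UTF-16LE file with a BOM would contain for this text.
raw = codecs.BOM_UTF16_LE + "drop table".encode("utf-16-le")

# Decoding with a single-byte codec (assumed here: latin-1) turns the BOM
# into two visible characters and leaves a NUL after every letter.
garbled = raw.decode("latin-1")

# The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
decoded = raw.decode("utf-16")

print(repr(garbled))
print(decoded)
```

The first two bytes, `0xFF 0xFE`, are the little-endian BOM, which is why the garbled string starts with `ÿþ`.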

Idiomatic Python would be:

with open(file_path, encoding="...") as f:
    for line in f:
        # do something with this line

In your case, you are just appending each line to the query, so the whole code can be reduced to:

query = open(file_path, encoding="...").read()
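As a rough end-to-end check, the sketch below writes a small UTF-16 file and reads it back with `encoding="utf-16"`. The temporary file and its contents are made up for illustration; only the `open(..., encoding=...)` call mirrors the answer. It uses a `with` block so the file is closed promptly.

```python
import os
import tempfile

# Hypothetical sample file: write a query as UTF-16 so there is something to read.
fd, file_path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
with open(file_path, "w", encoding="utf-16") as f:
    f.write("select * from table\n")

# Read it back, letting the BOM select the byte order.
with open(file_path, encoding="utf-16") as f:
    query = f.read()

os.remove(file_path)
print(repr(query))
```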

Answer 3

with open(filePath) as f:
    fileContents = f.read()
    # Python 2 only: str is a byte string and unicode is a separate type
    if isinstance(fileContents, str):
        # Note: this drops non-ASCII characters and encodes back to a byte string
        fileContents = fileContents.decode('ascii', 'ignore').encode('ascii')
    elif isinstance(fileContents, unicode):
        fileContents = fileContents.encode('ascii', 'ignore')
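The snippet above relies on Python 2 names (`unicode`, `str.decode`). A rough Python 3 equivalent of its strip-non-ASCII idea might look like the sketch below; `strip_non_ascii` is a hypothetical helper name, and the sample string is a shortened version of the garbled output from the question.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range (hypothetical helper)."""
    return text.encode("ascii", "ignore").decode("ascii")


# Shortened sample of the garbled string: BOM characters plus NUL-padded letters.
garbled = "\xff\xfed\x00r\x00o\x00p\x00"
cleaned = strip_non_ascii(garbled)
print(repr(cleaned))
```

Note that this removes the two BOM characters but keeps the NUL characters, since NUL is itself ASCII, so on its own it does not recover the text from UTF-16 data that was read with the wrong codec.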
