Reading non-ASCII characters from a text file

Question

I'm using Python 2.7. I've tried many things, such as the codecs module, but nothing worked. How can I fix this?

myfile.txt

sözcük

My code

f = open('myfile.txt','r')
for line in f:
    print line
f.close()

Output

s\xc3\xb6zc\xc3\xbck

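The output is the same in Eclipse and in the command window. I'm using Win7. Characters display without any problem when I don't read them from a file.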

Answer 1

import codecs

# Open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# Read the whole file into a unicode string
sfile = f.read()
f.close()

# Check the type of the decoded text
print type(sfile)  # <type 'unicode'>

# Encode the unicode string back to a byte string to display it
print sfile.encode('utf-8')

# Check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
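As a variation on the same idea, here is a minimal sketch that uses `io.open` instead of `codecs.open`. `io.open` is not mentioned in the question or the answer; I'm assuming it is acceptable because it ships with Python 2.6+ and behaves like the built-in `open` of Python 3. The helper name `read_text` and the temporary sample file are made up for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Return the file's contents decoded to a unicode string.

    `read_text` is a hypothetical helper, not part of any library.
    """
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Write a small UTF-8 sample file so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with io.open(sample_path, 'w', encoding='utf-8') as f:
    f.write(u's\xf6zc\xfck\n')  # the word from the question, written with escapes

text = read_text(sample_path)
print(repr(text))  # unicode text, not raw bytes
```

The `with` block closes the file even if reading fails, so no explicit `f.close()` is needed.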

Answer 2

First, detect the file's encoding:


  from chardet import detect
  # `line` is a raw byte string read from the file, as in the question's loop
  encoding = lambda x: detect(x)['encoding']
  print encoding(line)

Then convert it to unicode, or to a str in your default encoding:


  n_line = unicode(line, encoding(line), errors='ignore')
  print n_line
  print n_line.encode('utf8')
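If installing chardet is not an option, a cruder alternative is to try a short list of likely encodings in order. This is a different technique from chardet's statistical detection, not a replacement for it. The function name `decode_with_fallback` and the candidate list are assumptions for illustration; adjust the list to the encodings you actually expect.

```python
def decode_with_fallback(raw, candidates=('utf-8', 'cp1254', 'latin-1')):
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Hypothetical helper. UTF-8 goes first because it rejects most
    byte strings that were written in another encoding.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')


raw = b's\xc3\xb6zc\xc3\xbck'  # the bytes shown in the question's output
text, used = decode_with_fallback(raw)
print(used)
```

Note that latin-1 accepts every byte sequence, so with the default list the `ValueError` is never raised; the last candidate only guarantees that you get some text back, not that it is the right text.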

Answer 3

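The problem is the terminal encoding. Try configuring the terminal to use the same encoding as the file. I recommend UTF-8.

Incidentally, it is good practice to decode all input and encode all output to avoid problems: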

f = open('test.txt', 'r')
for line in f:
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
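To see which encoding the terminal actually expects, you can inspect `sys.stdout.encoding` and encode for that instead of hard-coding UTF-8. A rough sketch follows; the helper `encode_for_terminal` is hypothetical, and falling back to UTF-8 when the stream reports no encoding is my own assumption.

```python
import sys


def encode_for_terminal(text, stream_encoding=None):
    """Encode unicode `text` for a terminal.

    Hypothetical helper. Characters the target encoding cannot
    represent are replaced with '?' instead of raising an error.
    """
    encoding = (stream_encoding
                or getattr(sys.stdout, 'encoding', None)
                or 'utf-8')
    return text.encode(encoding, 'replace')


# What the current terminal reports; may be None when output is piped.
print(getattr(sys.stdout, 'encoding', None))

encoded = encode_for_terminal(u's\xf6zc\xfck', 'utf-8')
print(repr(encoded))
```

If the reported encoding differs from the file's encoding, that mismatch is the likely reason the characters look wrong on screen.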
