Python read() works with UTF-8 but readlines() “doesn't”
So, I am working with a (huge) UTF-8 encoded file. The first thing I do with it is get its lines into a list using the file object's readlines() method. However, when I use the print statement for debugging, I get things like \xc3 etc.
Here's a really small example that replicates my problem: I created a t.txt file that contains only the text "Clara Martínez".
f = open("t.txt", "r")
s = f.read()
print s
# Output: Clara Martínez
# If I do the following, however:
lines = f.readlines()
for l in lines:
    print l
# Output: ['Clara Mart\xc3\xadnez']
# write, however, works fine!
f2 = open("t2.txt", "w")
for l in lines:
    f2.write(l)
f2.close()
f.close()
When I then open t2.txt, the string is correct, i.e. Clara Martínez. Is there any way to "make" readlines() work like read()?
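One more detail in the question's snippet: read() leaves the file position at the end of the file, so calling readlines() on the same file object right after it returns an empty list (this holds for Python 2 file objects as well). The following is a minimal Python 3 sketch, using a hypothetical temporary file in place of t.txt, that shows the effect and how f.seek(0) rewinds the file first:

```python
import os
import tempfile

# Hypothetical stand-in for the question's t.txt, written as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("Clara Martínez")

with open(path, "r", encoding="utf-8") as f:
    s = f.read()                # consumes the whole file
    after_read = f.readlines()  # position is already at EOF, so this is []
    f.seek(0)                   # rewind to the start of the file
    lines = f.readlines()       # now returns the file's line(s)

print(after_read)  # []
print(len(lines))  # 1
```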
You claim that this:
lines = f.readlines()
for l in lines:
    print l
will result in this:
['Clara Mart\xc3\xadnez']
This is not true; it will not. I think you made a mistake in your code and actually wrote this:
lines = f.readlines()
for l in lines:
    print lines
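That code will give the result you describe, assuming the file contains only one line, 'Clara Mart\xc3\xadnez' (the UTF-8 bytes of Clara Martínez, with no trailing newline). Printing a list shows the repr() of each element, and in Python 2 the repr of a byte string escapes every non-ASCII byte as \xNN. Printing the string itself writes the raw bytes, which your terminal displays as Martínez.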
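A minimal Python 3 sketch of the difference between the two loops, assuming a one-line file. Python 3 bytes stand in for Python 2's str here, since their repr() escapes non-ASCII bytes the same way; decoding with UTF-8 stands in for the terminal rendering the raw bytes that Python 2's `print l` writes:

```python
# Python 3 sketch: bytes stand in for Python 2's str (raw, undecoded bytes).
lines = [b"Clara Mart\xc3\xadnez"]  # what readlines() gives for the one-line file

# `print lines` shows the list, i.e. repr() of each element,
# so the non-ASCII bytes appear as \xc3\xad escapes.
list_view = repr(lines)
print(list_view)  # [b'Clara Mart\xc3\xadnez']

# `print l` shows one element; decoding its UTF-8 bytes yields the text
# a UTF-8 terminal would display for the raw bytes.
item_views = [l.decode("utf-8") for l in lines]
for text in item_views:
    print(text)  # Clara Martínez
```

In other words, readlines() returned the same bytes that read() did; only the way they were printed differed.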