
Python read() works with UTF-8 but readlines() "doesn't"

So, I am working with a (huge) UTF-8 encoded file. The first thing I do with it is get its lines in a list using the file object's readlines() method. However, when I use the print command for debugging, I get things like \xc3, etc.

Here's a really small example that replicates my problem. I created a t.txt file that contains only the text "Clara Martínez":

f = open("t.txt", "r")
s = f.read()
print s
# Output I see: Clara Martínez
# If I do the following, however
f.seek(0)  # rewind first: read() above already consumed the whole file
lines = f.readlines()
for l in lines:
    print l
# Output I see: ['Clara Mart\xc3\xadnez']
# write, however, works fine!
f2 = open("t2.txt", "w")
for l in lines:
    f2.write(l)
f2.close()
f.close()

When I then open "t2.txt", the string is correct, i.e. Clara Martínez. Is there any way to "make" readlines() work like read()?

You claim that this:

lines = f.readlines()
for l in lines:
    print l

Will result in this:

['Clara Mart\xc3\xadnez']

This is not true; it will not. I think you made a mistake in your code and wrote this:

lines = f.readlines()
for l in lines:
    print lines

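That code will give the result you describe, assuming the file contains only one line with the text 'Clara Martínez'. Printing the list itself shows the repr of each string in it, so the two UTF-8 bytes that encode í appear escaped, as in 'Clara Mart\xc3\xadnez'. Printing a single line, as in print l, sends the raw bytes to the terminal, which displays them correctly.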
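The difference between printing one line and printing the whole list can be reproduced with a minimal sketch. It is written for Python 3 and reads the file in binary mode to mimic Python 2 byte strings, so the list shows a b prefix that the Python 2 output in the question does not have. The helper name read_lines_demo and the temporary file are assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile


def read_lines_demo(text="Clara Mart\u00ednez"):
    """Write `text` as UTF-8 to a temporary file, then read it back as byte lines.

    Hypothetical helper: it stands in for the t.txt file from the question.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # Binary mode returns undecoded bytes, like Python 2's open(..., "r").
        with open(path, "rb") as f:
            return f.readlines()
    finally:
        os.remove(path)


lines = read_lines_demo()

# Printing one element: the bytes are interpreted as UTF-8 text.
element_view = lines[0].decode("utf-8")

# Printing the list: each element is shown via repr(), which escapes
# every non-ASCII byte as \xNN.
list_view = repr(lines)

print(element_view)
print(list_view)
```

Nothing is wrong with the data returned by readlines() here; only the representation differs, which is why writing the lines back out to t2.txt produced the correct text.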
