Python read() works with UTF-8, but readlines() "doesn't"

Question

So, I am working with a (huge) UTF-8 encoded file. The first thing I do with it is get its lines into a list using the file object's readlines() method. However, when I use print for debugging, I get things like \xc3 instead of the accented characters.

Here's a really small example that replicates my problem. I created a t.txt file that contains only the text "Clara Martínez":

f = open("t.txt", "r")
s = f.read()
print s
# Output: Clara Martínez

# If I do the following, however:
f.seek(0)  # rewind; after read() the file is at EOF and readlines() would return []
lines = f.readlines()
for l in lines:
    print l
# Output: ['Clara Mart\xc3\xadnez']

# write, however, works fine!
f2 = open("t2.txt", "w")
for l in lines:
    f2.write(l)
f2.close()
f.close()

When I then open t2.txt, the string is correct, i.e. Clara Martínez. Is there any way to make readlines() behave like read()?
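A minimal Python 3 sketch (not the asker's Python 2 code) showing that read() and readlines() return the same decoded text once the file is opened with an explicit encoding and rewound between the two calls. The temporary path is an assumption made so the example is self-contained; the file content is the one from the question.

```python
import os
import tempfile

# Assumption: recreate the question's t.txt in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "t.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Clara Martínez")

with open(path, "r", encoding="utf-8") as f:
    s = f.read()            # whole file as one str
    f.seek(0)               # rewind: after read() the position is at end of file
    lines = f.readlines()   # same content, split into a list of lines

print(s)                    # Clara Martínez
print(lines)                # ['Clara Martínez']
print("".join(lines) == s)  # True
```

Without the seek(0) call, readlines() would start at the end of the file and return an empty list.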

Answer 1

You claim that this:

lines = f.readlines()
for l in lines:
    print l

will result in this:

['Clara Mart\xc3\xadnez']

This is not true; it will not. I think you made a mistake in your code and actually wrote this:

lines = f.readlines()
for l in lines:
    print lines

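That code gives the result you describe, assuming the file contains only the single line 'Clara Martínez'. Printing a list shows the repr() of each element, so the UTF-8 bytes of "í" appear as the escapes \xc3\xad. print l, by contrast, writes the bytes themselves, and your terminal displays them as Clara Martínez.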
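The same effect can be reproduced in Python 3 by reading in binary mode, where each line is a bytes object (roughly what a Python 2 str is). This is a sketch: the io.BytesIO object is an assumed in-memory stand-in for t.txt opened with "rb". Printing the list shows each element's repr(), while decoding a single element gives the readable text.

```python
import io

# Assumption: an in-memory stand-in for t.txt opened in binary mode ("rb").
f = io.BytesIO("Clara Martínez".encode("utf-8"))
lines = f.readlines()

print(lines)               # [b'Clara Mart\xc3\xadnez']  (a list prints each element's repr)
for l in lines:
    print(l.decode("utf-8"))  # Clara Martínez
```

The data is identical in both cases; only the way it is displayed differs.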

Answer 1 · 5 · 2013-09-03 07:08:33
