open() and codecs.open() in Python 2.7 behave strangely differently
I have a text file whose first line contains Unicode characters and whose remaining lines are all ASCII. I am trying to read the first line into one variable and all the other lines into another. However, when I use the following code:
# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()
I get the following output:
<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77
If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().
Why does this happen? And why does codecs.open() open the file in binary mode even though the 'r' mode was passed?
Update: this is the original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71
The workaround is to not mix .readline() and .readlines():
f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ') # take the first line.
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is a lot more robust and versatile than the codecs module.
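As a rough illustration of the io.open() route, here is a minimal sketch that reads the header with .readline() and the rest with .readlines() from the same file object. The helper name read_header_and_data and the sample file contents (one non-ASCII header line followed by 77 ASCII lines) are assumptions made up for this example; they are not taken from the original file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def read_header_and_data(path, encoding='utf-8'):
    """Hypothetical helper: return (header fields, remaining lines)."""
    with io.open(path, 'r', encoding=encoding) as f:
        # Mixing readline() and readlines() on an io.open() file object
        # continues from the current position instead of a partial buffer.
        names = f.readline().rstrip('\n').split(' ')
        data = f.readlines()
    return names, data


# Build a stand-in sample file (assumed layout, not the asker's real data):
# a non-ASCII header line followed by 77 plain ASCII lines.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as out:
    out.write('имя возраст город\n')
    for i in range(77):
        out.write('row %d 1.0 2.0\n' % i)

names, data = read_header_and_data(sample_path)
os.remove(sample_path)

print(len(names), len(data))
```

With this made-up sample, the script should report 3 header fields and all 77 remaining lines. The `__future__` imports are there only so the same sketch runs under both Python 2.7 and Python 3.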