open() and codecs.open() behave strangely differently in Python 2.7

Question

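I have a text file whose first line contains Unicode characters and whose other lines are all ASCII. I am trying to read the first line into one variable and all the remaining lines into another. However, when I use the following code: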

# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
# Open with decoding; codecs.open() opens the underlying file in binary mode
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')  # first line: the names
data_f = f.readlines()             # expected: all remaining lines
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
# Same steps with the built-in open(), which returns undecoded byte strings
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

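If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() read the file in binary mode even though I passed the 'r' argument?

Update: this is the original file: http://www1.datafilehost.com/d/0792d687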

Answer 1

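Because you called .readline() first, the codecs.open() file object has already filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.

If you call .readlines() again, the remaining lines are returned: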

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71

The workaround is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line.

This behavior really is a bug. The Python developers are aware of it; see issue 8260.

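The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is much more robust and versatile than the codecs module.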
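
As a rough sketch of the io.open() route: the helper names read_header_and_rows and make_sample_file below are hypothetical, and the sample file is only a stand-in for 1.txt (28 space-separated names with non-ASCII characters on the first line, then 77 ASCII lines). The code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_header_and_rows(path, encoding='utf-8'):
    """Return (names, rows): the split first line and all remaining lines."""
    # io.open() returns a text-mode file object that decodes to unicode,
    # and .readline() followed by .readlines() reads through to the end.
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().split(u' ')
        rows = f.readlines()
    return names, rows


def make_sample_file(n_names=28, n_rows=77):
    """Write a hypothetical stand-in for 1.txt and return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    # First line: space-separated names containing a non-ASCII character.
    header = u' '.join(u'n\u00e9m%d' % i for i in range(n_names))
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(header + u'\n')
        for i in range(n_rows):
            f.write(u'row %d\n' % i)
    return path


if __name__ == '__main__':
    path = make_sample_file()
    try:
        names, rows = read_header_and_rows(path)
        print(len(names))
        print(len(rows))
    finally:
        os.remove(path)
```

With this sample file, mixing .readline() and .readlines() gives 28 names and all 77 remaining lines, which is what the question expected from codecs.open().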

Answer 1: 16, accepted, 2013-04-22 19:24:36
