Iterating over a file object in Python does not work, but readlines() does, though it is inefficient

In the following code, if I use:

for line in fin:

It only executes for 'a'.

But if I use:

wordlist = fin.readlines()
for line in wordlist:

Then it executes for 'a' through 'z'.

But readlines() reads the whole file at once, which I don't want.

How can I avoid this?

def avoids():
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    num_words = {}

    fin = open('words.txt')

    for char in alphabet:
      num_words[char] = 0
      for line in fin:
        not_found = True
        word = line.strip()
        if word.lower().find(char.lower()) != -1:
          num_words[char] += 1
    fin.close()
    return num_words

The syntax for line in fin can only be used once. After you do that, you've exhausted the file and you can't read it again unless you "reset the file pointer" with fin.seek(0). Conversely, fin.readlines() will give you a list which you can iterate over and over again.
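A minimal sketch of that behaviour, for illustration only. It uses io.StringIO as a stand-in for an open text file (an assumption made so the example needs no file on disk), and the sample words are made up:

```python
import io

# io.StringIO stands in for an open text file; the contents are made-up sample data.
fin = io.StringIO("apple\nbanana\ncherry\n")

first_pass = [line.strip() for line in fin]   # consumes every line
second_pass = [line.strip() for line in fin]  # nothing left to read, so this is empty

fin.seek(0)  # move the file pointer back to the start
third_pass = [line.strip() for line in fin]   # every line is available again

print(first_pass)
print(second_pass)
print(third_pass)
```

This is the situation in the question's code: the inner loop runs to the end of the file for 'a', and every later letter starts iterating from the end of the file.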


I think a simple refactor with Counter (Python 2.7+) could save you this headache:

from collections import Counter
with open('file') as fin:
    result = Counter()
    for line in fin:
        result += Counter(set(line.strip().lower()))

This will count the number of words in your file (one word per line) that contain a particular character (which is what your original code does, I believe. Please correct me if I'm wrong).
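To see why the set matters, here is a small sketch of the same counting step run on an in-memory list instead of a file. The word list is made-up sample data, not taken from words.txt:

```python
from collections import Counter

# Hypothetical sample data standing in for the lines of the file.
words = ["Apple", "banana", "kiwi"]

result = Counter()
for word in words:
    # set() keeps each character once per word, so a word with a repeated
    # letter (the two p's in "apple") still counts only once for that letter.
    result += Counter(set(word.strip().lower()))

print(result["a"])  # words containing 'a'
print(result["p"])  # words containing 'p'
print(result["z"])  # a Counter returns 0 for keys it has never seen
```

Letters that appear in no word are simply absent from the Counter, unlike the original code, which creates an explicit zero entry for every letter of the alphabet.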

You could also do this easily with a defaultdict (Python 2.5+):

from collections import defaultdict
with open('file') as fin:
    result = defaultdict(int)
    for line in fin:
        chars = set(line.strip().lower())
        for c in chars:
            result[c] += 1

And finally, kicking it old-school -- I don't even know when setdefault was introduced...:

fin = open('file')
result = dict()
for line in fin:
    chars = set(line.strip().lower())
    for c in chars:
        result[c] = result.setdefault(c,0) + 1

fin.close()

You have three options:

  1. Read in the whole file anyway.
  2. Seek back to the beginning of the file before attempting to iterate over it again.
  3. Rearchitect your code so that it doesn't need to iterate over the file more than once.
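As a rough sketch of option 2 applied to the question's function: the name count_words_containing and the choice to pass in an already-open file object are assumptions made for this example, and io.StringIO with made-up words stands in for words.txt.

```python
import io

def count_words_containing(fin, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Count, for each letter, how many lines of fin contain it (hypothetical helper)."""
    num_words = {}
    for char in alphabet:
        num_words[char] = 0
        fin.seek(0)  # rewind before each pass so the inner loop sees every line
        for line in fin:
            if char in line.strip().lower():
                num_words[char] += 1
    return num_words

# io.StringIO stands in for open('words.txt'); the words are sample data.
sample = io.StringIO("apple\nbanana\nkiwi\n")
counts = count_words_containing(sample)
print(counts["a"], counts["k"], counts["z"])
```

This keeps memory use low but reads the file once per letter, 26 times in total, which is why option 3 (a single pass over the file) is usually the better fit here.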

Try:

from collections import defaultdict
from itertools import product

def avoids():
    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    num_words = defaultdict(int)

    with open('words.txt') as fin:
        words = [x.strip() for x in fin.readlines() if x.strip()]

    for ch, word in product(alphabet, words):
        if ch in word:
            num_words[ch] += 1

    return num_words
