
How to iterate over a space-separated ASCII file in Python

Strange question here.

I have a .txt file that I want to iterate over. I can get all the words from the file into an array, which is good, but what I want to know is how to iterate over the whole file word by word rather than letter by letter.

I want to be able to go through the array that holds all the text from the file and count how many times each word appears in it.

The only problem is that I don't know how to write the code for it.

I tried using a for loop, but that just iterates over every single letter, when I want whole words.
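The behaviour described above comes from the fact that iterating over a Python string yields one character at a time; splitting the string first gives a list of words to loop over instead. A small illustration using an inline sample string (an assumption, standing in for the file's contents):

```python
# A string is itself iterable, so looping over it yields single characters.
text = "the cat saw the dog"

chars = [c for c in text]
print(chars[:3])  # ['t', 'h', 'e']

# str.split() with no argument breaks the string on runs of whitespace,
# so looping over its result yields whole words instead.
words = text.split()
for w in words:
    print(w)
```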

This code reads the space-separated file.txt:

with open("file.txt", "r") as f:
    words = f.read().split()
for w in words:
    print(w)
Alternatively, reading line by line:

with open("test") as f:
    for line in f:
        # split() with no argument also drops the trailing newline
        for word in line.split():
            print(word)
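Since the goal in the question is to count occurrences, the read-then-split approach can feed a plain dictionary. This is a minimal sketch: `count_words` is a hypothetical helper name, and a temporary file stands in for file.txt so the example runs on its own.

```python
import os
import tempfile

def count_words(path):
    """Hypothetical helper: map each whitespace-separated word in the file to its count."""
    counts = {}
    with open(path) as f:
        for word in f.read().split():
            # dict.get supplies 0 the first time a word is seen
            counts[word] = counts.get(word, 0) + 1
    return counts

# A throwaway file standing in for file.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat saw\nthe dog\n")
    sample_path = tmp.name

result = count_words(sample_path)
os.remove(sample_path)
print(result)  # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```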

Untested:

def produce_words(file_):
    # Yield words one at a time, reading the file line by line
    for line in file_:
        for word in line.split():
            yield word

def main():
    with open('in.txt', 'r') as file_:
        for word in produce_words(file_):
            print(word)

if __name__ == '__main__':
    main()
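Because `produce_words` yields words lazily, it can be passed straight to `collections.Counter` to get the per-word counts the question asks for. In this sketch `io.StringIO` stands in for an open text file (an assumption for self-containment); both yield their contents line by line.

```python
import io
from collections import Counter

def produce_words(file_):
    # Yield each whitespace-separated word, one line at a time
    for line in file_:
        for word in line.split():
            yield word

# io.StringIO behaves like an open text file when iterated
sample = io.StringIO("the cat saw\nthe dog saw the bird\n")
counts = Counter(produce_words(sample))
print(counts.most_common(2))  # [('the', 3), ('saw', 2)]
```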

If you want to loop over an entire file, the sensible thing to do is to iterate over it, taking the lines and splitting them into words. Working line by line is best, as it means we don't read the entire file into memory first (which, for large files, could take a lot of time or cause us to run out of memory):

with open('in.txt') as input:
    for line in input:
        for word in line.split():
            ...

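Note that you could use line.split(" ") if you want to preserve more whitespace: it splits on single spaces only, so consecutive spaces produce empty strings and the trailing newline stays attached to the last word. line.split() with no argument splits on any run of whitespace and discards the excess.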
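A quick comparison of the two calls on a sample line (the line itself is made up for illustration) shows the difference:

```python
line = "the  cat saw\n"   # two spaces after "the", newline at the end

with_sep = line.split(" ")
no_arg = line.split()

print(with_sep)  # ['the', '', 'cat', 'saw\n']
print(no_arg)    # ['the', 'cat', 'saw']
```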

Also note my use of the with statement to open the file: it's more readable, and it closes the file automatically, even if an exception is raised.
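To see the exception-safety point in action, the sketch below raises an error inside the with block and then checks the file object. The temporary file and the simulated `RuntimeError` are assumptions made so the example runs on its own.

```python
import os
import tempfile

# Create a small temporary file to open
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("some words\n")
    path = tmp.name

try:
    with open(path) as f:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

# The with statement closed the file even though the block raised
print(f.closed)  # True
os.remove(path)
```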

While this is a good solution, it's also a little inefficient if you are not doing anything within the outer loop. To reduce this to one loop, we can use itertools.chain.from_iterable and a generator expression:

import itertools
with open('in.txt') as input:
    for word in itertools.chain.from_iterable(line.split() for line in input):
        ...
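As a sanity check that the single-loop version yields the same words in the same order as the nested loops, here is a small comparison; `io.StringIO` again stands in for the open file (an assumption for self-containment).

```python
import io
import itertools

text = "the cat\nsaw the dog\n"

# Nested loops, as in the line-by-line version
nested = []
for line in io.StringIO(text):
    for word in line.split():
        nested.append(word)

# One loop over the flattened stream of words
flat = list(itertools.chain.from_iterable(line.split() for line in io.StringIO(text)))

print(flat)            # ['the', 'cat', 'saw', 'the', 'dog']
print(flat == nested)  # True
```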
