Iterating over the words of a file in Python

Question

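I need to iterate over the words of a large file that consists of a single, very long line. I know methods for iterating over a file line by line, but they don't apply in my case because of the file's single-line structure.

Any alternatives?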

Answer 1

It really depends on your definition of a word, but try this:

f = open("your-filename-here").read()
for word in f.split():
    # do something with word
    print(word)

This uses whitespace characters as word boundaries.

Of course, remember to open and close the file properly; this is just a quick example.
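If "word" should mean a run of letters and digits rather than anything between whitespace, a regular expression gives finer control. The following is a minimal sketch, assuming the text is already loaded into a string; `iter_words_regex` is a hypothetical helper name and `\w+` is only one possible word definition:

```python
import re
from typing import Iterator


def iter_words_regex(text: str, pattern: str = r"\w+") -> Iterator[str]:
    """Yield each substring of text that matches pattern, one at a time."""
    for match in re.finditer(pattern, text):
        yield match.group()


text = "Hello, world! It's 2011."
print(text.split())                  # ['Hello,', 'world!', "It's", '2011.']
print(list(iter_words_regex(text)))  # ['Hello', 'world', 'It', 's', '2011']
```

Passing a different pattern, for example `[A-Za-z']+`, keeps contractions such as "It's" together.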

Answer 2

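A long line? I assume the line is too big to fit reasonably in memory, so you need some kind of buffering.

First of all, this is a bad format; if you have any control over the file, make it one word per line.

If not, use something like the following: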

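def read_words(input_file):
    # Yield space-separated words, reading 1000 characters at a time
    line = ''
    while True:
        word, space, line = line.partition(' ')
        if space:
            # A word was found (skip empty strings from repeated spaces)
            if word:
                yield word
        else:
            # A word was not found; read a chunk of data from file
            next_chunk = input_file.read(1000)
            if next_chunk:
                # Prepend the incomplete word to the new chunk
                line = word + next_chunk
            else:
                # No more data; yield the last word (if any) and return
                word = word.rstrip('\n')
                if word:
                    yield word
                return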
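The approach above splits only on the space character, so tabs and newlines inside the text end up as part of a word. Below is a variant that splits on any whitespace while keeping the same chunked reading. It is a sketch that assumes `str.split()` semantics are what you want; `iter_words` and `chunk_size` are illustrative names:

```python
import io
from typing import Iterator, TextIO


def iter_words(f: TextIO, chunk_size: int = 1000) -> Iterator[str]:
    """Yield whitespace-separated words from f, reading chunk_size characters at a time."""
    leftover = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = leftover + chunk
        parts = buf.split()
        # If the buffer does not end in whitespace, its last part may be the
        # first half of a word cut by the chunk boundary; hold it back.
        if parts and not buf[-1].isspace():
            leftover = parts.pop()
        else:
            leftover = ''
        yield from parts
    if leftover:
        # Whatever remains at end of file is the final word
        yield leftover


words = list(iter_words(io.StringIO("alpha  beta\tgamma\ndelta"), chunk_size=3))
print(words)  # ['alpha', 'beta', 'gamma', 'delta']
```

Only one chunk plus one partial word is held in memory at a time, regardless of how long the line is.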

Answer 3

You really should consider using a generator:

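def word_gen(file):
    # Iterating over a file yields whole lines, so a single very long
    # line is still read into memory in one piece.
    for line in file:
        for word in line.split():
            yield word

with open('somefile') as f:
    for word in word_gen(f):
        print(word)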
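Note that calling `word_gen(f)` by itself only creates a generator object; no words are read until something iterates over it. The sketch below illustrates this, with `io.StringIO` standing in for a real file and `yield from` (Python 3.3+) replacing the inner loop:

```python
import io
import itertools


def word_gen(file):
    for line in file:
        # Equivalent to: for word in line.split(): yield word
        yield from line.split()


f = io.StringIO("one two\nthree four\nfive\n")
gen = word_gen(f)  # creates the generator; nothing has been read yet
first_three = list(itertools.islice(gen, 3))
print(first_three)  # ['one', 'two', 'three']
rest = list(gen)    # resumes exactly where islice stopped
print(rest)         # ['four', 'five']
```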

Answer 4

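There are more efficient ways to do this, but syntactically this is probably the shortest:

words = open('myfile').read().split()

If memory is a concern, you won't want to do this, because it loads the entire content into memory instead of iterating over it.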
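One way to avoid reading the whole file up front is to memory-map it and scan it with `re.finditer`, letting the operating system page the data in as needed. This is a sketch under assumptions not in the answer above: the file is non-empty (`mmap` cannot map an empty file), words are whitespace-delimited, and results come back as `bytes` rather than `str` (decode them if needed). `iter_words_mmap` is a hypothetical name.

```python
import mmap
import os
import re
import tempfile
from typing import Iterator


def iter_words_mmap(path: str) -> Iterator[bytes]:
    """Yield whitespace-delimited words (as bytes) from a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages on demand
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # \S+ matches one run of non-whitespace bytes, i.e. one word
            for match in re.finditer(rb"\S+", mm):
                yield match.group()


# Demo with a throwaway file (mmap fails on an empty file, so write something)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one two\nthree")
try:
    words = list(iter_words_mmap(tmp.name))
finally:
    os.remove(tmp.name)
print(words)  # [b'one', b'two', b'three']
```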

Answer 5

Donald Miner's suggestion looks good: simple and short. I used the following in some code I wrote earlier:

l = []
# "rU" mode was removed in Python 3.11; text mode uses universal newlines by default
with open("filename.txt") as f:
    for line in f:
        for word in line.split():
            l.append(word)

A longer version of Donald Miner's suggestion.
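The same nested loop can also be written as a single list comprehension; like the loop, it still builds the full list in memory. A small sketch using `io.StringIO` in place of a real file (`words_from_lines` is an illustrative name):

```python
import io


def words_from_lines(f):
    # Same result as the nested for-loops: every word of every line, in order
    return [word for line in f for word in line.split()]


sample = io.StringIO("the quick brown\nfox  jumps\n")
l = words_from_lines(sample)
print(l)  # ['the', 'quick', 'brown', 'fox', 'jumps']
```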

Answer 6

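I've answered a similar question before, but I have since refined the method used in that answer. Here is the updated version (copied from my more recent answer):

This is my fully functional approach, which avoids having to read and split lines. It uses the itertools module:

Note for Python 3: replace `itertools.imap` with `map`.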

import itertools

def readwords(mfile):
    byte_stream = itertools.groupby(
        itertools.takewhile(lambda c: bool(c),
                            itertools.imap(mfile.read, itertools.repeat(1))),
        str.isspace)
    return ("".join(group) for pred, group in byte_stream if not pred)

Sample usage:

>>> import sys
>>> for w in readwords(sys.stdin):
...     print (w)
...
I really love this new method of reading words in python
I
really
love
this
new
method
of
reading
words
in
python
It's soo very Functional!
It's
soo
very
Functional!
>>>

I suppose in your case, this would be the way to use the function:

with open('words.txt', 'r') as f:
    for word in readwords(f):
        print(word)
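Applying the Python 3 note above, a port of `readwords` that uses the built-in `map` might look like the following. It is a sketch rather than the answerer's own code; calling `read(1)` once per character keeps memory use tiny but is slow on large files.

```python
import io
import itertools
from typing import Iterator, TextIO


def readwords(mfile: TextIO) -> Iterator[str]:
    # Call mfile.read(1) repeatedly; stop when it returns '' (end of file)
    chars = itertools.takewhile(bool, map(mfile.read, itertools.repeat(1)))
    # Group consecutive characters by whether they are whitespace
    groups = itertools.groupby(chars, str.isspace)
    # Keep the non-whitespace runs and join each run into a word
    return ("".join(group) for is_space, group in groups if not is_space)


print(list(readwords(io.StringIO("It's soo very\nFunctional!"))))
# ["It's", 'soo', 'very', 'Functional!']
```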

Answer 7

Read the line in normally, then split it on whitespace to break it into words?

Something like:

word_list = loaded_string.split()
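Whether `split()` is called with or without an argument matters for this kind of input. A short illustration (the sample string is made up):

```python
loaded_string = "one  two\tthree\n"

# No argument: split on runs of any whitespace and drop empty strings
print(loaded_string.split())     # ['one', 'two', 'three']

# Explicit ' ': split on every single space, keep empty strings, leave tabs/newlines alone
print(loaded_string.split(' '))  # ['one', '', 'two\tthree\n']
```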

Answer 8

After reading the line, you can do this:

# text is the line you read in; pattern is the substring to search for
l = len(pattern)
i = 0
while True:
    i = text.find(pattern, i)
    if i == -1:
        break
    print(text[i:i+l])  # or do whatever
    i += l

Alex.
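The `str.find` stepping used in that loop can also split a long string lazily, yielding one piece at a time instead of building the whole list that `split()` returns (the string itself is still held in memory). A minimal sketch, assuming a single fixed separator; `split_lazily` is a hypothetical name:

```python
from typing import Iterator


def split_lazily(text: str, sep: str = ' ') -> Iterator[str]:
    """Yield the non-empty pieces of text between occurrences of sep, using str.find."""
    if not sep:
        raise ValueError("sep must be non-empty")
    start = 0
    while True:
        i = text.find(sep, start)
        if i == -1:
            # No more separators: the rest of the string is the last piece
            if start < len(text):
                yield text[start:]
            return
        if i > start:
            # Skip empty pieces produced by consecutive separators
            yield text[start:i]
        start = i + len(sep)


print(list(split_lazily("a  bb c")))  # ['a', 'bb', 'c']
```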

Question description

8 solutions

Solution 1
7 2011-10-12 19:16:19

Solution 2
5 2011-10-12 19:25:22

Solution 3
3 2014-02-12 03:17:43

Solution 4
2 2011-10-12 19:16:02

Solution 5
0 2015-11-08 07:59:06

Solution 6
0 2016-11-30 01:26:48

Solution 7
0 2011-10-12 19:15:39

Solution 8
0 2011-10-12 19:23:37
