How do I read words in lines?

Question

Here is what I did. The question is at the end.

1) I first open a .txt document with open().read() and pass the result to a function like this:

def clean_text_passage(a_text_string):
    new_passage=[]
    p=[line+'\n' for line in a_text_string.split('\n')]
    passage = [w.lower().replace('</b>\n', '\n') for w in p]

    if len(passage[0].strip())>0:
       if len(passage[1].strip())>0:
           new_passage.append(passage[0])
    return new_passage

2) Using the returned new_passage, I join its lines into a single string with the following command:

newone = "".join(new_passage)

3) Then I run another function, like this:

def replace(filename):
    match = re.sub(r'[^\s^\w+]risk', 'risk', filename)
    match2 = re.sub(r'risk[^\s^\-]+', 'risk', match)
    match3 = re.sub(r'risk\w+', 'risk', match2)
    return match3

So far, so good. Now here is the problem. When I print match3:

i agree to the following terms regarding my employment or continued employment
with dell computer corporation or a subsidiary or affiliate of dell computer
corporation (collectively, "dell").

it looks like the words are in lines. But,

4) I run the last function via convert = count_words(match3), like this:

def count_words(newstring):
     from collections import defaultdict
     word_dict=defaultdict(int)
     for line in newstring:
         words=line.lower().split()
         for word in words:
             word_dict[word]+=1

When I print word_dict, it shows the following:

defaultdict(<type 'int'>, {'"': 2, "'": 1, '&': 4, ')': 3, '(': 3, '-': 4, ',': 4, '.': 9, '1': 7, '0': 8, '3': 2, '2': 3, '5': 2, '4': 2, '7': 2, '9': 2, '8': 1, ';': 4, ':': 2, 'a': 67, 'c': 34, 'b': 18, 'e': 114, 'd': 44, 'g': 15, 'f': 23, 'i': 71, 'h': 22, 'k': 10, 'j': 2, 'm': 31, 'l': 43, 'o': 79, 'n': 69, 'p': 27, 's': 56, 'r': 72, 'u': 19, 't': 81, 'w': 4, 'v': 3, 'y': 16, 'x': 3})

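Since the purpose of my code is to count a specific word, I need words like 'risk' kept whole within a line (i.e., "i like to take risk"), not single characters such as 'i', 'l', 'i'.

Question: how can I make match3 hold its words the same way as the result of readlines(), so that I can count the words in each line?

When I save match3 as a .txt file, reopen it with readlines(), and then run the count function, it works fine. I want to know how to make it work without saving the file and reopening it with readlines().

Thanks. I hope I can figure this out so I can get some sleep.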

Answer 1

Try this.

for line in newstring iterates over the string one character at a time, so split it into lines first:

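def count_words(newstring):
     from collections import defaultdict
     word_dict=defaultdict(int)
     # split the string into lines before iterating
     for line in newstring.split('\n'):
         words=line.lower().split()
         for word in words:
            word_dict[word]+=1
     # return the counts so that convert = count_words(match3) receives them
     return word_dict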
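
To make the difference concrete, here is a small sketch comparing the two loops. The sample string is made up for illustration and is not the asker's data.

```python
sample = "i like risk\nrisk is fine"  # hypothetical sample text

# Iterating over a str yields one character at a time,
# which is why the original dictionary was full of single letters.
chars = [c for c in sample]

# Splitting on '\n' first yields whole lines instead.
lines = [line for line in sample.split('\n')]

print(chars[:6])
print(lines)
```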

Answer 2

tl;dr: the question is how to split text into lines?

Then it is quite simple:

>>> text = '''This is a
longer text going
over multiple lines
until the string
ends.'''
>>> text.split('\n')
['This is a', 'longer text going', 'over multiple lines', 'until the string', 'ends.']
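
If the goal is to get exactly what readlines() would return without writing a file, the following sketch may help. It assumes Python 3 and a made-up text that ends with a newline; str.splitlines and io.StringIO are standard library features, but whether they fit the asker's pipeline is an assumption.

```python
import io

text = "This is a\nlonger text going\nover multiple lines\n"  # hypothetical text

# split('\n') leaves an empty trailing item when the text ends with a newline.
by_split = text.split('\n')

# splitlines() does not; passing True (keepends) keeps the '\n' on each line,
# which matches what readlines() returns.
by_splitlines = text.splitlines()
with_ends = text.splitlines(True)

# io.StringIO wraps the string in a file-like object, so readlines() works
# directly on in-memory text.
from_stringio = io.StringIO(text).readlines()

print(by_split)
print(by_splitlines)
print(with_ends == from_stringio)
```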

Answer 3

Your match3 is a string, so

for line in newstring:

iterates over the characters in newstring, not the lines. You can simply write

 words = newstring.lower().split()
 for word in words:
     word_dict[word]+=1

or, if you prefer,

 for line in newstring.splitlines():
     words=line.lower().split()
     for word in words:
         word_dict[word]+=1

Whichever you like. [I would use Counter myself, but defaultdict(int) is almost as good.]
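
For reference, a minimal sketch of the Counter variant might look like this. The sample sentence is invented for illustration; count_words keeps the asker's function name but is not their original implementation.

```python
from collections import Counter

def count_words(newstring):
    # split() with no arguments splits on any whitespace, including newlines,
    # so there is no need to break the string into lines first.
    return Counter(newstring.lower().split())

counts = count_words("I like risk\nRisk is what i like")  # hypothetical input
print(counts["risk"])
```

Like defaultdict(int), a Counter returns 0 for a word it has never seen, so a lookup such as counts["missing"] does not raise KeyError.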

Note:

def replace(filename):

filename is not a file name! The function receives the text itself, so the parameter name is misleading.

3 answers

Answer 1: 0 2012-09-02 15:40:37
Answer 2: 0 2012-09-02 15:40:56
Answer 3: 0 2012-09-02 15:41:24
