Word list from a text file

Question

I need to create a word list from a text file. The list will be used in a hangman program and needs to exclude the following:

Duplicate words
Words with fewer than 5 letters
Words that contain "xx" as a substring
Words that contain uppercase letters

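The word list then needs to be written to a file so that each word appears on its own line. The program also needs to output the number of words in the final list.

This is what I have, but it isn't working properly.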

def MakeWordList():
    infile=open(('possible.rtf'),'r')
    whole = infile.readlines()
    infile.close()

    L=[]
    for line in whole:
        word= line.split(' ')
        if word not in L:
            L.append(word)
            if len(word) in range(5,100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    print L

MakeWordList()

Answer 1

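With this code you are going to add the words multiple times.
You aren't actually filtering any words out at all, just adding them a different number of times depending on how many of the ifs they pass.

You should combine all the ifs: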

if word not in L and len(word) >= 5 and 'xx' not in word and word.islower():
    L.append(word)

Or, if you want it to be more readable, you can split them up:

    if word not in L and len(word) >= 5:
        if 'xx' not in word and word.islower():
            L.append(word)

But don't append after each one.
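A minimal sketch of the combined check applied to a list of words, under a few assumptions: the function name `filter_words` and the sample words are made up for illustration, and a `seen` set stands in for the `word not in L` test (an optional swap that avoids rescanning the list). It also assumes every item is a single string; in the question's code, `line.split(' ')` returns a list, so the string tests would have to run on each element of that list.

```python
def filter_words(words):
    """Return the words that pass all four rules, in first-seen order."""
    seen = set()   # words already accepted, for fast duplicate checks
    result = []
    for word in words:
        # One combined test, one append: a word is added at most once.
        if (word not in seen and len(word) >= 5
                and 'xx' not in word and word.islower()):
            seen.add(word)
            result.append(word)
    return result


sample = ['apple', 'apple', 'pear', 'boxxer', 'Paris', 'banana']
print(filter_words(sample))  # ['apple', 'banana']
```

Here 'pear' is too short, 'boxxer' contains "xx", 'Paris' has an uppercase letter, and the second 'apple' is a duplicate.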

Answer 2

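Think about it: in your nested if statements, any word that isn't already in the list makes it through on the first line and gets appended. Then, if it is 5 or more characters long, it gets added again (I bet), and again, and so on. You need to rethink the logic in your if statements.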
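To see the effect described here, the sketch below replays the question's nested structure on two sample strings and counts the appends. The function name `nested_append` and the inputs are illustrative only, and unlike the question's code it works on plain strings rather than on the lists returned by `split`.

```python
def nested_append(words):
    """Replay the question's nested ifs: an append follows every test."""
    L = []
    for word in words:
        if word not in L:
            L.append(word)
            if len(word) in range(5, 100):
                L.append(word)
                if not word.endswith('xx'):
                    L.append(word)
                    if word == word.lower():
                        L.append(word)
    return L


result = nested_append(['hello', 'Hi'])
print(result.count('hello'))  # 4: one append per test passed
print(result.count('Hi'))     # 1: too short, yet still in the list
```

A word that should have been rejected ('Hi') still ends up in the list, and a valid one appears four times.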

Answer 3

Improved code:

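def MakeWordList():
    with open('possible.rtf','r') as f:
        # split() breaks the text on whitespace, so we loop over words, not single characters
        data = f.read().split()
    # set() drops duplicates; the condition applies the other three rules
    return set([word for word in data if len(word) >= 5 and word.islower() and 'xx' not in word])

set(_iterable_) returns a set object with no duplicates (every item in a set must be unique). [word for word...] is a list comprehension, a shorter way of building a simple list. Because f.read().split() splits the file contents on whitespace, the comprehension goes through each word in data, whether the words are separated by spaces or sit on separate lines. if len(word) >= 5 and word.islower() and 'xx' not in word covers the last three requirements (at least 5 letters, no uppercase letters, and no "xx"). Note that a set is unordered, so sort the result if the order of the words matters.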
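None of the answers covers the output step from the question (one word per line, plus a count). One possible sketch follows; the function name `write_word_list`, the output filename, and the decision to sort are assumptions, not part of the original answers. It also assumes the input is plain text: a genuine RTF file contains formatting control words that would be picked up as "words".

```python
def write_word_list(words, out_path):
    """Write each word on its own line and return how many were written."""
    ordered = sorted(words)  # sets are unordered; sort for stable output
    with open(out_path, 'w') as out:
        for word in ordered:
            out.write(word + '\n')
    return len(ordered)


# 'wordlist.txt' is a hypothetical output filename.
count = write_word_list({'banana', 'apple', 'cherry'}, 'wordlist.txt')
print(count)  # 3
```

With the function from this answer, the call would be `write_word_list(MakeWordList(), 'wordlist.txt')`.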

Answer 1
2 2013-04-09 01:10:52

Answer 2
0 2013-04-09 01:16:02

Answer 3
0 2013-04-09 01:46:08
