Counting suffixes that occur in a word file

Question

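I have this Python program that reads a word-list file and uses the endswith() method to check for suffixes given in another file. The suffixes to check are stored in a list, suffixList[], and their counts are kept in suffixCount[].

Here is my code: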

fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
   for wordp in range(0,x):
        if word.endswith(suffixList[wordp]):
           suffixCount[wordp] = suffixCount[wordp]+1
for output in range(0,x):
     print  "%-6s %10i"%(prefixList[output], prefixCount[output])

fd.close()

The output looks like this:

Suffixes: 
able            0
ible            0
ation           0

The program never gets past this check:

if word.endswith(suffixList[wordp]):

Answer 1

You need to strip the newline:

word = ln.rstrip().lower()

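The words come from a file, so every line ends with a newline character. The endswith check therefore always fails, because none of the suffixes ends with a newline.

I would also change the function so that it returns the values you need: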
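The effect is easy to reproduce in isolation. The following is a minimal sketch with a made-up line (the word itself is an assumption, not taken from the asker's file):

```python
# A line as it comes out of a text file: the trailing "\n" is still attached.
line = "Readable\n"

# The raw line ends with "\n", not "able", so the check fails.
raw_match = line.endswith("able")

# Strip trailing whitespace and lowercase first; then the check succeeds.
word = line.rstrip().lower()
clean_match = word.endswith("able")

print(raw_match, clean_match)  # False True
```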

def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
                       if '#' not in line and line]
        return lst, dict.fromkeys(lst[start:end], 0)

lst, sfx_dict = store_roots(22, 30) # List, SuffixList

Then slice from the end of each word and check whether the substring is in the dict:

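with open('longWordList.txt') as fd:
    print('Suffixes: ')
    # Lengths (not the strings themselves) of the longest and shortest suffixes.
    mx, mn = len(max(sfx_dict, key=len)), len(min(sfx_dict, key=len))
    for ln in fd:
        word = ln.rstrip().lower()
        # One slice and one lookup per suffix length, longest first,
        # never longer than the word itself.
        for i in range(min(mx, len(word)), mn - 1, -1):
            suf = word[-i:]
            if suf in sfx_dict:
                sfx_dict[suf] += 1
    # Iterate over key/value pairs, not over the keys alone.
    for k, v in sfx_dict.items():
        print("Suffix = {} Count =  {}".format(k, v))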

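Slicing the end of the word once per suffix length should be faster than testing every suffix with endswith, especially when several suffixes share the same length. It needs at most one lookup per suffix length between mn and mx. If you have twenty four-character suffixes, for example, you only check the dict once for that length: a word has exactly one ending of length n, so a single slice and lookup covers every suffix of length n at once.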
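The same idea can be sketched as a small self-contained function that works on in-memory data. The name count_suffixes and the sample words are hypothetical, chosen only for illustration; instead of walking every length between the shortest and longest suffix, this variant visits only the lengths that actually occur.

```python
def count_suffixes(words, suffixes):
    """Count how many words end with each suffix, one dict lookup per length."""
    counts = dict.fromkeys(suffixes, 0)
    # Distinct, non-zero suffix lengths, longest first.
    lengths = sorted({len(s) for s in suffixes if s}, reverse=True)
    for raw in words:
        word = raw.strip().lower()
        for n in lengths:
            # Skip lengths longer than the word, so the slice never falls
            # back to the whole word and gets counted twice.
            if n <= len(word) and word[-n:] in counts:
                counts[word[-n:]] += 1
    return counts


sample_words = ["Readable\n", "visible\n", "nation\n", "table\n", "cat\n"]
result = count_suffixes(sample_words, ["able", "ible", "ation"])
print(result)  # {'able': 2, 'ible': 1, 'ation': 1}
```

With three suffixes of two distinct lengths, each word costs two slices and two lookups, regardless of how many suffixes share a length.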

Answer 2

You can use a Counter to count the occurrences of each suffix:

from collections import Counter

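with open("rootsPrefixesSuffixes.txt") as fp:
    # Skip blank lines and comment lines; a raw line still carries its "\n",
    # so it has to be stripped before testing whether it is empty.
    List = [line.strip() for line in fp if line.strip() and '#' not in line]
suffixes = List[22:30]  # ?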

with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes if word.rstrip().lower().endswith(s))
print(c)

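Note: if each line contains several words and you want to ignore everything after the first one, add .split()[0]; otherwise it is unnecessary.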
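As an illustration of that note, here is a sketch of the same Counter approach on in-memory data where each line carries extra text after the word. The sample lines and the suffix list are made up; only the first token of each line is checked.

```python
from collections import Counter

suffixes = ["able", "ible", "ation"]
# Hypothetical lines: a word followed by extra columns, plus one blank line.
lines = ["Readable adj 12\n", "visible adj 7\n", "nation noun 3\n", "\n"]

counts = Counter(
    s
    for line in lines
    if line.split()  # skip blank lines, where split()[0] would raise IndexError
    for s in suffixes
    if line.split()[0].lower().endswith(s)
)
print(counts)
```

A Counter returns 0 for keys it has never seen, so suffixes that match no word can still be looked up safely even though they do not appear in the printed result.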


Answer 1
1 (accepted) 2015-10-13 17:00:35

Answer 2
0 2015-10-13 17:19:32
