Counting suffixes that occur in a word file

Question

I have a Python program that reads a wordlist file and uses the endswith() method to check each word against suffixes listed in another file. The suffixes to check are stored in the list suffixList[], and the counts are kept in suffixCount[].

The following is my code:

fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
   for wordp in range(0,x):
        if word.endswith(suffixList[wordp]):
           suffixCount[wordp] = suffixCount[wordp]+1
for output in range(0,x):
     print  "%-6s %10i"%(prefixList[output], prefixCount[output])

fd.close()

This is the output:

Suffixes: 
able            0
ible            0
ation           0

The program never gets inside this condition:

if word.endswith(suffixList[wordp]):

Answer 1

You need to strip the newline:

word = ln.rstrip().lower()

The words come from a file, so each line ends with a newline character. The endswith test then always fails, because none of your suffixes end with a newline.
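
To see the failure in isolation, here is a small sketch with a made-up word; the variable names are illustrative and not taken from the original program.

```python
# A line read from a file keeps its trailing newline.
line = "readable\n"

# The newline is the last character, so the suffix test fails.
raw_match = line.endswith("able")

# Stripping trailing whitespace first makes the test behave as intended.
word = line.rstrip().lower()
clean_match = word.endswith("able")

print(raw_match, clean_match)  # False True
```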

I would also change the function to return the values you want:

def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
                       if '#' not in line and line]
        return lst, dict.fromkeys(lst[start:end], 0)

lst, sfx_dict = store_roots(22, 30) # List, SuffixList

Then slice from the end and see if the substring is in the dict:

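with open('longWordList.txt') as fd:
    print('Suffixes: ')
    # Lengths (not the strings themselves) of the longest and shortest suffixes
    mx, mn = len(max(sfx_dict, key=len)), len(min(sfx_dict, key=len))
    for ln in map(str.rstrip, fd):
        # Try each candidate length, longest first, never longer than the word
        for i in range(min(mx, len(ln)), mn - 1, -1):
            suf = ln[-i:]
            if suf in sfx_dict:
                sfx_dict[suf] += 1
    # Iterate over key/value pairs, not just the keys
    for k, v in sfx_dict.items():
        print("Suffix = {} Count =  {}".format(k, v))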

Slicing the end of the string incrementally should be faster than testing every suffix against every word, especially if many of your suffixes have the same length. It needs at most one dict lookup per candidate length between mn and mx, so with 20 four-character suffixes you would check the dict only once per word: only one substring of a given length can match, so a single slice and lookup covers every suffix of that length.
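
The idea can be tried without any files. The following is a minimal sketch, not the answer's exact code: count_suffixes and the sample data are hypothetical, it assumes every suffix is non-empty, and it only visits the lengths that actually occur among the suffixes.

```python
def count_suffixes(words, suffixes):
    """Count how many words end with each suffix, one dict lookup per length."""
    counts = dict.fromkeys(suffixes, 0)
    # Distinct suffix lengths, longest first.
    lengths = sorted({len(s) for s in suffixes}, reverse=True)
    for raw in words:
        word = raw.rstrip().lower()
        for n in lengths:
            # Skip lengths longer than the word, otherwise the slice
            # would return the whole word and could be counted twice.
            if n <= len(word) and word[-n:] in counts:
                counts[word[-n:]] += 1
    return counts


sample = ["Readable\n", "visible\n", "nation\n", "table\n", "cat\n"]
result = count_suffixes(sample, ["able", "ible", "ation"])
print(result)  # {'able': 2, 'ible': 1, 'ation': 1}
```

With three suffixes of two distinct lengths, each word costs at most two slices and two lookups, however many suffixes share those lengths.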

Answer 2

You could use a Counter to count the occurrences of each suffix:

from collections import Counter

with open("rootsPrefixesSuffixes.txt") as fp:
    # Test the stripped line: a raw blank line is "\n", which is still truthy
    List = [line.strip() for line in fp if line.strip() and '#' not in line]
suffixes = List[22:30]  # ?

with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes if word.rstrip().lower().endswith(s))
print(c)

Note: add .split()[0] if a line can hold more than one word and you want to ignore everything after the first; otherwise it is unnecessary.
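
One thing to be aware of with this approach: a Counter only stores the suffixes that actually matched, so suffixes with no hits do not appear in print(c). Below is a minimal sketch of reporting every suffix, zero counts included; the in-memory lists are made-up stand-ins for the two files.

```python
from collections import Counter

suffixes = ["able", "ible", "ation"]           # hypothetical sample suffixes
lines = ["readable\n", "Table\n", "nation\n"]   # stands in for the word file

c = Counter(s for line in lines for s in suffixes
            if line.rstrip().lower().endswith(s))

# Indexing a Counter with a missing key returns 0 instead of raising KeyError.
report = [(s, c[s]) for s in suffixes]
for s, n in report:
    print("%-6s %10i" % (s, n))
```

Looping over the suffix list rather than over the Counter also keeps the output in the same order as the suffix file.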

Answer 1: score 1, accepted, 2015-10-13 17:00:35

Answer 2: score 0, 2015-10-13 17:19:32
