Stemmer function that takes a string and returns a list with the stem of each word

Question

I am trying to write a function that takes a string as input and returns a list containing the stem of each word in the string. The problem is that, because of the nested for loop, each word in the string is appended to the list multiple times. Is there a way to avoid this?

def stemmer(text):
    stemmed_string = []
    res = text.split()
    suffixes = ('ed', 'ly', 'ing')

    for word in res:
        for i in range(len(suffixes)):
            if word.endswith(suffixes[i]):
                stemmed_string.append(word[:-len(suffixes[i])])
            elif len(word) > 8:
                stemmed_string.append(word[:8])
            else:
                stemmed_string.append(word)

    return stemmed_string

If I call the function on the text 'I have a dog that is barking', this is the output:

['I',
 'I',
 'I',
 'have',
 'have',
 'have',
 'a',
 'a',
 'a',
 'dog',
 'dog',
 'dog',
 'that',
 'that',
 'that',
 'is',
 'is',
 'is',
 'barking',
 'barking',
 'bark']

Answer 1

You are appending something on every iteration of the loop over suffixes, so each word is added once per suffix. To avoid the problem, append only once per word, after the inner loop has finished.

It's not clear whether you want to add the shortest possible string out of a set of candidates, or how stacked suffixes should be handled. Here's a version which always strips as much as possible.

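def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')

    for word in text.split():
        # Strip every matching suffix, checking them in the order listed
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        # Append exactly once per word, after all suffixes have been checked
        stemmed_string.append(word)

    return stemmed_string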

Notice the more idiomatic loop, too: it iterates directly over the suffixes instead of indexing with range(len(suffixes)).
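If the intent in the question was instead to strip at most one suffix and otherwise cut long words down to eight characters, a for/else loop is one way to sketch that. This reading of the intent is an assumption, and the name stemmer_single and its max_len parameter are hypothetical, not part of the original code.

```python
def stemmer_single(text, suffixes=('ed', 'ly', 'ing'), max_len=8):
    """Strip at most one suffix per word; otherwise truncate to max_len."""
    stems = []
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                stem = word[:-len(suffix)]
                break  # stop at the first matching suffix
        else:
            # No suffix matched: slicing leaves short words unchanged
            stem = word[:max_len]
        stems.append(stem)  # exactly one append per word
    return stems


print(stemmer_single('I have a dog that is barking'))
# ['I', 'have', 'a', 'dog', 'that', 'is', 'bark']
```

The else branch of the inner loop runs only when no break happened, which is what keeps the append count at one per word.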

This will reduce "sparingly" to "spar", etc. Like every naïve stemmer, it will also do stupid things with words like "sly" and "thing", which become "s" and "th".
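One possible mitigation, not part of the answer above, is to refuse to strip a suffix when the remaining stem would be too short. The sketch below assumes a minimum stem length of three characters; the name stemmer_guarded and the min_stem parameter are hypothetical.

```python
def stemmer_guarded(text, suffixes=('ed', 'ly', 'ing'), min_stem=3):
    """Strip matching suffixes, but only while the stem keeps min_stem chars."""
    stems = []
    for word in text.split():
        for suffix in suffixes:
            # Skip the suffix if removing it would leave too short a stem
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[:-len(suffix)]
        stems.append(word)
    return stems


print(stemmer_guarded('sly thing sparingly barking'))
# ['sly', 'thing', 'spar', 'bark']
```

This is still a crude heuristic: a word such as "spring" would be cut to "spr", so a length guard reduces the worst cases rather than eliminating them.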

Demo: https://ideone.com/a7FqBp

Solution 1
0 2022-09-23 16:11:06
