Most Efficient Way to iteratively filter a Pandas dataframe given a list of values
What's the most efficient way to filter values out of a list based on the values in another list
I currently create the list like this:
stopfile = os.path.join(baseDir, inputPath, STOPWORDS_PATH)
stopwords = set(sc.textFile(stopfile).collect())
print('These are the stopwords: %s' % stopwords)
def tokenize(string):
    """ An implementation of input string tokenization that excludes stopwords

    Args:
        string (str): input string

    Returns:
        list: a list of tokens without stopwords
    """
    res = list()
    for word in simpleTokenize(string):
        if word not in stopwords:
            res.append(word)
    return res
simpleTokenize is just a basic string-splitting function that returns a list of strings.
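simpleTokenize is not shown in the question; a minimal sketch, assuming it simply lowercases the input and splits on non-word characters, might look like this (the real function may differ):

```python
import re

def simpleTokenize(string):
    # Hypothetical stand-in for the question's simpleTokenize:
    # lowercase, split on runs of non-word characters, drop empty strings.
    return [token for token in re.split(r'\W+', string.lower()) if token]
```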
This works. If you want it in a more "Pythonic" way (one line of code instead of four), you can use a list comprehension:
res = [word for word in simpleTokenize(string) if word not in stopwords]
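For example, with a toy stopword set and a plain str.split stand-in for simpleTokenize (both illustrative, not from the question), the comprehension behaves like this:

```python
stopwords = {'the', 'a', 'of'}  # toy stopword set for illustration

def simpleTokenize(string):
    return string.lower().split()  # stand-in tokenizer, not the real one

def tokenize(string):
    # One-line equivalent of the loop-and-append version above
    return [word for word in simpleTokenize(string) if word not in stopwords]

print(tokenize('the quick brown fox'))  # → ['quick', 'brown', 'fox']
```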
You are already using a set, which gives the biggest speedup potential (based on the question title, I assume your original code tested membership via list.__contains__, which is O(n) per lookup, whereas set membership is O(1) on average). The only other thing I can suggest is to make the function a generator, so you do not need to build the res list at all:
def tokenize(string):
    for word in simpleTokenize(string):
        if word not in stopwords:
            yield word
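The generator version yields tokens lazily; the caller materializes a list only when one is actually needed. A small self-contained example, using an illustrative stopword set and tokenizer (not the question's actual ones):

```python
stopwords = {'is', 'an'}  # toy stopword set for illustration

def simpleTokenize(string):
    return string.lower().split()  # stand-in tokenizer, not the real one

def tokenize(string):
    # Generator: yields one filtered token at a time instead of
    # building an intermediate list.
    for word in simpleTokenize(string):
        if word not in stopwords:
            yield word

print(list(tokenize('this is an example')))  # → ['this', 'example']
```

Note that a generator can only be iterated once; call list(tokenize(...)) if you need to reuse the result.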
You can use the filter function:
stopfile = os.path.join(baseDir, inputPath, STOPWORDS_PATH)
stopwords = set(sc.textFile(stopfile).collect())
print('These are the stopwords: %s' % stopwords)

def tokenize(string):
    """ An implementation of input string tokenization that excludes stopwords

    Args:
        string (str): input string

    Returns:
        list: a list of tokens without stopwords
    """
    return list(filter(lambda x: x not in stopwords, simpleTokenize(string)))
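One caveat: in Python 3, filter returns a lazy iterator rather than a list, so wrapping it in list() is what keeps the docstring's promise of returning a list. A quick sanity check with a toy stopword set and tokenizer (both illustrative, not from the question):

```python
stopwords = {'and', 'or'}  # toy stopword set for illustration

def simpleTokenize(string):
    return string.lower().split()  # stand-in tokenizer, not the real one

def tokenize(string):
    # list() forces the Python 3 filter iterator into the list the
    # docstring promises; in Python 2, filter returned a list directly.
    return list(filter(lambda x: x not in stopwords, simpleTokenize(string)))

print(tokenize('cats and dogs'))  # → ['cats', 'dogs']
```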