I created a list of stopwords like this:
stopfile = os.path.join(baseDir, inputPath, STOPWORDS_PATH)
stopwords = set(sc.textFile(stopfile).collect())
print 'These are the stopwords: %s' % stopwords
def tokenize(string):
    """ An implementation of input string tokenization that excludes stopwords

    Args:
        string (str): input string

    Returns:
        list: a list of tokens without stopwords
    """
    res = list()
    for word in simpleTokenize(string):
        if word not in stopwords:
            res.append(word)
    return res
simpleTokenize is just a basic function that splits the string and returns a list of strings.
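For reference, a minimal sketch of what such a function might look like (the regex and lowercasing here are assumptions for illustration, not the asker's actual code):

import re

def simpleTokenize(string):
    # Hypothetical implementation: lowercase, split on runs of
    # non-word characters, and drop any empty tokens at the edges.
    return [token for token in re.split(r'\W+', string.lower()) if token]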
This is fine. If you want to do it in a more "Pythonic" way (one line of code instead of four), you could use a list comprehension:
res = [word for word in simpleTokenize(string) if word not in stopwords]
You are already using a set, which is the biggest potential speedup (based on the question title I was expecting your code to have a list.__contains__ test). The only remaining thing I can suggest is making your function a generator, so you don't need to create the res list:
def tokenize(string):
    for word in simpleTokenize(string):
        if word not in stopwords:
            yield word
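Keep in mind that a generator produces tokens lazily, so code that needs an actual list has to materialize it. A quick usage sketch, assuming the stopwords set and the simpleTokenize sketch above (the sample stopwords are made up for illustration):

stopwords = set(['the', 'a', 'an'])       # assumed sample stopwords

tokens = tokenize('the quick brown fox')  # generator object; nothing computed yet
print(list(tokens))                       # ['quick', 'brown', 'fox']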
You can use the built-in filter function:
def tokenize(string):
    """ An implementation of input string tokenization that excludes stopwords

    Args:
        string (str): input string

    Returns:
        list: a list of tokens without stopwords
    """
    return filter(lambda x: x not in stopwords, simpleTokenize(string))
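One caveat: the print statement in the question suggests Python 2, where filter returns a list as the docstring promises. In Python 3, filter returns a lazy iterator, so to keep the documented return type you would wrap the result (an assumption about your target version, not a change the original needs):

    return list(filter(lambda x: x not in stopwords, simpleTokenize(string)))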