Remove all occurrences of specific words from a list

I have a list like this: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']. I want to remove all of the words and, or, of. I therefore came up with the following block of code:

my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
print('Before: {}'.format(my_list))
my_list = list(filter(lambda a: 'and' not in a and 'of' not in a and 'or' not in a, my_list))
print('After: {}'.format(my_list))

However, my code gives this output:

Before: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
After: []

What I want is:

['land_transport', 'port', 'surveyor', 'organization']

There are, of course, several ways around this, but I want to solve the problem with a lambda function. Any suggestions?

You can create a new list storing all of the words to be filtered:

my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
to_remove = ['or', 'of', 'and']
new_list = list(filter(lambda x: x not in to_remove, my_list))
print(new_list)

Output:

['land_transport', 'port', 'surveyor', 'organization']

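Your filtering is not correct: 'and' not in a checks whether 'and' occurs anywhere inside the string a, so it also rejects words such as 'land_transport'. Test membership in a collection of whole words instead: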

filter_set = {'and', 'or', 'of'}
my_list = list(filter(lambda a: a not in filter_set, my_list))
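
To see why the original lambda kept nothing, here is a small sketch comparing the two meanings of the in operator. It reuses the question's list and the filter_set name from the snippet above; substring_hits is a name made up for this illustration.

```python
my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
filter_set = {'and', 'or', 'of'}

# On a string, `in` is a substring test, so words that merely contain
# 'and', 'or' or 'of' are rejected together with the exact matches.
print('and' in 'land_transport')  # True
print('or' in 'surveyor')         # True

# Every word in my_list contains at least one of the three substrings,
# which is why the original filter returned an empty list.
substring_hits = [w for w in my_list if any(s in w for s in filter_set)]
print(substring_hits == my_list)  # True

# On a set, `in` is a membership test against whole elements.
print('land_transport' in filter_set)  # False
print('and' in filter_set)             # True
```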

You want all the items in my_list that are not in filter_set. Note the use of a set: it makes each lookup much faster (O(1) on average, versus O(N) for a list).
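
The lookup cost can be checked with a rough timing sketch. The 10,000 generated words and the probe value are made-up test data, not part of the question, and the absolute timings will vary by machine.

```python
import timeit

# Hypothetical data: 10,000 distinct words to remove, stored both ways.
words_list = ['word{}'.format(i) for i in range(10000)]
words_set = set(words_list)

# Worst case for the list: the probe is absent, so every element is compared.
probe = 'not_present'

list_time = timeit.timeit(lambda: probe in words_list, number=200)
set_time = timeit.timeit(lambda: probe in words_set, number=200)

print('list lookup: {:.6f}s'.format(list_time))
print('set lookup:  {:.6f}s'.format(set_time))
```

With only three words to remove, as in the question, the difference is negligible; the set pays off as the collection of words to remove grows.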

Although the answers above serve the need, I think you intend to remove stop words.

NLTK is a good resource in Python for that. You can use nltk.corpus.stopwords.

You don't have to do much manipulation if you know you are removing actual English stop words.

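from nltk.corpus import stopwords

# One-time setup if the corpus is not installed yet:
#   import nltk; nltk.download('stopwords')
word_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
# Build the set once instead of calling stopwords.words() for every word
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in word_list if word not in stop_words]

print(filtered_words)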

['land_transport', 'port', 'surveyor', 'organization']

Voilà
