I have a list like this: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']. I want to remove all of the words 'and', 'or', and 'of'. I therefore came up with the following block of code:
my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
print('Before: {}'.format(my_list))
my_list = list(filter(lambda a: 'and' not in a and 'of' not in a and 'or' not in a, my_list))
print('After: {}'.format(my_list))
However, my code gives this output:
Before: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
After: []
What I want is:
['land_transport', 'port', 'surveyor', 'organization']
There are, of course, several other ways to do this, but I specifically want to solve the problem with a lambda function. Any suggestions?
You can create a new list storing all of the words to be filtered:
my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
to_remove = ['or', 'of', 'and']
new_list = list(filter(lambda x: x not in to_remove, my_list))
print(new_list)
Output:
['land_transport', 'port', 'surveyor', 'organization']
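If you would rather keep the shape of the original lambda instead of introducing a second list, one option is to compare each element against the unwanted words with `!=`. This is only a sketch of that idea; the variable name `kept` is chosen here for illustration and does not come from the question.

```python
my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']

# Use whole-string comparison (!=) for each unwanted word.
# Unlike "'and' not in a", this does not look inside the string a.
kept = list(filter(lambda a: a != 'and' and a != 'or' and a != 'of', my_list))

print(kept)
```

This stays readable for two or three words; for a longer list of words, a separate collection of words to remove is easier to maintain.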
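Your filter condition is not correct. When a is a string, 'or' not in a is a substring test, so words such as 'port' or 'organization' are rejected because they contain 'or'. Test each word for membership in a collection of unwanted words instead: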
filter_set = {'and', 'or', 'of'}
my_list = list(filter(lambda a: a not in filter_set, my_list))
You want all the items in my_list that are not in filter_set. Notice the use of a set: it makes each lookup much faster (O(1) on average for a set versus O(N) for a list).
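To see why the original lambda returned an empty list, it may help to check which unwanted words occur as substrings of each element. The snippet below is an illustrative sketch; the names `unwanted`, `substring_hits`, and `kept` are made up for this example.

```python
words = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
unwanted = {'and', 'or', 'of'}

# "u in w" with two strings is a substring test, which is what the
# original lambda relied on. Collect the unwanted words found inside each word.
substring_hits = {w: sorted(u for u in unwanted if u in w) for w in words}
for w, hits in substring_hits.items():
    print(w, hits)

# "w not in unwanted" with a set is a membership test on the whole word.
kept = list(filter(lambda w: w not in unwanted, words))
print(kept)
```

Every element contains at least one of the unwanted words as a substring (for example, 'port' contains 'or'), so the substring-based filter drops all of them, while the membership test drops only exact matches.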
Although the answers above do the job, I think what you really intend is to remove stop words. nltk is a good resource in Python for that: you can use nltk.corpus.stopwords. You don't have to do much manipulation if you know you are removing actual English stop words.
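from nltk.corpus import stopwords
# Requires the stop word corpus; run nltk.download('stopwords') once beforehand.
word_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
# Build the set once instead of calling stopwords.words() for every word.
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in word_list if word not in stop_words]
print(filtered_words)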
['land_transport', 'port', 'surveyor', 'organization']
Voilà!