I am working with NLTK and I would like to find all sentences that include a given set of keywords. For example, I currently have [x for x in tokenized_sent if 'key_word1' and 'key_word2' and 'key_word3' in x]
. I would like to set it up so that a user can input any number of words, which would then take the place of these keywords joined by and.
I have tried something like inserting user_input_list = ['key_word1', 'key_word2']
and writing [x for x in tokenized_sent if user_input_list[0] and user_input_list[1] in x]
, which seems to work, but there has to be a better way, especially one that can handle any number of words to look for. Thanks.
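As a side note on the snippet above: because non-empty strings are truthy and `in` binds tighter than `and`, the chained expression only ever tests the last word. A minimal sketch (the sample sentences are made up for illustration):

```python
# 'cat' and 'dog' in x parses as 'cat' and ('dog' in x);
# 'cat' is a truthy string, so only 'dog' is actually checked.
tokenized_sent = [['the', 'cat', 'sat'], ['the', 'dog', 'ran']]

result = [x for x in tokenized_sent if 'cat' and 'dog' in x]
print(result)  # [['the', 'dog', 'ran']] -- 'cat' was never tested
```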
You can use set subsets. Turn the user input list into a set and check whether it is a subset of each sentence's tokens.
[x for x in tokenized_sent if set(user_input_list).issubset(x)]
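A runnable sketch of this approach; the variable names follow the question, but the sample sentences are made up:

```python
tokenized_sent = [['the', 'cat', 'sat'], ['the', 'cat', 'and', 'dog', 'sat']]
user_input_list = ['cat', 'dog']

# issubset accepts any iterable, so the token lists need no conversion.
matches = [x for x in tokenized_sent if set(user_input_list).issubset(x)]
print(matches)  # [['the', 'cat', 'and', 'dog', 'sat']]
```

This also scales naturally: the user can supply any number of words and no extra code is needed.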
I think you could use the all built-in:
[words for words in tokenized_sent if all(keyword in words for keyword in keywords)]
Pay attention: the first in (inside all) is a membership test that yields a boolean, while the second one (in the for clause) is used to get elements from the list.
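A short usage example with made-up sample sentences (the names tokenized_sent and keywords are from the answer; the data is my own):

```python
tokenized_sent = [['the', 'cat', 'sat'], ['the', 'cat', 'and', 'dog', 'sat']]
keywords = ['cat', 'dog']

# all(...) is True only when every keyword passes the membership test.
matches = [words for words in tokenized_sent
           if all(keyword in words for keyword in keywords)]
print(matches)  # [['the', 'cat', 'and', 'dog', 'sat']]
```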
Use the filter and all built-ins:
list(filter(lambda x: all(key in x for key in user_input_list), tokenized_sent))
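The same one-liner with hypothetical sample data (not from the question), including one edge case worth knowing:

```python
tokenized_sent = [['the', 'cat', 'sat'], ['the', 'cat', 'and', 'dog', 'sat']]
user_input_list = ['cat', 'dog']

matches = list(filter(lambda x: all(key in x for key in user_input_list),
                      tokenized_sent))
print(matches)  # [['the', 'cat', 'and', 'dog', 'sat']]

# Edge case: with an empty user_input_list, all(...) is True for
# every sentence, so filter returns every sentence unchanged.
```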