I have to check whether an element from a given list occurs in a text. I can do this when the element is a single word, but not when it contains multiple words, as below:
text="what is the price of wheat and White Pepper?"
words=['wheat','White Pepper','rice','pepper']
Expected output=['wheat','White Pepper']
I tried the approaches below but did not get the expected output. Can anyone help me?
>>> output=[word for word in words if word in text]
>>> print output
['wheat', 'White Pepper', 'rice']
Here it picks up "rice" from inside the word "price".
If I use nltk or a similar tokenizer, it splits "White Pepper" into "White" and "Pepper":
>>> from nltk import word_tokenize
>>> n_words=word_tokenize(text)
>>> print n_words
['what', 'is', 'the', 'price', 'of', 'wheat', 'and', 'White', 'Pepper', '?']
>>> output=[word for word in words if word in n_words]
>>> print output
['wheat']
You could use regular expressions with word boundaries (\b), so that a word only matches when it is not part of a longer word:
import re
text="what is the price of wheat and White Pepper?"
words=['wheat','White Pepper','rice','pepper']
output=[word for word in words if re.search(r"\b{}\b".format(word),text)]
print(output)
result:
['wheat', 'White Pepper']
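You can speed up the search by pre-building a single regex (courtesy of Jon Clements):
output = re.findall(r'\b(?:{})\b'.format('|'.join(sorted(words, key=len, reverse=True))), text)
The sort makes sure the longest strings are tried first. The alternatives are wrapped in a group so that the word boundaries apply to every word, including the first and the last one. Regex escaping is not needed here since the words contain only spaces and alphanumerics; use re.escape if they might contain other characters.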
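As a rough sketch of the pre-built pattern idea, the helper below wraps it in a function and escapes each word. The name find_phrases is made up for illustration, and it assumes every word starts and ends with a letter or digit (otherwise \b would not behave as intended):

```python
import re

def find_phrases(words, text):
    # Hypothetical helper, not part of the original answer.
    # Longest first, so a multi-word phrase wins over a shorter word it contains.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape guards against regex metacharacters inside a word or phrase.
    alternatives = "|".join(re.escape(w) for w in ordered)
    # The group makes both \b anchors apply to every alternative.
    pattern = re.compile(r"\b(?:{})\b".format(alternatives))
    return pattern.findall(text)

text = "what is the price of wheat and White Pepper?"
words = ['wheat', 'White Pepper', 'rice', 'pepper']
print(find_phrases(words, text))
```

Note that findall returns matches in the order they appear in the text, and a word that occurs twice is returned twice.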
So I would do something like this.
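def findWord(word_list, text):
    words = []
    for i in word_list:
        index = text.find(i)  # -1 if i does not occur in text
        if index != -1:
            # Skip matches that start in the middle of another word, e.g. "rice" in "price"
            if index != 0 and text[index - 1] != " ":
                continue
            words.append(i)
    return words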
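The find method of a string returns -1 if the substring is not present. For 'White Pepper' it returns 31, because that is the index where it starts.
For the test case you provided, this returns ['wheat', 'White Pepper'].
Note that it only looks at the first occurrence and only checks the character before it, so a word that is followed directly by more letters would still be accepted.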
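If you want to stay without regex, one possible variant checks the character on both sides of the match and keeps looking at later occurrences. This is only a sketch: the name find_whole_words is made up, and it treats any non-alphanumeric character (space, "?", ",") as a word separator, which is an assumption about your data:

```python
def find_whole_words(candidates, text):
    # Hypothetical helper, not part of the original answer.
    found = []
    for phrase in candidates:
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # The match must not be glued to a letter or digit on either side.
            before_ok = start == 0 or not text[start - 1].isalnum()
            after_ok = end == len(text) or not text[end].isalnum()
            if before_ok and after_ok:
                found.append(phrase)
                break
            # This occurrence was inside another word; try the next one.
            start = text.find(phrase, start + 1)
    return found

text = "what is the price of wheat and White Pepper?"
words = ['wheat', 'White Pepper', 'rice', 'pepper']
print(find_whole_words(words, text))
```

The result follows the order of the input list, not the order of appearance in the text.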