Let's say I have a list of sentences:
sent = ["Chocolate is loved by all.",
"Brazil is the biggest exporter of coffee.",
"Tokyo is the capital of Japan.",
"chocolate is made from cocoa."]
I want to return all sentences that contain the exact full word "chocolate", i.e. ["Chocolate is loved by all.", "chocolate is made from cocoa."]. If a sentence does not contain the word "chocolate", it shouldn't be returned. A sentence that only contains a longer word such as "chocolateyyy" should not be returned either.
How can I do this in Python?
The code below makes sure that the search word is matched as a full word, rather than as part of a longer word like 'chocolateyyy'. It's also case-insensitive, so 'Chocolate' matches 'chocolate' even though the first letters are capitalised differently.
sent = ["Chocolate is loved by all.", "Brazil is the biggest exporter of coffee.",
"Tokyo is the capital of Japan.","chocolate is made from cocoa.", "Chocolateyyy"]
search = "chocolate"
print([i for i in sent if search in i.lower().split()])
Here's a more expanded version for clarity with an explanation:
result = []
for i in sent:  # Go through each string in sent
    lower = i.lower()  # Make the string all lowercase
    split = lower.split(' ')  # Split the string on ' ', i.e. spaces
    # The default split() splits on any whitespace anyway
    if search in split:  # If chocolate is an entire element in the split list
        result.append(i)  # Add it to the results
print(result)
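One caveat with splitting on whitespace is that punctuation stays attached to the word, so a sentence ending in "chocolate." would be missed. Here's a minimal sketch of one way around this, assuming it's acceptable to strip leading and trailing punctuation from each token. The helper name has_word and the extra sample sentence are made up for illustration:

```python
import string

def has_word(sentence, word):
    """Return True if `word` appears as a whole token in `sentence`, ignoring case."""
    # Strip punctuation from both ends of each whitespace-separated token
    tokens = [t.strip(string.punctuation) for t in sentence.lower().split()]
    return word.lower() in tokens

sent = ["Chocolate is loved by all.",
        "Everyone loves chocolate.",  # hypothetical: word followed by a full stop
        "Chocolateyyy"]
result = [s for s in sent if has_word(s, "chocolate")]
print(result)
```

Punctuation inside a token (for example an apostrophe or a hyphen) is left alone, so this only helps with characters at the start or end of a word.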
I hope this helps :)
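A simple case-insensitive substring check works for the sample list, although it would also match longer words such as "chocolateyyy":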
filtered_sent = [i for i in sent if 'chocolate' in i.lower()]
Output
['Chocolate is loved by all.', 'chocolate is made from cocoa.']
As described in this question, you want some of the methods in the re library. In particular:
\b matches the empty string, but only at the beginning or end of a word.
You can therefore search for "chocolate" using re.search(r'\bchocolate\b', your_sentence, re.IGNORECASE).
The rest of the solution is just to iterate through your list of sentences and return a sublist that matches your target string.
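That iteration step might look like the sketch below. The function name filter_by_word is hypothetical, and re.escape is an addition here so that a search word containing regex metacharacters is treated literally:

```python
import re

def filter_by_word(sentences, word):
    """Return the sentences containing `word` as a whole word, ignoring case."""
    # \b anchors the match at word boundaries; re.escape keeps `word` literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

sentences = ["Chocolate is loved by all.",
             "Tokyo is the capital of Japan.",
             "chocolate is made from cocoa.",
             "Chocolateyyy"]  # extra entry to show sub-words are skipped
matches = filter_by_word(sentences, "chocolate")
print(matches)
```

Compiling the pattern once outside the list comprehension avoids rebuilding it for every sentence.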
You can use the regular expression library in Python:
import re
sent = ["Chocolate is loved by all.",
"Brazil is the biggest exporter of coffee.",
"Tokyo is the capital of Japan.",
"chocolate is made from cocoa."]
match_string = "chocolate"
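# Build the pattern from match_string; re.escape keeps the search word literal
pattern = re.compile(r"\b" + re.escape(match_string) + r"\b", re.IGNORECASE)
matched_sent = [s for s in sent if pattern.search(s)]
print(matched_sent)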