
Get sentences from a list of sentences with an exact word match: Python

Let's say I have a list of sentences:

sent = ["Chocolate is loved by all.", 
        "Brazil is the biggest exporter of coffee.", 
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa."]

I want to return all sentences that contain the exact full word "chocolate", i.e. ["Chocolate is loved by all.", "chocolate is made from cocoa."]. A sentence that does not contain the word "chocolate" shouldn't be returned. A sentence that only contains a longer word such as "chocolateyyy" should not be returned either.

How can I do this in Python?

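This makes sure that the search word is matched as a full word, rather than as part of a longer word like 'chocolateyyy'. It's also case-insensitive, so 'Chocolate' matches 'chocolate' even though the first letters are capitalised differently. Note that a word with punctuation attached (for example 'chocolate.' at the end of a sentence) is a different token after splitting and won't match.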

sent = ["Chocolate is loved by all.", "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.","chocolate is made from cocoa.", "Chocolateyyy"]

search = "chocolate"

print([i for i in sent if search in i.lower().split()])

Here's a more expanded version for clarity with an explanation:

result = []
for i in sent:  # Go through each string in sent
    lower = i.lower()  # Make the string all lowercase
    words = lower.split()  # Split on whitespace (the default for split())
    if search in words:  # If the search word is an entire element of the list
        result.append(i)  # Add the original sentence to the results
print(result)
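
The split-based approach above compares whole tokens, so a sentence that ends with "chocolate." would be missed. As a minimal sketch of one way around that, assuming it is acceptable to strip leading and trailing punctuation from each token (the function name sentences_with_word and the extra sample sentence are made up for illustration):

```python
import string


def sentences_with_word(sentences, word):
    """Return the sentences that contain `word` as a whole token.

    Matching ignores case and any punctuation attached to the
    start or end of a token (e.g. "chocolate." or "(chocolate)").
    """
    target = word.lower()
    result = []
    for sentence in sentences:
        # Split on whitespace, then trim punctuation from both ends of each token
        tokens = [tok.strip(string.punctuation) for tok in sentence.lower().split()]
        if target in tokens:
            result.append(sentence)
    return result


sent = ["Chocolate is loved by all.",
        "Everyone loves chocolate.",  # hypothetical extra sentence
        "Chocolateyyy"]
print(sentences_with_word(sent, "chocolate"))
```

This only trims punctuation at the edges of a token, so something like "chocolate-covered" would still be treated as a single, non-matching token.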

I hope this helps :)

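A plain substring check works for the sample list, but note that it would also match longer words such as "chocolateyyy":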

filtered_sent = [i for i in sent if 'chocolate' in i.lower()]

Output

['Chocolate is loved by all.', 'chocolate is made from cocoa.']

From this question, you want some of the methods in the re library. In particular:

\b Matches the empty string, but only at the beginning or end of a word.

You can therefore search for "chocolate" using re.search(r'\bchocolate\b', your_sentence, re.IGNORECASE).

The rest of the solution is just to iterate through your list of sentences and return a sublist that matches your target string.
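
The iteration described here might look like the following sketch. The function name find_sentences is made up for illustration, and re.escape is added on the assumption that the search word could contain characters that are special in regular expressions:

```python
import re


def find_sentences(sentences, word):
    """Return the sentences that contain `word` as a whole word, ignoring case."""
    # \b matches at a word boundary; re.escape keeps the search word literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


sent = ["Chocolate is loved by all.",
        "Brazil is the biggest exporter of coffee.",
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa.",
        "Chocolateyyy"]
print(find_sentences(sent, "chocolate"))
```

Because \b treats punctuation as a boundary, this also matches "chocolate" when it is followed by a full stop or a comma.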

You can use the regular expression library in Python:

import re

sent = ["Chocolate is loved by all.", 
        "Brazil is the biggest exporter of coffee.", 
        "Tokyo is the capital of Japan.",
        "chocolate is made from cocoa."]
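match_string = "chocolate"
# \b marks a word boundary; re.escape keeps any special characters in match_string literal
pattern = re.compile(r"\b" + re.escape(match_string) + r"\b", re.IGNORECASE)
matched_sent = [s for s in sent if pattern.search(s)]
print(matched_sent)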

