
How do I remove a substring from text based on a partial match in Python?

I have a long block of text, and I want to remove a part of it that only partially matches a given substring (a 90% match).

string = """Adam is a boy who lives in Michigan.
He loves to eat apples and oranges.
He also enjoys playing with his dog and cat.
Adam is a happy boy."""

substring = "He loves to apple oranges"

I want it to return:

"""Adam is a boy who lives in Michigan.
He also enjoys playing with his dog and cat.
Adam is a happy boy."""

The words "eat" and "and" don't appear in the substring, but I want to remove the whole sentence "He loves to eat apples and oranges." I'm not really sure how to do this. Thanks!

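You can use difflib.SequenceMatcher to score how similar each line is to the substring and keep only the lines that score below a threshold. Passing ' '.__eq__ as the isjunk argument tells it to treat spaces as junk: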

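from difflib import SequenceMatcher
# Keep only lines whose similarity ratio to substring is below 0.6 (spaces count as junk)
result = '\n'.join(s for s in string.splitlines() if SequenceMatcher(' '.__eq__, s, substring).ratio() < 0.6)
print(result)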

This returns:

Adam is a boy who lives in Michigan.
He also enjoys playing with his dog and cat.
Adam is a happy boy.

Demo: https://ideone.com/twDu1r
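Two caveats about the one-liner: it works line by line, so each sentence must sit on its own line, and its 0.6 cutoff is well below the 90% in the question. SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. For "He loves to eat apples and oranges." against the substring that comes to 2*25/60, about 0.83, so a 0.9 cutoff would keep the sentence. A minimal sketch that splits on sentence boundaries instead of lines and prints each score might look like this. The helper name remove_similar_sentences, the regex (sentences are assumed to end in ., ! or ? followed by whitespace) and the default threshold are illustrative assumptions, not part of difflib:

```python
import re
from difflib import SequenceMatcher


def remove_similar_sentences(text, target, threshold=0.6):
    """Hypothetical helper: drop each sentence whose ratio to target is >= threshold.

    Assumes sentences end with '.', '!' or '?' followed by whitespace.
    Spaces are treated as junk via ' '.__eq__.
    """
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    kept = [
        s for s in sentences
        if SequenceMatcher(' '.__eq__, s, target).ratio() < threshold
    ]
    return ' '.join(kept)


text = ("Adam is a boy who lives in Michigan. He loves to eat apples and oranges. "
        "He also enjoys playing with his dog and cat. Adam is a happy boy.")
substring = "He loves to apple oranges"

# Print each sentence's similarity score to help pick a sensible threshold.
for s in re.split(r'(?<=[.!?])\s+', text):
    print(f"{SequenceMatcher(' '.__eq__, s, substring).ratio():.2f}  {s}")

print(remove_similar_sentences(text, substring))
```

Printing the scores first is a cheap way to choose the threshold for real data, since the ratio depends on string lengths as well as on how many words match.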

string = string.replace(substring, '')

This replaces each exact occurrence of substring in string with an empty string ("").
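A quick check of the replace approach, using a shortened text chosen for illustration: str.replace leaves the text unchanged when given the partial substring from the question, and only removes the sentence when the exact wording is known.

```python
text = "He loves to eat apples and oranges. Adam is a happy boy."

# The partial substring never occurs verbatim, so nothing is replaced.
partial = text.replace("He loves to apple oranges", "")
print(partial == text)  # True

# The exact sentence is removed; the space after it is left behind.
exact = text.replace("He loves to eat apples and oranges.", "")
print(repr(exact))  # ' Adam is a happy boy.'
```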
