
How to remove all words matching a pattern, except certain words I want to preserve (which also match the pattern)?

I have a pattern I want to strip from a corpus of words, but certain words that match the pattern should be kept. I have a list of those words, and I already know how to remove every word matching the pattern.

How do I keep the words in the list and remove all other words matching the pattern?

Thank you.

You can use set intersection to find which words from your list occur in the text. First split the text into words:

import re
s = 'Philip Hammond under pressure after claiming that public sector workers are overpaid'
# Replace every non-word character with a space, then split into words
s1 = re.sub(r"[^\w]", " ", s).split()

Then intersect those words with the list of words you want to keep:

d1 = ['Philip', 'Hammond']

print(set(s1).intersection(d1))

This prints (set order may vary):

{'Philip', 'Hammond'}
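
Note that the intersection only reports which listed words appear in the text; it does not remove the other matches. If the goal is to drop every word matching the pattern except the listed ones, something along these lines might work. This is only a sketch: the function name `strip_matching`, the example pattern (any capitalised word), and the keep list are all assumptions made for illustration, not part of the original question.

```python
import re

def strip_matching(text, pattern, keep):
    """Drop words that fully match `pattern`, unless they are in `keep`."""
    regex = re.compile(pattern)
    keep = set(keep)  # set membership tests are fast
    kept_words = [
        word for word in text.split()
        # Keep a word if it is on the keep list or does not match the pattern
        if word in keep or not regex.fullmatch(word)
    ]
    return " ".join(kept_words)

text = "Philip Hammond under pressure after claiming that workers are overpaid"
# Hypothetical pattern: any word starting with an uppercase letter
result = strip_matching(text, r"[A-Z]\w*", keep={"Philip"})
print(result)
```

Here "Hammond" matches the pattern and is removed, while "Philip" also matches but survives because it is in the keep set. Splitting on whitespace is a simplification; text with punctuation attached to words would need a tokenising step such as the `re.sub` call above.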
