I am trying to search an 8-page PDF file for all words within parentheses EXCEPT "(EAI)", "(EY)", and a few others. I am using a regex and can pull all, say, three-letter words within parentheses, but I don't know how to exclude the ones I don't want.
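import re

# `text` holds the text already extracted from the PDF
lines = text.split()
search = r"\(\D{3}\)"  # raw string: three non-digit characters inside parentheses
regex = re.compile(search)
for line in lines:
    three_letters = regex.findall(line)
    for word in three_letters:
        print(word)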
I get the following list:
(FBS) (NFS) (IAD) (CDs) (CDs) (EAI) (EAI) (EAI) (VIG) (EAI) (EAI) (NTF) (DRP) (EAI) (IAD)
But I need a handful of them excluded.
I've been banging my head on this one for a while. Please help!
Use the findall function with this pattern (it matches 3 uppercase letters). The backslashes are doubled for use in a regular, non-raw string:
\\((?!(?:list|of|stuff|you|don't|want)\\))[A-Z]{3}\\)
Formatted:
\(
(?!
    (?:
        list
      | of
      | stuff
      | you
      | don't
      | want
    )
    \)
)
[A-Z]{3}
\)
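The expanded layout above maps fairly directly onto Python's re.VERBOSE flag. Here is a minimal sketch, assuming the excluded words are just EAI and EY (the two named in the question) and using a made-up sample string in place of the PDF text:

```python
import re

# Hypothetical sample text; in the question this would come from the PDF.
text = "Funds (FBS) and (NFS) via (EAI), also (IAD), (VIG) and (EAI) again."

pattern = re.compile(
    r"""
    \(                  # opening parenthesis
    (?!                 # negative lookahead: reject the match if what follows is
        (?: EAI | EY )  #   one of the excluded words (assumed list)
        \)              #   immediately followed by the closing parenthesis
    )
    [A-Z]{3}            # exactly three uppercase letters
    \)                  # closing parenthesis
    """,
    re.VERBOSE,
)

matches = pattern.findall(text)
print(matches)  # ['(FBS)', '(NFS)', '(IAD)', '(VIG)']
```

The \) inside the lookahead matters: it makes the exclusion compare the whole word, so a word that merely starts with an excluded one is still allowed.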
Specify a range to make the length variable.
This example matches 2 to 5 letters with {2,5}.
For 2 letters with no upper limit, use {2,} instead.
\\((?!(?:list|of|stuff|you|don't|want)\\))[A-Z]{2,5}\\)
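As a rough sketch of the variable-length version, the alternation can be built from a Python list so the exclusions are easy to maintain. The list contents and the sample string are assumptions for illustration, and the character class is widened to [A-Za-z] (my choice, not part of the pattern above) so that mixed-case entries such as (CDs) from the question's output still match:

```python
import re

# Assumed exclusion list; extend it with whatever else should be skipped.
excluded = ["EAI", "EY"]

# re.escape guards against any regex metacharacters in the words.
alternation = "|".join(re.escape(word) for word in excluded)

# 2 to 5 letters in parentheses, unless the whole word is in the excluded list.
pattern = re.compile(r"\((?!(?:" + alternation + r")\))[A-Za-z]{2,5}\)")

# Hypothetical sample standing in for the PDF text.
sample = "(FBS) (CDs) (EAI) (EY) (NTF) (DRP) (EYE)"
found = pattern.findall(sample)
print(found)  # ['(FBS)', '(CDs)', '(NTF)', '(DRP)', '(EYE)']
```

Note that (EYE) is kept even though it starts with EY, because the lookahead only rejects an excluded word when the closing parenthesis follows it directly.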